Bash script to translate XML

Hello, I have dozens of XML files, and I need a bash script that takes lines like this:

<p begin="00:06:28;12" end="00:00:02;26">

and translates them into this:

<p begin="628.12" end="631.08">

I know I need a simple awk or sed command to do this, but I am new to both; can someone help?

Source

2010-01-14 Ankur Chauhan

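What is the relationship between the old values and the new values? – 2010-01-14 20:56:55

It looks like a sum. – 2010-01-14 21:42:55

Shouldn't the final value be 630.38? – Jamie 2010-01-14 22:29:51

Ah, ghostdog74 beat me to it. However, mine also handles the ms.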

awk ' 
    function timeToMin(str) { 
     time_re = "([0-9][0-9]):([0-9][0-9]):([0-9][0-9]);([0-9][0-9])" 

     # Grab all the times in seconds. 
     s_to_s = gensub(time_re, "\\3", "g", str); 
     m_to_s = (gensub(time_re, "\\2", "g", str)+0)*60; 
     h_to_s = (gensub(time_re, "\\1", "g", str)+0)*60*60; 
     ms  = gensub(time_re, "\\4", "g", str); 

     # Create float. 
     time_str = (h_to_s+m_to_s+s_to_s)"."ms; 

     # Return the string unchanged; time_str+0 would drop trailing zeros (32.20 -> 32.2). 
     return time_str; 
    } 
    function addMins(aS, bS) { 
     # Split by decimal point 
     split(aS, aP, "."); 
     split(bS, bP, "."); 

     # Add the seconds and ms. 
     min = aP[1]+bP[1]; 
     ms = aP[2]+bP[2]; 
     if (ms > 59) { 
      ms = ms-60; 
      min++;   # carry into the whole part 
     } 

     # Return addition, zero-padding the fraction so 5 prints as "05". 
     return sprintf("%d.%02d", min, ms); 
    } 
    { 
     re = "<p begin=\"(.+)\" end=\"(.+)\">"; 
     if ($0 ~ re) { 
      # Pull out the data. 
      strip_re = ".*"re".*"; 
      begin_str = gensub(strip_re, "\\1", "g"); 
      end_str = gensub(strip_re, "\\2", "g"); 

      # Convert. 
      begin = timeToMin(begin_str); 
      end = timeToMin(end_str); 

      elapsed_end=addMins(begin, end); 

      sub(re,"<p begin=\""begin"\" end=\""elapsed_end"\">"); 
     } 

     print $0; 
    } 
' file

Source

2010-01-15 00:43:08 Jamie

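If the downvote was meant as an objection to using awk, I would normally agree, but if it is just a script, I don't see any reason not to. – Jamie 2010-01-15 01:05:14

If the input is

the output is

This solution does not seem to work; the result is incorrect. – 2010-01-15 02:59:58

The regular expression should be "([0-9][0-9])?:?([0-9][0-9])?:?([0-9][0-9])?;?([0-9][0-9])?" and return time_str+0; should be return time_str; otherwise 32.20 + 5.01 will be treated as 32.2 + 5.01. – 2010-01-15 03:38:56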
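The trailing-zero point in the comment above is easy to check in isolation. A minimal sketch, assuming any POSIX awk; the function name show_coercion is made up for this illustration and is not part of the answer:

```shell
# Hypothetical demo: numeric coercion changes the digits that split() sees.
show_coercion() {
  awk 'BEGIN {
    s = "32.20"
    print s + 0                  # coerced to a number: prints 32.2
    print s                      # left as a string: prints 32.20
    n = split(s + 0, p, ".")     # the number is converted back to "32.2"
    print p[2]                   # prints 2, not 20
  }'
}

show_coercion
```

So once the value has gone through +0, the fraction "20" reaches the addition step as "2", which is why keeping the value as a string matters here.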

An XSL stylesheet would be more reliable. You can run one from a shell script.

Source

2010-01-14 20:57:00 bmargulies

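I would suggest using Perl (or another scripting language) together with an XML parsing module (see here for more details on Perl and XML).

That way you can reliably parse the XML and extract or manipulate the values programmatically. Note the word "reliably". Your XML may use character encodings that a simple sed/awk approach will not respect (admittedly probably not an issue in this case, but it is worth being aware of such problems).

Source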

2010-01-14 21:40:34

Here is a start. I don't know how you want to add the decimal values, so I will leave that part to you.

awk '/.*<p[ ]+begin=.*[ ]+end=.*/{ 
    o=$0 
    gsub(/.*begin=\042|\042|>/,"") 
    m=split($0,s,"end=") 
    gsub(/[:;]/," ",s[1]) 
    gsub(/[:;]/," ",s[2]) 
    b=split(s[1],begin," ") 
    e=split(s[2],end," ") 
    # do date maths here 
    if (b>3){ 
     # hh:mm:ss;ff 
     tbegin=(begin[1]*3600) + (begin[2]*60) + begin[3] ##"."begin[4] 
    }else{ 
     # mm:ss;ff 
     tbegin=(begin[1]*60) + begin[2] ##"."begin[3] 
    } 
    # add the decimal yourself 
    if(e>3) { 
     tend = (end[1]*3600) +(end[2]*60)+end[3]+ tbegin ##"."end[4] 
    }else{ 
     tend = (end[1]*60)+end[2]+ tbegin ##"."end[3] 
    } 
    # keep a space between the two attributes so the tag stays well-formed 
    string=gensub("(.*begin=\042).*(end=\042)(.*)\042>", "\\1" tbegin "\042 \\2" tend"\042>","g",o) 
    $0=string 
} 
{print} 
' file

For example:

$ cat file 
<p begin="00:06:28;12" end="00:00:02;26"> 
<p begin="00:08:45;12" end="00:00:23;26"> 
<p begin="08:45;12" end="00:2;26"> 

$ ./shell.sh 
<p begin="388" end="390"> 
<p begin="525" end="548"> 
<p begin="525" end="527">

If you are doing anything more complicated than this, use a parser.
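For the decimal part that the script above leaves open, here is one possible sketch. It rests on an assumption the thread never confirms: that the field after ";" is a frame count, with FPS frames per second (30 by default). Under that reading, 12 + 26 frames carry to 1 second and 8 frames, which at least agrees with the ".08" in the asker's expected end value. The function name convert_times and the FPS parameter are made up for this illustration. It uses only POSIX awk features (no gensub), and it treats end as a duration added to begin, as the answers above do.

```shell
# Hypothetical helper: rewrites <p begin=".." end=".."> lines read from stdin.
# ASSUMPTION: the field after ";" counts frames, FPS frames per second.
convert_times() {
  awk -v FPS="${1:-30}" '
    # Convert "[hh:]mm:ss;ff" to whole seconds; the frame count goes to FRAMES.
    function to_seconds(tc,    n, parts, hms, k, i, total) {
      n = split(tc, parts, ";")
      FRAMES = (n > 1) ? parts[2] + 0 : 0
      k = split(parts[1], hms, ":")
      total = 0
      for (i = 1; i <= k; i++) total = total * 60 + hms[i]
      return total
    }
    match($0, /<p begin="[^"]*" end="[^"]*">/) {
      tag = substr($0, RSTART, RLENGTH)
      split(tag, q, "\"")          # q[2] is the begin value, q[4] the end value
      bs = to_seconds(q[2]); bf = FRAMES
      es = to_seconds(q[4]); ef = FRAMES
      # Add the duration to begin, carrying whole seconds out of the frames.
      ts = bs + es + int((bf + ef) / FPS)
      tf = (bf + ef) % FPS
      new = sprintf("<p begin=\"%d.%02d\" end=\"%d.%02d\">", bs, bf, ts, tf)
      $0 = substr($0, 1, RSTART - 1) new substr($0, RSTART + RLENGTH)
    }
    { print }
  '
}

# Possible use over many files (the ".new" output names are hypothetical):
#   for f in *.xml; do convert_times < "$f" > "$f.new"; done
```

With the first sample line from above, this prints begin="388.12" end="391.08". If the fraction is meant as something other than frames, pass a different base as the first argument, for example convert_times 100 for hundredths of a second.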

Source

2010-01-15 00:34:39 ghostdog74
