2013-03-08 122 views

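I have an .xml file in which I have to search for the <reviseddate> tag. It can occur multiple times in the file. When it does, I have to replace each <reviseddate> tag with <reviseddate1>, <reviseddate2>, and so on. I need a shell script for this that replaces the string with an incrementing value.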

A sample of the text is as follows:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised    
<reviseddate> February 4, 2006 </reviseddate>, <reviseddate> August 14, 2006 </reviseddate>, 
and <reviseddate> October 7, 2006 </reviseddate>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California 
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para> 

The output should be as follows:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised    
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,   
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California 
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para> 

I have tried:

for i in $c; do 
    sed -e "s/<reviseddate>/<reviseddate$i>/g" $path/$input_file > $path/input_new.xml 
    cp $path/input_new.xml $path/$input_file 
    rm -f input_new.xml 
done 

Please format the code in your question. – Anubhab 2013-03-08 07:47:18


for i in $c do sed -e "s/<reviseddate>/<reviseddate$i>/g" $path/$input_file > $path/input_new.xml cp $path/input_new.xml $path/$input_file rm -f input_new.xml done – Mallik 2013-03-08 08:19:01


Use an XML parser; they are available for many languages. – chepner 2013-03-08 16:00:31

Answers


I would use a Perl script like this to do the job:

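#!/usr/bin/env perl
use strict;
use warnings;

my $i = 1;    # Next number to assign; runs across the whole input
while (<>)
{
    # Renumber one un-numbered pair at a time, leftmost first (no /g).
    while (m%<reviseddate>([^<]*)</reviseddate>%)
    {
        s%<reviseddate>([^<]*)</reviseddate>%<reviseddate$i>$1</reviseddate$i>%;
        $i++;
    }
    print;
}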

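For each line, as long as it still contains an un-numbered <reviseddate> tag pair, the script replaces the leftmost pair with the appropriately numbered tags.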

Example output:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised    
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>, 
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California 
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para> 

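You can adapt this to handle alternatives, such as the opening tag on one line and the closing tag on the next. There is no need to fuss about that until you need it. Working with regular expressions is an art: you have to balance the immediate need against resilience to every possible scenario.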


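Since Perl apparently does not count as "shell" (but sed does), you can arrange to process the file often enough to find all the entries and change them.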

tmp=$(mktemp ./revise.XXXXXXXXXXXX)
# Remove the temporary file on exit or on HUP, INT, QUIT, PIPE, TERM.
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15

i=1
# Each pass numbers one tag pair; stop when no un-numbered tag is left.
while grep -q '<reviseddate>' filename
do
    sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$i>\1</reviseddate$i>%" filename > "$tmp"
    mv "$tmp" filename
    i=$((i+1))
done

rm -f "$tmp" # Should be a no-op
trap 0

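This updates the file repeatedly. The 1,/<reviseddate>/ part ensures that only the first <reviseddate> tag is updated on each pass (there is no g in the s%%% command, which is important). Note that with a 1,/pattern/ range, sed only starts looking for the end pattern on line 2, so this relies on the first line not containing a <reviseddate> tag, as is the case in the sample. The trap code ensures that the temporary file is not left behind.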
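To see the effect of the address range and the missing g flag in isolation, here is a small demo. The three-line input, the short <t> tag, and the function names demo_input and first_only are made up for illustration; they are not part of the question's data.

```shell
#!/bin/sh
# Hypothetical three-line input, not taken from the question.
demo_input() {
    printf '%s\n' 'intro' 'a <t>1</t> b <t>2</t>' 'c <t>3</t>'
}

# The range 1,/<t>/ covers line 1 through the first later line that
# contains <t>. Without the g flag, only the first match on that line
# is rewritten; lines after the range are left alone.
first_only() {
    demo_input | sed '1,/<t>/ s%<t>\([^<]*\)</t>%<t1>\1</t1>%'
}

first_only
```

Only the first pair on the second line should come out numbered; the second pair on that line and the pair on the third line stay as they were, which is what lets the loop hand out one number per pass.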

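The shell loop works on your sample data and gives the same output as the Perl script. For small files, that is fine. If you are dealing with multi-gigabyte files, Perl is the better choice because it processes the file only once.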
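If a single pass is wanted without Perl, one possible sketch is to let awk do the counting from inside a shell function. This is not the approach used above, just an alternative under some assumptions: tags are never nested, every closing tag belongs to the most recent opening tag, and the tags are written exactly as <reviseddate> and </reviseddate>. The function name renumber_reviseddate is hypothetical.

```shell
#!/bin/sh
# Sketch: number <reviseddate> tags in one pass using awk.
# renumber_reviseddate is a hypothetical helper name.
# Assumes tags are not nested and are spelled exactly as shown.
renumber_reviseddate() {
    awk '
    {
        out = ""
        rest = $0
        # Walk the line left to right, handling one tag per iteration.
        while (match(rest, /<\/?reviseddate>/)) {
            tag = substr(rest, RSTART, RLENGTH)
            if (tag == "<reviseddate>") {
                n++                              # opening tag: take the next number
                tag = "<reviseddate" n ">"
            } else {
                tag = "</reviseddate" n ">"      # closing tag: reuse the current number
            }
            out = out substr(rest, 1, RSTART - 1) tag
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

Because the counter lives across lines, an opening tag on one line and its closing tag on the next should still get the same number. A possible invocation, with input.xml as a placeholder file name: renumber_reviseddate input.xml > input_new.xml && mv input_new.xml input.xml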


Thank you, but how can this be done in shell? Any help with this would be appreciated. – Mallik 2013-03-08 08:55:48
