2016-07-07 60 views
1

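sed to process OFX: extract the payee from <MEMO> and print it in <NAME>

I'm processing OFX (bank transaction) files. My bank doesn't use the <NAME> tag to specify the payee; instead, that information is a substring of the <MEMO> tag.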

So my file looks like this:

...ofx headers and other stuff 
...line below is a transaction 
<STMTTRN> 
    <TRNTYPE>OTHER</TRNTYPE> 
    <DTPOSTED>20160609120000</DTPOSTED> 
    <TRNAMT>-4.00</TRNAMT> 
    <FITID>2016060914000</FITID> 
    <CHECKNUM>000000700132</CHECKNUM> 
    <REFNUM>700.132</REFNUM> 
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO> 
</STMTTRN> 
...continues other transactions and end of file 

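I want to match every <MEMO> tag, extract the payee name (Walmart 2th street in this example), and write it on a new line inside a <NAME> tag. The output I want looks like this: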

...ofx headers and other stuff 
...line below is a transaction 
<STMTTRN> 
    <TRNTYPE>OTHER</TRNTYPE> 
    <DTPOSTED>20160609120000</DTPOSTED> 
    <TRNAMT>-4.00</TRNAMT> 
    <FITID>2016060914000</FITID> 
    <CHECKNUM>000000700132</CHECKNUM> 
    <REFNUM>700.132</REFNUM> 
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO> 
    <NAME>Walmart 2th street</NAME> 
</STMTTRN> 
...continues other transactions and end of file 

Another tool, such as awk, would also be an acceptable solution.

Answers

2

With GNU sed:

sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' file 

Output:

<STMTTRN> 
    <TRNTYPE>OTHER</TRNTYPE> 
    <DTPOSTED>20160609120000</DTPOSTED> 
    <TRNAMT>-4.00</TRNAMT> 
    <FITID>2016060914000</FITID> 
    <CHECKNUM>000000700132</CHECKNUM> 
    <REFNUM>700.132</REFNUM> 
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO> 
    <NAME>Walmart 2th street</NAME> 
</STMTTRN> 

If you want to edit the file "in place", use sed's -i option.
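
Because -i overwrites the file, it may be worth keeping a backup while testing. A minimal sketch, assuming GNU sed; the file name `statement.ofx` and its sample content are hypothetical, made up from the question's example:

```shell
# Create a small sample file (hypothetical name and content, for illustration only).
cat > statement.ofx <<'EOF'
<STMTTRN>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
</STMTTRN>
EOF

# -i.bak edits statement.ofx in place and keeps the original as statement.ofx.bak.
sed -r -i.bak 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' statement.ofx

# Show what changed relative to the backup (diff exits non-zero when files differ).
diff statement.ofx.bak statement.ofx || true
```

Once the result looks right, the `.bak` file can be deleted.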


Using -i, I have a line with Latin characters that the regex does not match: Cartão de Crédito - 09/06 18:37 Walmart 2th street – Filipe


I dropped the non-ASCII characters. I wrote an answer below so that I could format the code. – Filipe
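
One possible explanation, offered as an assumption because the thread never diagnoses it: in a UTF-8 locale, GNU sed's `.` does not match bytes that are not valid UTF-8, so `.*<MEMO>.*` can fail on a line that contains a windows-1252 `ã`. A sketch of a workaround that runs sed in the C locale so that every byte counts as a character; the file names and sample content are hypothetical:

```shell
# Build a one-line sample with windows-1252 bytes: \343 is "ã", \351 is "é".
printf '    <MEMO>Cart\343o de Cr\351dito - 09/06 18:37 Walmart 2th street</MEMO>\n' > sample-ansi.ofx

# LC_ALL=C makes sed treat each byte as one character, so ".*" also matches
# bytes that would be invalid in UTF-8.
LC_ALL=C sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' sample-ansi.ofx > sample-out.ofx

# Print the added line; -a stops grep from treating the file as binary.
LC_ALL=C grep -a '<NAME>' sample-out.ofx
```

This leaves the file in its original encoding; the accented bytes are passed through unchanged.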

0

Complementing @Cyrus's answer to deal with non-ASCII characters:

I dropped the non-ASCII characters, and now it works:

iconv -f "windows-1252" -t "UTF-8" file-ansi.ofx -o file-utf8.ofx 
rm file-ansi.ofx 
sed 'y/áÁàÀãÃâÂéÉêÊíÍóÓõÕôÔúÚüÜçÇ/aAaAaAaAeEeEiIoOoOoOuUuUcC/' -i file-utf8.ofx 
sed -i -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n     <NAME>\1<\/NAME>/' file-utf8.ofx 

My output:

<MEMO>Cartao de Credito - 09/06 18:37 Walmart 2th street</MEMO> 
<NAME>Walmart 2th street</NAME> 

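GNU and BSD `sed` should be able to handle UTF-8 input correctly (assuming your locale is UTF-8-based), so I don't think you need the `rm` and `sed 'y/...'` commands. `sed -E 's/[[:alpha:]]/a letter/' <<<'ã'` shows that non-ASCII letters are recognized as such. – mklement0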
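
Following this comment, a sketch of the shorter pipeline: convert to UTF-8 and run the substitution directly, without the `y/.../` transliteration step. It assumes GNU `iconv` and GNU `sed`, and that the source file really is windows-1252; the file names and sample content are hypothetical:

```shell
# Sample input in windows-1252 (\343 = "ã", \351 = "é"); made up for illustration.
printf '    <MEMO>Cart\343o de Cr\351dito - 09/06 18:37 Walmart 2th street</MEMO>\n' > demo-ansi.ofx

# Convert to UTF-8, leaving the original file untouched.
iconv -f windows-1252 -t UTF-8 demo-ansi.ofx > demo-utf8.ofx

# Append a <NAME> line after each matching <MEMO> line; the accents are kept.
sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' demo-utf8.ofx > demo-out.ofx

cat demo-out.ofx
```

Keeping the accented characters means the memo text stays as the bank wrote it, which may matter if the importing program compares memos.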