2017-03-21

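Format XML as comma-separated using sed or awk

I need help formatting this XML file into comma-separated form so I can import it into a table. I've played around with sed and awk, but it's been an uphill battle.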

Example:

<requestID>224</requestID>, 
    <ErrorMessage>The following is required: PersonName </ErrorMessage>, 
    <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 
<requestID>615</requestID>, 
    <ErrorMessage>The following is required: PersonName </ErrorMessage>, 
    <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 

Desired result:

<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 
<requestID>615</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 

I've been able to add the commas I want with:

sed 's/ErrorMessage>$/ErrorMessage>,/; s/requestID>$/requestID>,/' 

I thought it would be better to remove the tabs, but this also removes all the spaces:

tr -d ' \t' <grep.xml > test.xml 
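`tr -d` deletes every occurrence of each character in its set, and ' \t' contains the space as well as the tab, which is why all the spaces disappeared. A minimal sketch of stripping only the indentation instead, assuming it consists of spaces and/or tabs (`strip_indent` is a hypothetical helper name):

```shell
# strip_indent: hypothetical helper, not from the original question.
# Removes leading and trailing spaces/tabs on each line, leaving the
# spaces inside the text (e.g. "is required: PersonName") untouched.
strip_indent() {
  sed 's/^[[:blank:]]*//; s/[[:blank:]]*$//' "$@"
}

# Usage with the same file names as above:
# strip_indent grep.xml > test.xml
```

Anchoring with `^` and `$` is what keeps the inner spaces safe; `tr` has no notion of position within a line.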

I don't know how to move a line onto the end of the previous line...
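Joining lines is what sed's `N` command is for: it appends the next input line to the pattern space, separated by an embedded newline, which a later `s` command can delete. A sketch, assuming every record is exactly three lines as in the example (`join3_sed` is a hypothetical name):

```shell
# join3_sed: hypothetical helper; assumes every record is exactly 3 lines.
join3_sed() {
  # N;N   -> append the next two lines, so the pattern space holds one
  #          whole record with its lines separated by embedded newlines.
  # 1st s -> delete each embedded newline together with surrounding blanks.
  # 2nd s -> delete blanks left at the very end of the record.
  sed 'N;N;s/[[:blank:]]*\n[[:blank:]]*//g;s/[[:blank:]]*$//' "$@"
}

# Usage:
# join3_sed grep.xml > out.csv
```

If the line count is not a multiple of three, GNU sed prints the leftover partial record when `N` hits end of input, while a strictly POSIX sed may drop it.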

So this partly works...

awk '{if ($0 ~ /<ErrorMessage>,*/) { printf "%s", $0; getline var; printf "%s\n", var} else {print $0}}' test.xml 


    <requestID>260</requestID>, 
      <ErrorMessage>The following is required: PersonName</ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>260</requestID></TCRMService> 

But now I'm having trouble moving the ErrorMessage up to the end of the requestID line...

Note that the ErrorMessage line also contains a requestID (inside the XML). I think the key is to match on the pattern

  </requestID>, 
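Matching `<requestID>` only at the start of a line avoids the copy embedded in the XML, since that line starts with `<?xml`. A sketch of that idea in awk, buffering lines and flushing the buffer whenever a new record starts; it assumes each record begins with a line whose first non-blank text is `<requestID>`, and `join_on_requestid` is a hypothetical name:

```shell
# join_on_requestid: hypothetical helper.
# Assumes a record starts with a line beginning (after indentation) with <requestID>.
join_on_requestid() {
  awk '
    {
      sub(/^[ \t]+/, "")            # drop leading indentation
      sub(/[ \t]+$/, "")            # drop trailing blanks
    }
    /^<requestID>/ {                # anchored: ignores the requestID inside the XML
      if (buf != "") print buf      # flush the previous record
      buf = $0
      next
    }
    { buf = buf $0 }                # continuation line: append to current record
    END { if (buf != "") print buf }
  ' "$@"
}
```

Unlike counting lines, this does not care how many lines a record spans, as long as each one starts with a `<requestID>` line.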

Where does request ID 615 come from? –

Sorry, it was supposed to be 615. Each requestID represents a unique record. – Janie

It still says RequestControl for ID 224 on both lines. –

Answers

Try this:

awk -v FS="" '{gsub(/^[[:space:]]+/,"",$0);ORS=(NR%3==0?RS:FS)}1' f 
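For anyone else decoding the one-liner, here is the same logic spread out with comments. Two small liberties, both my own assumptions: the empty separator is written as a literal "" instead of being borrowed from `FS=""`, and `[ \t]` replaces `[[:space:]]`, which some older awks do not support. `join3_awk` is a hypothetical name:

```shell
# join3_awk: hypothetical name; same logic as the one-liner above.
join3_awk() {
  awk '
    {
      # Strip leading blanks from the whole record ($0).
      gsub(/^[ \t]+/, "", $0)
      # ORS is what print appends after each record. On every 3rd line
      # (NR % 3 == 0) use RS, which defaults to a newline; otherwise use
      # the empty string, so the next line is glued onto this one.
      ORS = (NR % 3 == 0 ? RS : "")
    }
    # A pattern with no action prints $0; the constant 1 is always true.
    1
  ' "$@"
}
```

Like the original, it leaves trailing blanks alone and relies on each record being exactly three lines.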

Wow. That worked. Thanks. Off to research and understand what the syntax means. – Janie

You're welcome. You can start your research here: https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html –

In awk, very QND (quick and dirty; assumes spaces only, no tabs):

$ awk '{gsub(/^ +| +$|, *$/,"");printf "%s%s", ($0~/^ *<requestID>/?ORS:","), $0}END{print ""}' file 

<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 
<requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 

Now the leading newline just needs to be removed, but I have a bus to catch.
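One way to get rid of that leading newline is to print no separator at all before the very first line. A sketch of the answer above with that single change (still spaces only, no tabs; `join_qnd` is a hypothetical name):

```shell
# join_qnd: hypothetical name; the quick-and-dirty answer with NR == 1 special-cased.
join_qnd() {
  awk '
    {
      gsub(/^ +| +$|, *$/, "")       # trim spaces and a trailing comma (spaces only, no tabs)
      if (NR == 1)             sep = ""     # first line: no separator, so no leading blank line
      else if (/^<requestID>/) sep = ORS    # new record: start a new output line
      else                     sep = ","    # continuation: re-insert the comma trimmed above
      printf "%s%s", sep, $0
    }
    END { print "" }                 # terminate the last record with a newline
  ' "$@"
}
```

Checking `NR == 1` costs one comparison per line and avoids post-processing the output with another tool.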

So I tried this and got the error: awk: illegal primary in regular expression ^ +|?+$|, *$ at +$|, *$ source line number 1 context is {gsub(/^ +|?+$|,>>> *$/,"")<<< – Janie

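Yes, '?' as the first character in a regexp is ambiguous, so some awks will tell you so while others may assume you meant it literally. I haven't read the question closely enough to know what it is meant to do, but whatever it is, starting a regexp segment with a bare '?' is wrong. –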

That was a typo. It makes no sense in that context anyway (the trimming part: 'gsub(/...|?+$|.../)'). –

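Why not a Perl snippet? With the one below, newlines are removed, runs of two or more whitespace characters are removed, and a newline is put back after each </TCRMService>. No commas are added, since the input file shown in the question already has the appropriate commas.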

$ cat file3 |nl 
    1 <requestID>224</requestID>, 
    2  <ErrorMessage>The following is required: PersonName </ErrorMessage>, 
    3  <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 
    4 <requestID>615</requestID>, 
    5  <ErrorMessage>The following is required: PersonName </ErrorMessage>, 
    6  <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 

$ perl -pe 's/\n//g; s/[[:space:]]{2,}//g; s/<\/TCRMService>/$&\n/g' file3 |nl 
    1 <requestID>224</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 
    2 <requestID>615</requestID>,<ErrorMessage>The following is required: PersonName </ErrorMessage>,<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd"><RequestControl><requestID>224</requestID><DWLControl></TCRMService> 

You went with an awk solution, but just for my own information, I'd like to know whether this solution works on your real data. –