2014-05-23 109 views
-2

Python script to convert XML to CSV

I have an XML file like the one below (shortened; in the real file the tags repeat many times):

<file version=3.6 xmlns:xsi="http://ww.w3.org/2009/XMLSchemainstance"> 
    <Date>2014-05-12</Date> 
<creationTime>2014-05-12 :56:54</creationTime> 
<location>http://www.w.org/2009/XMLSchemainstance/output/official/.20140512.PNL.xml.gz</location> 
<contentType>nnn</contentType> 
<signOffBy>gft_test_fo</signOffBy> 
<signOffGroup>BRFPOOLNEW_SO</signOffGroup> 
<book> 
    <riskBook>BRFPOOL</riskBook> 
    <trade> 
     <tradeId>00000000000009752</tradeId> 
    <subTrade> 
     <riskTrade>00000000000009752</riskTrade> 
     <riskProductType>BOND_NF</riskProductType> 
     <reportCollection> 
     <report> 
     <valuationSource>RISK_ENGINE</valuationSource> 
     <reportName>BRZ_HGS_PPTCC</reportName> 
     <riskPoint> 
      <value>0.00</value> 
      <valueCcy>BRL</valueCcy> 
      </riskPoint> 
     </report> 
     <report> 
     <valuationSource>RISK_ENGINE</valuationSource> 
     <reportName>BRZ_HGS_PPTCC</reportName> 
     <riskPoint> 
      <value>0.00</value> 
      <valueCcy>BRL</valueCcy> 
      </riskPoint> 
     </report>   
     </reportCollection> 
     </subTrade> 
    </trade> 
    </book> 
</file> 

I want the output as CSV, like this:

Date,creationTime,location,contentType,signOffBy,signOffGroup,riskBook,tradeId,riskTrade,riskProductType,valuationSource,reportName,value,valueCcy 
2014-05-12,2014-05-12 :56:54,http://ww.w3.org/2009/XMLSchemainstance/output/official/GLOBAL/GLOBAL_EM/BRFPOOL.20140512.PNL.xml.gz,nnn,gft_test_fo,BRFPOOLNEW_SO,BRFPOOL,00000000000009752,00000000000009752,BOND_NF,RISK_ENGINE,BRZ_HGS_PPTCC,0.00,BRL 
2014-05-12,2014-05-12 :56:54,http://ww.w3.org/2009/XMLSchemainstance/output/official/GLOBAL/GLOBAL_EM/BRFPOOL.20140512.PNL.xml.gz,PNL,gft_test_fo,BRFPOOLNEW_SO,BRFPOOL,00000000000009752,00000000000009752,BOND_NF,RISK_ENGINE,BRZ_HGS_PPTCC,0.00,BRL 

Here is my code so far:

import xml.etree.ElementTree as etree 
root=etree.parse('./emp.xml').getroot() 
for b in zip(root.findall("book/trade/tradeId"),root.findall ("book/trade/subTrade/riskTrade"),root.findall("book/trade/subTrade/riskProductType"),root.findall("book/trade/subTrade/reportcollectin/report/valuationSource"),("book/trade/subTrade/reportcollectin/report/reportName"),("book/trade/subTrade/reportcollectin/report/refCurve"),("book/trade/subTrade/reportcollectin/report/riskPoint/value"),("book/trade/subTrade/reportcollectin/report/riskPoint/valueCcy") 
    print (",".join([x.text for x in b])) 

I am not getting the output I expect. Please help.

+1

What happens when you run the code above? Do you see an error? – shaktimaan

+0

@shaktimaan I am not getting the expected output – user3669149

+0

Please fix your indentation and mark it as code – DAXaholic

Answers

2

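Apart from the errors in the XML (there are no closing tags for <creationTime> and <file>) and in the Python file (the file name has no closing quote, and some paths are misspelled, such as reportcollectin), you cannot use zip when the lists have different sizes: the result always has the length of the shortest list. Your code searches root.findall("book/trade/subTrade/reportCollection/report/refCurve"), which returns an empty list, so the final result is empty as well.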
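To see the zip behaviour in isolation, here is a small sketch. The list names are made up for illustration; the values are taken from the sample in the question, and the empty list stands in for a findall() call whose path matches nothing.

```python
# zip() stops as soon as its shortest input is exhausted.
trade_ids = ["00000000000009752"]
report_names = ["BRZ_HGS_PPTCC", "BRZ_HGS_PPTCC"]
ref_curves = []  # what findall() returns when the path matches no element

# One trade id against two report names: only one pair survives.
pairs = list(zip(trade_ids, report_names))

# Add an empty list and nothing survives, so the for loop body never runs.
nothing = list(zip(trade_ids, report_names, ref_curves))

print(pairs)
print(nothing)
```

This is why the loop prints nothing at all rather than raising an error.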

The best approach is to read the top-level fields first (Date, creationTime, and so on), and then loop over the books and their reports.
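A minimal sketch of that approach, using only the standard library. Several things here are assumptions rather than part of the question: SAMPLE is a cut-down copy of the file that I made well-formed (the version attribute is quoted, and creationTime and location are left out), the names TOP_FIELDS, HEADER and xml_to_rows are made up, and I am assuming you want one CSV row per <riskPoint>. Elements that are missing come out as empty strings.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed, cut-down version of the file in the question.
SAMPLE = """\
<file version="3.6">
  <Date>2014-05-12</Date>
  <contentType>nnn</contentType>
  <signOffBy>gft_test_fo</signOffBy>
  <signOffGroup>BRFPOOLNEW_SO</signOffGroup>
  <book>
    <riskBook>BRFPOOL</riskBook>
    <trade>
      <tradeId>00000000000009752</tradeId>
      <subTrade>
        <riskTrade>00000000000009752</riskTrade>
        <riskProductType>BOND_NF</riskProductType>
        <reportCollection>
          <report>
            <valuationSource>RISK_ENGINE</valuationSource>
            <reportName>BRZ_HGS_PPTCC</reportName>
            <riskPoint><value>0.00</value><valueCcy>BRL</valueCcy></riskPoint>
          </report>
          <report>
            <valuationSource>RISK_ENGINE</valuationSource>
            <reportName>BRZ_HGS_PPTCC</reportName>
            <riskPoint><value>0.00</value><valueCcy>BRL</valueCcy></riskPoint>
          </report>
        </reportCollection>
      </subTrade>
    </trade>
  </book>
</file>
"""

TOP_FIELDS = ["Date", "creationTime", "location",
              "contentType", "signOffBy", "signOffGroup"]
HEADER = TOP_FIELDS + ["riskBook", "tradeId", "riskTrade", "riskProductType",
                       "valuationSource", "reportName", "value", "valueCcy"]


def xml_to_rows(root):
    """Yield one flat row per <riskPoint>, repeating the parent values."""
    # Top-level fields are read once and repeated on every row.
    top = [root.findtext(name, default="") for name in TOP_FIELDS]
    for book in root.findall("book"):
        risk_book = book.findtext("riskBook", default="")
        for trade in book.findall("trade"):
            trade_id = trade.findtext("tradeId", default="")
            for sub in trade.findall("subTrade"):
                sub_fields = [sub.findtext("riskTrade", default=""),
                              sub.findtext("riskProductType", default="")]
                for report in sub.findall("reportCollection/report"):
                    rep_fields = [report.findtext("valuationSource", default=""),
                                  report.findtext("reportName", default="")]
                    for point in report.findall("riskPoint"):
                        yield (top + [risk_book, trade_id] + sub_fields
                               + rep_fields
                               + [point.findtext("value", default=""),
                                  point.findtext("valueCcy", default="")])


root = ET.fromstring(SAMPLE)
rows = list(xml_to_rows(root))

# Write to an in-memory buffer here; swap in open("out.csv", "w", newline="")
# to write a real file.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(HEADER)
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

For your real file you would replace ET.fromstring(SAMPLE) with ET.parse('./emp.xml').getroot(), once the XML itself parses. The nested loops keep each report attached to its own trade and book, which zip over separate findall() lists cannot do.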

+0

Please post your code – user3669149

+0

Please give me some suggested Python code; I am very new to Python. – user3669149