2013-01-03 72 views
0

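Parsing repeated child elements with Python

I want to use Python to parse an XML document that contains repeated child elements. When I try to parse the data, the script creates an empty file. If I comment out the code for the repeated child elements (see the bold section in the Python script below), the output is generated correctly. Can anyone help?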

XML:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?> 
<FRPerformance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    <FRPerformanceShareClassCurrency> 
    <FundCode>00190</FundCode> 
    <CurrencyID>USD</CurrencyID> 
    <FundShareClassCode>A</FundShareClassCode> 
    <ReportPeriodFrequency>Quarterly</ReportPeriodFrequency> 
    <ReportPeriodEndDate>06/30/2012</ReportPeriodEndDate> 
    <Net> 
     <Annualized> 
     <Year1>-4.909000000</Year1> 
     <Year3>10.140000000</Year3> 
     <Year5>-22.250000000</Year5> 
     <Year10>-7.570000000</Year10> 
     <Year15>-4.730000000</Year15> 
     <Year20>-0.900000000</Year20> 
     <SI>1.900000000</SI> 
     </Annualized> 
    </Net> 
    <Gross> 
     <Annualized> 
     <Month3>1.279000000</Month3> 
     <YTD>7.294000000</YTD> 
     <Year1>-0.167000000</Year1> 
     <Year3>11.940000000</Year3> 
     <Year5>-21.490000000</Year5> 
     <Year10>-7.120000000</Year10> 
     <Year15>-4.420000000</Year15> 
     <Year20>-0.660000000</Year20> 
     <SI>2.110000000</SI> 
     </Annualized> 
     <Cumulative> 
     <Month1Back>2.288000000</Month1Back> 
     <Month2Back>-1.587000000</Month2Back> 
     <Month3Back>0.610000000</Month3Back> 
     <CurrentYear>7.294000000</CurrentYear> 
     <Year1Back>-2.409000000</Year1Back> 
     <Year2Back>13.804000000</Year2Back> 
     <Year3Back>20.287000000</Year3Back> 
     <Year4Back>-78.528000000</Year4Back> 
     <Year5Back>-0.101000000</Year5Back> 
     <Year6Back>9.193000000</Year6Back> 
     <Year7Back>2.659000000</Year7Back> 
     <Year8Back>9.208000000</Year8Back> 
     <Year9Back>25.916000000</Year9Back> 
     <Year10Back>-3.612000000</Year10Back> 
     </Cumulative> 
     <HistoricReturns> 
     <HistoricReturns_Item> 
      <Date>Fri, 28 Feb 1997 00:00:00 -0600</Date> 
      <Return>32058.090000000</Return> 
     </HistoricReturns_Item> 
     <HistoricReturns_Item> 
      <Date>Fri, 28 Feb 2003 00:00:00 -0600</Date> 
      <Return>36415.110000000</Return> 
     </HistoricReturns_Item> 
     <HistoricReturns_Item> 
      <Date>Fri, 29 Feb 2008 00:00:00 -0600</Date> 
      <Return>49529.290000000</Return> 
     </HistoricReturns_Item> 
     <HistoricReturns_Item> 
      <Date>Fri, 30 Apr 1993 00:00:00 -0600</Date> 
      <Return>21621.500000000</Return> 
     </HistoricReturns_Item> 
     </HistoricReturns> 

Python script:

## Read the XML file path and the tag name from the command line 
xmlFile = sys.argv[1] 
tagName = sys.argv[2] 


tree = ET.parse(xmlFile) 
root = tree.getroot() 

## Setup the file for output 
saveout = sys.stdout 
output_file = open('parsedXML.csv', 'w') 
sys.stdout = output_file 

## Parse XML 

for node in root.findall(tagName): 
    fundCode = node.find('FundCode').text 
    curr = node.find('CurrencyID').text 
    shareClass = node.find('FundShareClassCode').text 
    for node2 in node.findall('./Net/Annualized'): 
     year1 = node2.findtext('Year1') 
     year3 = node2.findtext('Year3') 
     year5 = node2.findtext('Year5') 
     year10 = node2.findtext('Year10') 
     year15 = node2.findtext('Year15') 
     year20 = node2.findtext('Year20') 
     SI = node2.findtext('SI') 
     for node3 in node.findall('./Gross'): 
      for node4 in node3.findall('./Annualized'): 
       month3 = node4.findtext('Month3') 
       ytd = node4.findtext('YTD') 
       year1g = node4.findtext('Year1') 
       year3g = node4.findtext('Year3') 
       year5g = node4.findtext('Year5') 
       year10g = node4.findtext('Year10') 
       year15g = node4.findtext('Year15') 
       year20g = node4.findtext('Year2') 
       SIg = node4.findtext('SI') 
      for node5 in node3.findall('./Cumulative'): 
       month1b = node5.findtext('Month1Back') 
       month2b = node5.findtext('Month2Back') 
       month3b = node5.findtext('Month3Back') 
       curYear = node5.findtext('CurrentYear') 
       year1b = node5.findtext('Year1Back') 
       year2b = node5.findtext('Year2Back') 
       year3b = node5.findtext('Year3Back') 
       year4b = node5.findtext('Year4Back') 
       year5b = node5.findtext('Year5Back') 
       year6b = node5.findtext('Year6Back') 
       year7b = node5.findtext('Year7Back') 
       year8b = node5.findtext('Year8Back') 
       year9b = node5.findtext('Year9Back') 
       year10b = node5.findtext('Year10Back') 
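     # --- start of the bold section in the original post (repeated child elements) ---
     for node6 in node.findall('./HistoricReturns'): 
      for node7 in node6.findall('./HistoricReturns_Item'): 
       hDate = node7.findall('Date') 
       hReturn = node7.findall('Return') 
     # --- end of the bold section ---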
       print(fundCode, curr, shareClass,year1, year3, year5, year10, year15, year15, year20, SI,month3, ytd, year1g, year3g, year5g, year10g, year15g, year20g, SIg, month1b, month2b, month3b, curYear, year1b, year2b, year3b, year4b, year5b, year6b, year7b, year8b,year9b,year10b, hDate, hReturn) 
+0

This is the child element (HistoricReturns/HistoricReturns_Item), and this is the code that has the problem: **for node6 in node.findall('./HistoricReturns'): for node7 in node6.findall('./HistoricReturns_Item'): hDate = node7.findall('Date') hReturn = node7.findall('Return')** – user1943669

+2

Please don't use comments to clarify the question. Edit the question instead. – mzjn

+1

After the edit, the XML is still incomplete. The import statements are missing from the Python code. – mzjn

Answer

1

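The sample XML and the Python code do not match structurally. Either

  • the closing </Gross> tag is missing from the XML (it should come before the start of the <HistoricReturns> section), in which case the code is correct, or
  • the code should read for node6 in node3.findall('./HistoricReturns'): that is, search from node3 rather than node.

NB: the XML sample is incomplete (it is not well-formed XML) because it lacks the closing tags for Gross, FRPerformanceShareClassCurrency and FRPerformance, so the question cannot be answered with certainty. Hope this helps.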
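The second possibility can be illustrated with a minimal sketch. It assumes that HistoricReturns really is nested inside Gross and that the missing closing tags are added; the trimmed SAMPLE document and the historic_rows helper are hypothetical, not part of the original post. It also uses findtext for Date and Return, since findall returns a list of elements rather than their text.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the document in the question.
# Assumption: closing tags are added and HistoricReturns sits inside Gross.
SAMPLE = """\
<FRPerformance>
  <FRPerformanceShareClassCurrency>
    <FundCode>00190</FundCode>
    <Gross>
      <HistoricReturns>
        <HistoricReturns_Item>
          <Date>Fri, 28 Feb 1997 00:00:00 -0600</Date>
          <Return>32058.090000000</Return>
        </HistoricReturns_Item>
        <HistoricReturns_Item>
          <Date>Fri, 28 Feb 2003 00:00:00 -0600</Date>
          <Return>36415.110000000</Return>
        </HistoricReturns_Item>
      </HistoricReturns>
    </Gross>
  </FRPerformanceShareClassCurrency>
</FRPerformance>
"""


def historic_rows(root, tag_name="FRPerformanceShareClassCurrency"):
    """Return one (fund code, date, return) tuple per HistoricReturns_Item."""
    rows = []
    for node in root.findall(tag_name):
        fund_code = node.findtext("FundCode")
        # Walk down through Gross; HistoricReturns is not a direct child of node.
        for item in node.findall("./Gross/HistoricReturns/HistoricReturns_Item"):
            rows.append((fund_code, item.findtext("Date"), item.findtext("Return")))
    return rows


root = ET.fromstring(SAMPLE)
share_class = root.find("FRPerformanceShareClassCurrency")

# Searching one level too high finds nothing, so the inner loop body never runs.
direct_matches = share_class.findall("./HistoricReturns")

rows = historic_rows(root)
for row in rows:
    print(*row, sep=",")
```

Under that assumption, the empty output file follows directly: in the question's script the only print call sits inside the HistoricReturns loops, and those loops iterate over an empty list.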

+0

Sorry about the XML. One of the elements was collapsed. I have updated the XML above. – user1943669

+1

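Did you see @mzjn's comment? After the update, the XML is still incomplete, because not every start tag has a corresponding end tag. I suggest running your sample code against your sample document; you should see this for yourself. Doing so will also prompt you to add the missing imports to the Python code. That will make it easier for people to help you. – dannyclark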
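A rough way to check this, sketched with a hypothetical fragment that has the same kind of flaw (unclosed elements) and a hypothetical check_well_formed helper: xml.etree.ElementTree raises ParseError when the input is not well-formed, so the problem shows up before any findall logic runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <Gross> and the two outer elements are never closed.
INCOMPLETE = """\
<FRPerformance>
  <FRPerformanceShareClassCurrency>
    <Gross>
      <HistoricReturns>
      </HistoricReturns>
"""


def check_well_formed(text):
    """Return (True, '') if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, ""


ok, message = check_well_formed(INCOMPLETE)
print(ok, message)
```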