2015-04-02 42 views
2

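Disclaimer: I am new to Python, XML, and programming in general. The code (which I lifted from the internet) works, but it has some problems that I can't find an answer to or wrap my head around...

Python XML parsing with ElementTree: how do I find the values of elements that share the same name?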

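I am trying to parse the XML files from the grants.gov XML extract website, remove every grant that is not in the "unrestricted" eligibility category (marked in the XML as an "EligibilityCategory" of "99"), and write the result to a new XML file.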

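The code below correctly removes the funding opportunities I'm not interested in, but it also removes funding opportunities that have multiple EligibilityCategory elements, one of which is "99". I think this is because .find only grabs the first occurrence. I tried to use .findall but couldn't get it to work. Thanks in advance for your help.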

import xml.etree.ElementTree as etree 
tree = etree.parse('sample.xml') 
root = tree.getroot() 

for FundingOppSynopsis in root.findall('FundingOppSynopsis'):
    ID = int(FundingOppSynopsis.find('EligibilityCategory').text)
    if ID != 99:
        root.remove(FundingOppSynopsis)

tree.write("Output/output.xml", xml_declaration=True, encoding='UTF-8', method="xml") 

Sample (heavily trimmed) XML:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE Grants SYSTEM "http://apply07.grants.gov/search/dtd/XMLExtract.dtd"> 
<Grants> 
    <FundingOppSynopsis> 
     <FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber> 
     <ApplicationsDueDate>03242008</ApplicationsDueDate> 
     <Office>Risk Management Agency</Office> 
     <Agency>Department of Agriculture</Agency> 
     <EligibilityCategory>25</EligibilityCategory> 
    </FundingOppSynopsis> 
    <FundingOppSynopsis> 
     <FundingOppNumber>NPS-ARRAWHIS100315</FundingOppNumber> 
     <ApplicationsDueDate>11282009</ApplicationsDueDate> 
     <Office>National Park Service</Office> 
     <Agency>Department of the Interior</Agency> 
     <EligibilityCategory>00</EligibilityCategory> 
    </FundingOppSynopsis> 
    <FundingOppSynopsis> 
     <FundingOppNumber>OFDA-FY08-002-APS</FundingOppNumber> 
     <ApplicationsDueDate>10102008</ApplicationsDueDate> 
     <Office>None</Office> 
     <Agency>Agency for International Development</Agency> 
     <EligibilityCategory>99</EligibilityCategory> 
    </FundingOppSynopsis> 
    <FundingOppSynopsis> 
     <FundingOppNumber>AK-NOI08-0004</FundingOppNumber> 
     <ApplicationsDueDate>07142008</ApplicationsDueDate> 
     <Office>Bureau of Land Management</Office> 
     <Agency>Department of the Interior</Agency> 
     <EligibilityCategory>99</EligibilityCategory> 
    </FundingOppSynopsis> 
    <FundingOppSynopsis> 
     <FundingOppNumber>RD-RBP-BIOMASS-2007-FULL</FundingOppNumber> 
     <ApplicationsDueDate>11162007</ApplicationsDueDate> 
     <Office>Business and Cooperative Programs</Office> 
     <Agency>Department of Agriculture</Agency> 
     <EligibilityCategory>06</EligibilityCategory> 
     <EligibilityCategory>12</EligibilityCategory> 
     <EligibilityCategory>13</EligibilityCategory> 
     <EligibilityCategory>20</EligibilityCategory> 
     <EligibilityCategory>22</EligibilityCategory> 
     <EligibilityCategory>23</EligibilityCategory> 
     <EligibilityCategory>25</EligibilityCategory> 
    </FundingOppSynopsis> 
    <FundingOppSynopsis> 
     <FundingOppNumber>BAA07-10</FundingOppNumber> 
     <ApplicationsDueDateExplanation>The due dates and times established for the receipt of White Papers and Full Proposals are as indicated in Section IV, Paragraph 3 of the BAA. </ApplicationsDueDateExplanation> 
     <Office>Office of Procurement Operations - Grants Division</Office> 
     <Agency>Department of Homeland Security</Agency> 
     <EligibilityCategory>00</EligibilityCategory> 
     <EligibilityCategory>01</EligibilityCategory> 
     <EligibilityCategory>02</EligibilityCategory> 
     <EligibilityCategory>04</EligibilityCategory> 
     <EligibilityCategory>05</EligibilityCategory> 
     <EligibilityCategory>06</EligibilityCategory> 
     <EligibilityCategory>07</EligibilityCategory> 
     <EligibilityCategory>08</EligibilityCategory> 
     <EligibilityCategory>11</EligibilityCategory> 
     <EligibilityCategory>12</EligibilityCategory> 
     <EligibilityCategory>13</EligibilityCategory> 
     <EligibilityCategory>20</EligibilityCategory> 
     <EligibilityCategory>21</EligibilityCategory> 
     <EligibilityCategory>22</EligibilityCategory> 
     <EligibilityCategory>23</EligibilityCategory> 
     <EligibilityCategory>25</EligibilityCategory> 
     <EligibilityCategory>99</EligibilityCategory> 
    </FundingOppSynopsis> 
</Grants> 

Answers

1

You need to extract the list of categories with findall and then check whether 99 is in that list. You can use a list comprehension like this:

for FundingOppSynopsis in root.findall('FundingOppSynopsis'):
    IDs = [int(category.text) for category in FundingOppSynopsis.findall('EligibilityCategory')]
    if 99 not in IDs:
        root.remove(FundingOppSynopsis)
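
To see the difference between find and findall in isolation, here is a minimal, self-contained sketch. The inline SAMPLE string is a hypothetical two-grant cut of the question's XML, and the variable names (first_only, all_categories, kept) are illustrative only. It parses from a string rather than from sample.xml, so nothing is written to disk.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-grant sample, trimmed from the XML in the question.
SAMPLE = """<Grants>
    <FundingOppSynopsis>
        <FundingOppNumber>RD-RBP-BIOMASS-2007-FULL</FundingOppNumber>
        <EligibilityCategory>06</EligibilityCategory>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>BAA07-10</FundingOppNumber>
        <EligibilityCategory>00</EligibilityCategory>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
</Grants>"""

root = ET.fromstring(SAMPLE)
second = root.findall('FundingOppSynopsis')[1]

# find() returns only the first matching child; findall() returns all of them.
first_only = second.find('EligibilityCategory').text
all_categories = [c.text for c in second.findall('EligibilityCategory')]

# Apply the filter. findall() returns a list, so removing children
# from root inside the loop does not disturb the iteration.
for grant in root.findall('FundingOppSynopsis'):
    ids = [int(c.text) for c in grant.findall('EligibilityCategory')]
    if 99 not in ids:
        root.remove(grant)

kept = [g.findtext('FundingOppNumber') for g in root.findall('FundingOppSynopsis')]
print(first_only, all_categories, kept)
```

Because the second grant lists "00" before "99", a check based on find alone would discard it, which matches the behaviour described in the question.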

Thanks for your help! I chose this answer because I'm still very new to Python, and it fits right in with the tree.write method I use for the file output. – 2015-04-03 02:30:30

2

You can use an XPath expression to achieve what you are trying to do.

import xml.etree.ElementTree as etree 
tree = etree.parse('sample.xml') 
root = tree.getroot() 

req = tree.findall("./FundingOppSynopsis[EligibilityCategory='99']") 

for r in req:
    print(r)

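The query returns a list of all FundingOppSynopsis elements in the document that have a child EligibilityCategory element containing the text "99". For more information, see the documentation on XPath and on XPath support in Python.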
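
The XPath query only selects the grants to keep, so removing the rest still takes one more step. The following is a rough sketch of one way to do that, not necessarily what the answer's author had in mind. The inline SAMPLE string is a hypothetical cut of the question's XML, and matching and kept_numbers are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-grant sample, trimmed from the XML in the question.
SAMPLE = """<Grants>
    <FundingOppSynopsis>
        <FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber>
        <EligibilityCategory>25</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>OFDA-FY08-002-APS</FundingOppNumber>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
    <FundingOppSynopsis>
        <FundingOppNumber>BAA07-10</FundingOppNumber>
        <EligibilityCategory>00</EligibilityCategory>
        <EligibilityCategory>99</EligibilityCategory>
    </FundingOppSynopsis>
</Grants>"""

root = ET.fromstring(SAMPLE)

# Grants with at least one EligibilityCategory child whose text is '99'.
matching = root.findall("./FundingOppSynopsis[EligibilityCategory='99']")

# Remove every grant that the XPath query did not select.
for grant in root.findall('FundingOppSynopsis'):
    if grant not in matching:
        root.remove(grant)

kept_numbers = [g.findtext('FundingOppNumber') for g in root.findall('FundingOppSynopsis')]
print(kept_numbers)
```

With a tree loaded through etree.parse as in the question, the same loop should work on tree.getroot(), after which tree.write can produce the output file. Note that the predicate compares the element text literally, so a value with surrounding whitespace would presumably not match.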


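Ah! I tried to do it this way, but instead of writing out "FundingOppSynopsis" in full I used "*". I'm sure there were other syntax errors in there too, but I got frustrated, deleted it, and started from scratch. Thanks! – 2015-04-03 02:26:29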
