2017-07-10 17 views

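Creating an array of values from a specific element in XML using Python

I have an XML file with many elements. I want to create a list/array of all the values of one specific element name, in my case "pair:ApplicationNumber".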

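I have gone through many other questions but could not find an answer. I know I could do this by loading the file as text and using pandas, but I am sure there is a better way.

I have tried both ElementTree and xml.dom.minidom, without success.

My code currently looks like this: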

import os 
from xml.dom import minidom 
WindowsUser = os.getenv('username') 
XMLPath = os.path.join('C:\\Users', WindowsUser, 'Downloads', 'ApplicationsByCustomerNumber.xml') 
xmldoc = minidom.parse(XMLPath) 
itemlist = xmldoc.getElementsByTagName('pair:ApplicationNumber') 
for s in itemlist: 
    print(s.attributes['pair:ApplicationNumber'].value) 

An example XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<pair:PatentApplicationList xsi:schemaLocation="urn:us:gov:uspto:pair PatentApplicationList.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pair="urn:us:gov:uspto:pair"> 
    <pair:FileHeader> 
      <pair:FileCreationTimeStamp>2017-07-10T10:52:12.12</pair:FileCreationTimeStamp> 
    </pair:FileHeader> 
    <pair:ApplicationStatusData> 
     <pair:ApplicationNumber>62383607</pair:ApplicationNumber> 
     <pair:ApplicationStatusCode>20</pair:ApplicationStatusCode> 
     <pair:ApplicationStatusText>Application Dispatched from Preexam, Not Yet Docketed</pair:ApplicationStatusText> 
     <pair:ApplicationStatusDate>2016-09-16</pair:ApplicationStatusDate> 
     <pair:AttorneyDocketNumber>1354-T-02-US</pair:AttorneyDocketNumber> 
     <pair:FilingDate>2016-09-06</pair:FilingDate> 
     <pair:LastModifiedTimestamp>2017-05-30T21:40:37.37</pair:LastModifiedTimestamp> 
     <pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction> 
      <pair:LastTransactionDate>2017-05-30</pair:LastTransactionDate> 
      <pair:LastTransactionDescription>Email Notification</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction> 
     <pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator> 
    </pair:ApplicationStatusData> 
    <pair:ApplicationStatusData> 
     <pair:ApplicationNumber>62292372</pair:ApplicationNumber> 
     <pair:ApplicationStatusCode>160</pair:ApplicationStatusCode> 
     <pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText> 
     <pair:ApplicationStatusDate>2016-11-01</pair:ApplicationStatusDate> 
     <pair:AttorneyDocketNumber>681-S-23-US</pair:AttorneyDocketNumber> 
     <pair:FilingDate>2016-02-08</pair:FilingDate> 
     <pair:LastModifiedTimestamp>2017-06-20T21:59:26.26</pair:LastModifiedTimestamp> 
     <pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction> 
      <pair:LastTransactionDate>2017-06-20</pair:LastTransactionDate> 
      <pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction> 
     <pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator> 
    </pair:ApplicationStatusData> 
    <pair:ApplicationStatusData> 
     <pair:ApplicationNumber>62289245</pair:ApplicationNumber> 
     <pair:ApplicationStatusCode>160</pair:ApplicationStatusCode> 
     <pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText> 
     <pair:ApplicationStatusDate>2016-10-26</pair:ApplicationStatusDate> 
     <pair:AttorneyDocketNumber>1526-P-01-US</pair:AttorneyDocketNumber> 
     <pair:FilingDate>2016-01-31</pair:FilingDate> 
     <pair:LastModifiedTimestamp>2017-06-15T21:24:13.13</pair:LastModifiedTimestamp> 
     <pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction> 
      <pair:LastTransactionDate>2017-06-15</pair:LastTransactionDate> 
      <pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction> 
     <pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator> 
    </pair:ApplicationStatusData> 
</pair:PatentApplicationList> 

Answer


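When ElementTree parses the XML in your example, it expands the "pair:" part of each tag into the namespace URI declared on the root element, so the tag is stored as '{urn:us:gov:uspto:pair}ApplicationNumber'. That is why it does not match 'pair:ApplicationNumber', even though it looks like it should.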
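To make the expansion visible, here is a minimal sketch that prints the tag names as ElementTree stores them. The SAMPLE string is a trimmed-down stand-in for the question's file (an assumption for illustration, not the real data), keeping only the xmlns:pair declaration and one record.

```python
from xml.etree import ElementTree

# Trimmed-down sample with the same namespace declaration as the question's file.
SAMPLE = (
    '<pair:PatentApplicationList xmlns:pair="urn:us:gov:uspto:pair">'
    '<pair:ApplicationStatusData>'
    '<pair:ApplicationNumber>62383607</pair:ApplicationNumber>'
    '</pair:ApplicationStatusData>'
    '</pair:PatentApplicationList>'
)

root = ElementTree.fromstring(SAMPLE)

# Collect every tag name exactly as ElementTree stores it.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# The prefixed name finds nothing; the expanded name finds the element.
prefixed_hits = list(root.iter('pair:ApplicationNumber'))
expanded_hits = list(root.iter('{urn:us:gov:uspto:pair}ApplicationNumber'))
print(len(prefixed_hits), len(expanded_hits))
```

Each printed tag starts with '{urn:us:gov:uspto:pair}' rather than 'pair:', which is the form a lookup has to use.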

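I used ElementTree as follows (in my examples I just use a local XML file rather than the full path from your code).

Example 1, extracting the application numbers: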

from xml.etree import ElementTree

tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()

# Tags carry the expanded namespace, so match on the local name as a substring.
for item in root:
    if 'ApplicationStatusData' in item.tag:
        for child in item:
            if 'ApplicationNumber' in child.tag:
                print(child.text)

Example 2, matching the fully qualified tag names:

from xml.etree import ElementTree

tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()

# Spell out the namespace URI in braces in front of each local tag name.
for item in root.iter('{urn:us:gov:uspto:pair}ApplicationStatusData'):
    for child in item.iter('{urn:us:gov:uspto:pair}ApplicationNumber'):
        print(child.text)
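Since the question asks for a list rather than printed output, the same lookup can be collected into one. The sketch below is one possible way to do it, not the only one: the function names and the inline SAMPLE string are hypothetical, and the NAMESPACES mapping simply reuses the URI from the xmlns:pair declaration. It also includes a minidom variant, which assumes every pair:ApplicationNumber element has a text child; there the number is read from the element's text node, not from an attribute.

```python
from xml.dom import minidom
from xml.etree import ElementTree

# Namespace URI taken from the xmlns:pair declaration in the question's XML.
NAMESPACES = {'pair': 'urn:us:gov:uspto:pair'}

# Hypothetical stand-in for the real file, reduced to two records.
SAMPLE = (
    '<pair:PatentApplicationList xmlns:pair="urn:us:gov:uspto:pair">'
    '<pair:ApplicationStatusData>'
    '<pair:ApplicationNumber>62383607</pair:ApplicationNumber>'
    '</pair:ApplicationStatusData>'
    '<pair:ApplicationStatusData>'
    '<pair:ApplicationNumber>62292372</pair:ApplicationNumber>'
    '</pair:ApplicationStatusData>'
    '</pair:PatentApplicationList>'
)


def numbers_with_elementtree(xml_text):
    """Return the text of every pair:ApplicationNumber element as a list."""
    root = ElementTree.fromstring(xml_text)
    # './/' searches all descendants; the mapping resolves the 'pair' prefix.
    nodes = root.findall('.//pair:ApplicationNumber', NAMESPACES)
    return [node.text for node in nodes]


def numbers_with_minidom(xml_text):
    """Same result with minidom, which keeps the prefixed tag name."""
    document = minidom.parseString(xml_text)
    nodes = document.getElementsByTagName('pair:ApplicationNumber')
    # The value is the element's text child; this assumes the element is not empty.
    return [node.firstChild.data for node in nodes]


etree_numbers = numbers_with_elementtree(SAMPLE)
minidom_numbers = numbers_with_minidom(SAMPLE)
print(etree_numbers)
print(minidom_numbers)
```

For the real file, ElementTree.parse(path).getroot() and minidom.parse(path) can replace the fromstring and parseString calls.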

I hope this is useful.
