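LXML XPath does not seem to return the full path

OK, I'll be the first to admit that it does; it's just not the path I want, and I don't know how to get it.

I'm using Python 3.3 in Eclipse with the PyDev plugin, on Windows 7 at work and Ubuntu 13.04 at home. I'm new to Python and have limited programming experience.

I'm trying to write a script that takes an XML Lloyd's market insurance message, finds all the tags and dumps them into a .csv file where we can easily update them, and then re-imports them to create an updated XML.

I have managed to do all of that, except that when I collect the tags I only get each tag's own name, not the tags above it.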

<TechAccount Sender="broker" Receiver="insurer"> 
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId> 
<BrokerReference>HOY123/456</BrokerReference> 
<ServiceProviderReference>2012080921401A1</ServiceProviderReference> 
<CreationDate>2012-08-10</CreationDate> 
<AccountTransactionType>premium</AccountTransactionType> 
<GroupReference>2012080921401A1</GroupReference> 
<ItemsInGroupTotal> 
<Count>1</Count> 
</ItemsInGroupTotal> 
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference> 
<ServiceProviderGroupItemsTotal> 
<Count>13</Count> 
</ServiceProviderGroupItemsTotal>

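This is a fragment of the XML. What I want is to find all the tags and their paths. For example, I would like Count to be shown as ItemsInGroupTotal/Count, but I can only get it as Count.

Here is my code: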

xml = etree.parse(fullpath) 
print(xml.xpath('.//*')) 
all_xpath = xml.xpath('.//*') 
every_tag = [] 
for i in all_xpath: 
    single_tag = '%s,%s' % (i.tag, i.text) 
    every_tag.append(single_tag) 
print(every_tag)

This gives:

'{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupReference,8-2012-08-10', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupItemsTotal,\n', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}Count,13',

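As you can see, Count is shown as {namespace}Count, 13 rather than {namespace}ItemsInGroupTotal/Count, 13.

Can anyone point me towards what I need?

Thanks (I hope my first post is OK)

Adam

Edit:

This is my code now:

with open(fullpath, 'rb') as xmlFilepath: 
    xmlfile = xmlFilepath.read()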

fulltext = '%s' % xmlfile 
text = fulltext[2:] 
print(text) 


xml = etree.fromstring(fulltext) 
tree = etree.ElementTree(xml) 

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()] 
print(every_tag)

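But this returns an error: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

I removed the first two characters because they were b', and then it complained that the input did not start with a tag.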
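
The error appears to come from '%s' % xmlfile, which turns the bytes object into its printed form (b'...') instead of decoding it, and lxml rejects str input that carries an encoding declaration anyway. A minimal sketch, assuming xmlfile holds what read() returned in 'rb' mode: pass the bytes straight to etree.fromstring(). The sample document below is a shortened stand-in, not the real message.

```python
from lxml import etree

# Stand-in for xmlFilepath.read() on a file opened with 'rb' (sample data only).
xmlfile = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
           b'<TechAccount><ItemsInGroupTotal><Count>1</Count></ItemsInGroupTotal></TechAccount>')

# Pass the bytes object directly: no '%s' formatting and no slicing.
xml = etree.fromstring(xmlfile)
tree = etree.ElementTree(xml)

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
print(every_tag)

# Decoding to str first reproduces the ValueError from the question.
try:
    etree.fromstring(xmlfile.decode('utf-8'))
    str_input_rejected = False
except ValueError:
    str_input_rejected = True
```

With bytes input, lxml reads the encoding from the XML declaration itself, so no manual conversion is needed.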

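Update:

I have been playing around with this, and if I remove the xis:xxx tags and the namespace declarations at the top, it works as expected. I need to keep the xis tags and be able to identify them as xis tags, so I can't just delete them.

Any help on how I can achieve this?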

2013-07-09 user2565150

ElementTree objects have a method getpath(element), which returns a structural, absolute XPath expression to find that element.

Calling getpath() on each element in an iter() loop should work for you:

from pprint import pprint 
from lxml import etree 


text = """ 
<TechAccount Sender="broker" Receiver="insurer"> 
    <UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId> 
    <BrokerReference>HOY123/456</BrokerReference> 
    <ServiceProviderReference>2012080921401A1</ServiceProviderReference> 
    <CreationDate>2012-08-10</CreationDate> 
    <AccountTransactionType>premium</AccountTransactionType> 
    <GroupReference>2012080921401A1</GroupReference> 
    <ItemsInGroupTotal> 
     <Count>1</Count> 
    </ItemsInGroupTotal> 
    <ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference> 
    <ServiceProviderGroupItemsTotal> 
     <Count>13</Count> 
    </ServiceProviderGroupItemsTotal> 
</TechAccount> 
""" 

xml = etree.fromstring(text) 
tree = etree.ElementTree(xml) 

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()] 
pprint(every_tag)

This prints:

['/TechAccount, \n', 
'/TechAccount/UUId, 2EF40080-F618-4FF7-833C-A34EA6A57B73', 
'/TechAccount/BrokerReference, HOY123/456', 
'/TechAccount/ServiceProviderReference, 2012080921401A1', 
'/TechAccount/CreationDate, 2012-08-10', 
'/TechAccount/AccountTransactionType, premium', 
'/TechAccount/GroupReference, 2012080921401A1', 
'/TechAccount/ItemsInGroupTotal, \n', 
'/TechAccount/ItemsInGroupTotal/Count, 1', 
'/TechAccount/ServiceProviderGroupReference, 8-2012-08-10', 
'/TechAccount/ServiceProviderGroupItemsTotal, \n', 
'/TechAccount/ServiceProviderGroupItemsTotal/Count, 13']

UPD: if your XML data is in a file named test.xml, the code would look like this:

from pprint import pprint 
from lxml import etree 

xml = etree.parse('test.xml').getroot() 
tree = etree.ElementTree(xml) 

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()] 
pprint(every_tag)
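
Since the goal in the question is an editable .csv, the path/text pairs could also be written with the standard csv module instead of being joined by hand with '%s, %s', so that values containing commas are quoted properly. This is a sketch under assumptions: the sample XML is abbreviated, the column names are made up for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io

from lxml import etree

text = """<TechAccount>
    <BrokerReference>HOY123/456</BrokerReference>
    <ItemsInGroupTotal>
        <Count>1</Count>
    </ItemsInGroupTotal>
</TechAccount>"""

root = etree.fromstring(text)
tree = etree.ElementTree(root)

# One (path, text) row per element; whitespace-only text becomes an empty cell.
rows = [(tree.getpath(e), (e.text or '').strip()) for e in root.iter()]

buffer = io.StringIO()  # stand-in for open('tags.csv', 'w', newline='')
writer = csv.writer(buffer)
writer.writerow(['path', 'text'])  # hypothetical header names
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the full path in the first column gives each row a key that can be used to locate the element again when the edited file is re-imported.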

Hope that helps.

2013-07-09 22:04:48 alecxe

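Thanks very much for this, but I'm struggling to get it to work for me. I'm reading the XML from a file rather than putting it directly into text, and my attempt to convert it to a string seems to fail. Any tips on how to do this? – user2565150

Sure, replace 'etree.fromstring(text)' with 'etree.parse(file_name)'. – alecxe

Sorry, I should have said that I tried that and got: TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got lxml.etree._ElementTree) – user2565150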
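
The TypeError most likely comes from passing the result of etree.parse() on to etree.ElementTree(): parse() already returns an _ElementTree, whereas ElementTree() expects an element. A minimal sketch, with an in-memory file object standing in for the real file: call getpath() on the parsed tree directly.

```python
import io

from lxml import etree

# In-memory stand-in for a file on disk; etree.parse() also accepts a filename.
source = io.BytesIO(
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<TechAccount><ItemsInGroupTotal><Count>13</Count></ItemsInGroupTotal></TechAccount>'
)

tree = etree.parse(source)  # already an _ElementTree, so no etree.ElementTree(...) wrapper
root = tree.getroot()       # the root _Element, if one is needed

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in tree.iter()]
print(every_tag)
```

Because parse() reads bytes from the file, it also sidesteps the encoding-declaration error described in the question.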

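getpath() does indeed return an XPath that is not fit for human consumption. From that XPath you can build a more useful one, for example with this quick-and-dirty approach: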

def human_xpath(element):
    # Structural path as returned by getpath().
    full_xpath = element.getroottree().getpath(element)
    xpath = ''
    human_xpath = ''
    for node in full_xpath.split('/')[1:]:
        # Re-resolve each step of the structural path to its element.
        xpath += '/' + node
        element = element.xpath(xpath)[0]
        # Assumes a namespaced tag of the form {uri}local.
        namespace, tag = element.tag[1:].split('}', 1)
        if element.getparent() is not None:
            nsmap = {'ns': namespace}
            same_name = element.getparent().xpath('./ns:' + tag,
                                                  namespaces=nsmap)
            # Add a 1-based index only when siblings share the tag.
            if len(same_name) > 1:
                tag += '[{}]'.format(same_name.index(element) + 1)
        human_xpath += '/' + tag
    return human_xpath
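
A variation on the same idea, sketched here under assumptions, walks up through getparent() instead of re-evaluating the XPath at every step, and keeps any namespace prefix so that xis elements stay recognisable. The helper name prefixed_path and the urn:example:xis namespace URI are hypothetical; the real message will declare its own URI for the xis prefix.

```python
from lxml import etree


def prefixed_path(element):
    """Build a readable path from the root down to element (hypothetical helper)."""
    steps = []
    node = element
    while node is not None:
        name = etree.QName(node).localname
        if node.prefix is not None:
            name = '%s:%s' % (node.prefix, name)  # keep prefixes such as xis:
        parent = node.getparent()
        if parent is not None:
            # Add a 1-based index only when siblings share the same tag.
            same_tag = list(parent.iterchildren(node.tag))
            if len(same_tag) > 1:
                name += '[%d]' % (same_tag.index(node) + 1)
        steps.append(name)
        node = parent
    return '/' + '/'.join(reversed(steps))


# Sample data: the xis namespace URI below is a placeholder.
raw = b'''<?xml version="1.0" encoding="UTF-8"?>
<TechAccount xmlns="http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1"
             xmlns:xis="urn:example:xis">
  <ItemsInGroupTotal><Count>1</Count></ItemsInGroupTotal>
  <xis:Note>first</xis:Note>
  <xis:Note>second</xis:Note>
</TechAccount>'''

root = etree.fromstring(raw)
# tag=etree.Element skips comments and processing instructions.
rows = [(prefixed_path(e), (e.text or '').strip())
        for e in root.iter(tag=etree.Element)]
for row in rows:
    print(row)
```

The resulting strings are labels for a report or .csv rather than expressions to evaluate: elements in the default namespace are printed without a prefix, so the paths would need a prefix mapping before they could be passed back to xpath().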

2013-07-30 15:42:51
