Matching siblings by tag attribute in XML using Python and libxml2

I'm new to programming, so I may be missing some basic knowledge.

I have an XML file:

<mother>
<daughter nr='1' state='nice' name='Ada'/>
<daughter nr='2' state='naughty' name='Beta'/>
<daughter nr='3' state='nice' name='Cecilia'/>
<daughter nr='4' state='neither' name='Dora'/>
<daughter nr='5' state='naughty' name='Elis'/>
</mother>

What I need is to match the nice and naughty daughters by their number (each nice one with her closest naughty one) and print the pairs:

Ada Beta 
Cecilia Elis

My code:

import libxml2

doc = libxml2.parseFile("file.xml")
tree = doc.xpathNewContext()

# all daughters whose state attribute is 'nice'
nice = tree.xpathEval("//daughter[@state='nice']")

for l in nice:
    print(l.prop("name"))

# collect the nr attribute of every nice daughter
nice_nr = []
for n in nice:
    nice_nr.append(n.prop("nr"))

# and the same for the naughty daughters

tree.xpathFreeContext()
doc.freeDoc()

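So I am able to get their attribute values, but I can't figure out how to make the pairs.
What I did find is the XPath 'following-sibling' axis, but from all the examples I could find I'm not sure whether it can be used here. The syntax is rather different, and it takes all the following siblings. Any help is appreciated.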

2011-03-13 zufanka

Good question, +1. See my answer for a complete, short and easy XPath solution. :) – 2011-03-13 02:34:38

Use:

/*/daughter[@state = 'nice'][1] 
| 
/*/daughter[@state = 'nice'][1] 
     /following-sibling::daughter[@state='naughty'] [1]

This selects the first pair: the first nice daughter and her nearest naughty daughter.

To select the second such pair, use:

/*/daughter[@state = 'nice'][2] 
| 
/*/daughter[@state = 'nice'][2] 
     /following-sibling::daughter[@state='naughty'] [1]

...and so on.

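Note that these expressions do not guarantee that any node will be selected at all: there may be no daughter elements, or a nice daughter element may have no following sibling daughter element that is naughty.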
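
For readers who want every pair in one pass instead of one expression per pair, the same idea can be sketched in Python. This is only an illustration, not part of the original answer: it assumes the sample document with self-closed daughter elements and uses the standard library's xml.etree.ElementTree rather than libxml2. ElementTree's limited XPath subset has no following-sibling axis, so the axis is emulated by scanning the later children. The helper name pair_with_next_naughty is made up for this example.

```python
import xml.etree.ElementTree as ET

XML = """<mother>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='5' state='naughty' name='Elis'/>
</mother>"""


def pair_with_next_naughty(root):
    """Pair each nice daughter with her nearest following naughty sibling."""
    daughters = root.findall("daughter")  # direct children, in document order
    pairs = []
    for i, girl in enumerate(daughters):
        if girl.get("state") != "nice":
            continue
        # Emulates following-sibling::daughter[@state='naughty'][1]
        match = next(
            (d for d in daughters[i + 1:] if d.get("state") == "naughty"),
            None,
        )
        if match is not None:  # a nice daughter may have no naughty sibling after her
            pairs.append((girl.get("name"), match.get("name")))
    return pairs


pairs = pair_with_next_naughty(ET.fromstring(XML))
for nice_name, naughty_name in pairs:
    print(nice_name, naughty_name)
```

As with the XPath expressions, two nice daughters in a row would both be paired with the same naughty sibling, and a nice daughter with no naughty sibling after her is silently skipped.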

If the daughter elements are guaranteed to occur in strict ('nice', 'naughty') order in the document, then a very simple XPath expression gets all the pairs:

/*/daughter[@state = 'nice' or @state = 'naughty']

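This selects all daughter elements that are children of the top element and whose state attribute has one of the alternating values: nice, naughty, nice, naughty, ...

If the XPath API you use returns these nodes as an array, then for each even k (counting from zero) a pair of daughters is in the k-th and (k + 1)-th members of that array.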
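
The even-k pairing can be sketched in Python as follows. This is only an illustration and it relies on the strict alternation assumed above. It uses the standard library's xml.etree.ElementTree, whose XPath subset has no `or`, so the state filter is done in Python. The variable name alternating is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<mother>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='5' state='naughty' name='Elis'/>
</mother>"""

root = ET.fromstring(XML)

# Stand-in for /*/daughter[@state = 'nice' or @state = 'naughty']
alternating = [
    d for d in root.findall("daughter")
    if d.get("state") in ("nice", "naughty")
]

# For each even k (zero-based), members k and k + 1 form one pair.
pairs = [
    (alternating[k].get("name"), alternating[k + 1].get("name"))
    for k in range(0, len(alternating) - 1, 2)
]
print(pairs)
```

If the alternation is ever broken (for example two nice daughters in a row), this pairing goes out of step without any error, which is why the guarantee matters.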

2011-03-13 02:34:05

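Thank you for the answer. Actually, every nice daughter does have a naughty one in the large amount of data I need to go through :) You see, this is what confuses me about the 'following-sibling' axis. After copying and pasting your code I get a syntax error on the first line (/*/) – zufanka 2011-03-13 10:56:26

^ (forgot the mention in the comment) – zufanka 2011-03-13 11:44:22

@zufanka: Thanks for the observation about 'nice'; it is fixed now. – 2011-03-13 15:26:53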

Each XPath expression returns an ordered list of nodes. Just zip the lists together to find the corresponding pairs:

# evaluate an XPath query for the daughters with the given state
xpath = lambda state: tree.xpathEval("//daughter[@state='%s']" % state)
for nodes in zip(xpath('nice'), xpath('naughty')):
    print(' '.join(n.prop('name') for n in nodes))

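Above, xpath is a function that evaluates an XPath expression and returns the daughters matching the given state. The two lists are then passed to zip, which returns a tuple of the i-th element of each list.

If the child nodes are not in order in the XML file, you can sort the nodes by their nr attribute before passing them to zip.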
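
A minimal sketch of that sorting step, assuming the nr values are integers. It uses the standard library's xml.etree.ElementTree instead of libxml2 so that it runs without extra packages. The shuffled document and the helper name by_state are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The daughters from the question, deliberately shuffled.
XML = """<mother>
  <daughter nr='5' state='naughty' name='Elis'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='1' state='nice' name='Ada'/>
</mother>"""

root = ET.fromstring(XML)


def by_state(state):
    """Return the daughters with the given state, sorted numerically by nr."""
    nodes = root.findall("daughter[@state='%s']" % state)
    # Attribute values are strings; convert so that 10 sorts after 2.
    return sorted(nodes, key=lambda n: int(n.get("nr")))


pairs = [
    (a.get("name"), b.get("name"))
    for a, b in zip(by_state("nice"), by_state("naughty"))
]
for pair in pairs:
    print(" ".join(pair))
```

Note that zip pairs the i-th nice daughter with the i-th naughty one and stops at the shorter list, so this matches the "closest naughty sibling" requirement only when every nice daughter has exactly one naughty partner.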

2011-03-13 02:34:14 samplebias

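Thank you for your answer. This actually works for the example I gave, but not in my code. I suspect the problem is that while one of the child elements is matched with "state='nice'", the other one's state is matched with "starts-with(@state,'nau')", because I need to match more values. I get a libxml2.xpathError: xmlXPathEval() failed. I tried rewriting the code as ("//daughter[%s]" % state) and (xpath('@state="nice"'), xpath('starts-with(@state,"nau")')), but it doesn't work – zufanka 2011-03-13 10:53:45

^ (forgot the mention in the comment) – zufanka 2011-03-13 11:43:39

I have a solution without XPath. Sorting by the girls' numbers is also taken into account. The document is traversed only once.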

from lxml.etree import fromstring

data = """the-xml-above"""

def fetch_sorted_daughters(data):
    # parse the XML string into an element tree
    doc = fromstring(data)
    nice = []
    naughty = []

    # collect (number, name) tuples in a single pass over the children
    for subelement in doc:
        if subelement.tag == 'daughter':
            nr = int(subelement.get('nr'))  # numeric, so that 10 sorts after 2
            name = subelement.get('name')
            if subelement.get('state') == 'nice':
                nice.append((nr, name))
            elif subelement.get('state') == 'naughty':
                naughty.append((nr, name))
    del doc  # release document

    # sort the tuples by number
    nice.sort(key=lambda x: x[0])
    naughty.sort(key=lambda x: x[0])

    # keep only the names, in sorted order
    nice = tuple(name for nr, name in nice)
    naughty = tuple(name for nr, name in naughty)

    return nice, naughty

nice, naughty = fetch_sorted_daughters(data)
pairs = list(zip(nice, naughty))

print(pairs)

2011-03-13 10:24:31 vonPetrushev

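Thank you very much for your answer. I would be happiest if I didn't have to use XPath :) but I need a wildcard in one case in my code (say that for the 'naughty' daughters the value of the state attribute only has to start with 'nau' to be matched with the other daughter), and I haven't found any solution for that in lxml.etree. 'substring' or 'startswith' don't work there. – zufanka 2011-03-13 11:00:23

^ (forgot the mention in the comment) – zufanka 2011-03-13 11:44:03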
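
The prefix requirement described in the comment above may not need XPath at all: attribute values arrive as ordinary Python strings, so str.startswith can do the test. A hedged sketch, using the standard library's xml.etree.ElementTree (lxml.etree elements offer the same .get() method). The 'nau' prefix comes from the comment; the state value 'nauseating' and the helper name select are invented for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<mother>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='5' state='nauseating' name='Elis'/>
</mother>"""

root = ET.fromstring(XML)


def select(predicate):
    """Return the names of daughters whose state satisfies predicate, ordered by nr."""
    hits = [d for d in root.findall("daughter") if predicate(d.get("state", ""))]
    hits.sort(key=lambda d: int(d.get("nr")))
    return [d.get("name") for d in hits]


nice = select(lambda state: state == "nice")
naughty = select(lambda state: state.startswith("nau"))  # matches 'naughty' and 'nauseating'
pairs = list(zip(nice, naughty))
print(pairs)
```

Taking a predicate instead of a fixed state string keeps the exact match and the prefix match in one code path, so further matching rules can be added without touching the traversal.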
