Python XPath SyntaxError: invalid predicate

I am trying to parse an XML document like this:

<document> 
    <pages> 

    <page> 
     <paragraph>XBV</paragraph> 

     <paragraph>GHF</paragraph> 
    </page> 

    <page> 
     <paragraph>ash</paragraph> 

     <paragraph>lplp</paragraph> 
    </page> 

    </pages> 
</document>

Here is my code:

import xml.etree.ElementTree as ET 

tree = ET.parse("../../xml/test.xml") 

root = tree.getroot() 

path="./pages/page/paragraph[text()='GHF']" 

print root.findall(path)

But I get an error:

print root.findall(path) 
    File "X:\Anaconda2\lib\xml\etree\ElementTree.py", line 390, in findall 
    return ElementPath.findall(self, path, namespaces) 
    File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 293, in findall 
    return list(iterfind(elem, path, namespaces)) 
    File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 263, in iterfind 
    selector.append(ops[token[0]](next, token)) 
    File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 224, in prepare_predicate 
    raise SyntaxError("invalid predicate") 
SyntaxError: invalid predicate

What is wrong with my XPath?

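Follow-up

Thanks falsetru, your solution worked. I have a follow-up. Now I want to get all the paragraph elements that come before the paragraph with the text GHF. So in this case I only need the XBV element. I want to ignore ash and lplp. I guess one way to do this would be: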

result = [] 
for para in root.findall('./pages/page/'): 
    t = para.text.encode("utf-8", "ignore") 
    if t == "GHF": 
        break 
    else: 
        result.append(para)

But is there a better way to do this?
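One possible way to tighten the loop above is itertools.takewhile, which stops at the first element that fails a condition. This is only a sketch: the XML is inlined with ET.fromstring so the snippet runs on its own, and the names XML, paragraphs and before are made up for the illustration.

```python
import itertools
import xml.etree.ElementTree as ET

# Inlined copy of the sample document from the question (assumption:
# the real file has the same structure).
XML = """<document>
  <pages>
    <page>
      <paragraph>XBV</paragraph>
      <paragraph>GHF</paragraph>
    </page>
    <page>
      <paragraph>ash</paragraph>
      <paragraph>lplp</paragraph>
    </page>
  </pages>
</document>"""

root = ET.fromstring(XML)

# Walk the paragraphs in document order and keep them until the first
# one whose text is "GHF"; everything from that point on is dropped.
paragraphs = root.iterfind("./pages/page/paragraph")
before = list(itertools.takewhile(lambda p: p.text != "GHF", paragraphs))

print([p.text for p in before])
```

Like the original loop, this collects everything in document order up to the first GHF paragraph, across pages. With lxml, an XPath expression along the lines of //paragraph[text()='GHF']/preceding-sibling::paragraph should select the earlier siblings within the same page only, which is a slightly different result.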

Source

2015-11-20 AbtPst

ElementTree's XPath support is limited. Use another library such as lxml:

import lxml.etree 
root = lxml.etree.parse('test.xml') 

path="./pages/page/paragraph[text()='GHF']" 
print root.xpath(path)
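If installing lxml is not an option, a rough alternative is to select the paragraphs with a path ElementTree does accept and compare the text in Python. The sketch below assumes the sample document from the question, inlined via ET.fromstring; the names XML and matches are illustrative.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the sample document from the question.
XML = """<document>
  <pages>
    <page>
      <paragraph>XBV</paragraph>
      <paragraph>GHF</paragraph>
    </page>
    <page>
      <paragraph>ash</paragraph>
      <paragraph>lplp</paragraph>
    </page>
  </pages>
</document>"""

root = ET.fromstring(XML)

# The path has no text() predicate, so ElementTree can parse it;
# the text comparison happens in the list comprehension instead.
matches = [p for p in root.findall("./pages/page/paragraph") if p.text == "GHF"]

print([p.text for p in matches])
```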

Source

2015-11-20 15:59:13 falsetru

Thanks, man! Can I do something like text.contains("something") and text.notContains("something")? – AbtPst

@AbtPst, you can: path = "./pages/page/paragraph[contains(text(), 'something')]" or path = "./pages/page/paragraph[not(contains(text(), 'something'))]" – falsetru

No, you cannot use find_all, since def prepare_predicate(next, token) fails; see http://stackoverflow.com/questions/2637760/how-do-i-match-contents-of-an-element-in-xpath-lxml – SIslam
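For plain ElementTree, where contains() is not available in predicates, a substring test in Python can stand in for it. This is a minimal sketch, assuming the sample document from the question; needle, containing and not_containing are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the sample document from the question.
XML = """<document>
  <pages>
    <page>
      <paragraph>XBV</paragraph>
      <paragraph>GHF</paragraph>
    </page>
    <page>
      <paragraph>ash</paragraph>
      <paragraph>lplp</paragraph>
    </page>
  </pages>
</document>"""

root = ET.fromstring(XML)
paragraphs = root.findall("./pages/page/paragraph")

needle = "lp"  # substring to look for (illustrative value)

# p.text is None for an element with no text, so fall back to "".
containing = [p for p in paragraphs if needle in (p.text or "")]
not_containing = [p for p in paragraphs if needle not in (p.text or "")]

print([p.text for p in containing])
print([p.text for p in not_containing])
```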

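As @falsetru mentioned, ElementTree does not support the text() predicate, but it does support matching on the text of a child element. So in this example you can search for a page that has a paragraph with specific text, using the path ./pages/page[paragraph='GHF']. The catch is that a page contains multiple paragraph tags, so you still need to iterate to find the specific paragraph. In my case, I needed to find the version of a dependency in a Maven pom.xml, where there is one and only one version child, so the following worked: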

In [1]: import xml.etree.ElementTree as ET 

In [2]: ns = {"pom": "http://maven.apache.org/POM/4.0.0"} 

In [3]: ET.parse("pom.xml").findall(".//pom:dependencies/pom:dependency[pom:artifactId='some-artifact-with-hardcoded-version']/pom:version", ns)[0].text 
Out[3]: '1.2.3'
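Applied to the document from the question, the child-text predicate might look like the sketch below. It assumes the sample XML, inlined via ET.fromstring; pages and hits are illustrative names. The predicate selects the page, and a second pass picks out the paragraph itself.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the sample document from the question.
XML = """<document>
  <pages>
    <page>
      <paragraph>XBV</paragraph>
      <paragraph>GHF</paragraph>
    </page>
    <page>
      <paragraph>ash</paragraph>
      <paragraph>lplp</paragraph>
    </page>
  </pages>
</document>"""

root = ET.fromstring(XML)

# [paragraph='GHF'] matches a page that has a paragraph child whose
# text is 'GHF'; the result is the page element, not the paragraph.
pages = root.findall("./pages/page[paragraph='GHF']")

# Iterate over the matched pages to get the specific paragraph.
hits = [p for page in pages for p in page.findall("paragraph") if p.text == "GHF"]

print(len(pages), [p.text for p in hits])
```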

Source

2017-12-21 11:45:59 haridsv
