Python XPath SyntaxError: invalid predicate
I am trying to parse an XML file like this:
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GHF</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
Here is my code:
import xml.etree.ElementTree as ET
tree = ET.parse("../../xml/test.xml")
root = tree.getroot()
path="./pages/page/paragraph[text()='GHF']"
print root.findall(path)
But I get an error:
print root.findall(path)
File "X:\Anaconda2\lib\xml\etree\ElementTree.py", line 390, in findall
return ElementPath.findall(self, path, namespaces)
File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 293, in findall
return list(iterfind(elem, path, namespaces))
File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 263, in iterfind
selector.append(ops[token[0]](next, token))
File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 224, in prepare_predicate
raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate
What is wrong with my XPath?
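ElementTree's findall only understands a limited subset of XPath, and text() is not part of that subset, which is why prepare_predicate in the traceback raises "invalid predicate". Below is a minimal standard-library sketch. It assumes Python 3.7 or later, where ElementPath gained the [.='text'] predicate, which compares an element's complete text content (including descendants) with the given string. The XML is inlined as a string instead of being read from ../../xml/test.xml:

```python
import xml.etree.ElementTree as ET

# Same document as in the question, inlined so the example is self-contained.
XML = """<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GHF</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>"""

root = ET.fromstring(XML)

# text() is not supported by ElementPath; [.='...'] (Python 3.7+) matches
# elements whose full text content equals the given string.
path = "./pages/page/paragraph[.='GHF']"
matches = root.findall(path)
print([p.text for p in matches])  # ['GHF']
```

On older Python versions, or when you need full XPath 1.0 (including text()), lxml's xpath method is the usual alternative.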
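Follow-up
Thanks, falsetru, your solution works. I have a follow-up question. Now I want to use the text GHF to get all the paragraph elements that come before that paragraph. So in this case I only need the XBV element, and I want to ignore ash and lplp. I think one way to do this would be: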
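result = []
# Walk the <paragraph> elements in document order
for para in root.findall('./pages/page/paragraph'):
    # Stop at the first paragraph whose text is "GHF" (text is None for empty elements)
    if (para.text or "") == "GHF":
        break
    result.append(para)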
But is there a better way to do this?
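One more compact alternative, sketched with the standard library only, is to walk every paragraph in document order with Element.iter and stop at the first one whose text is GHF using itertools.takewhile. This assumes, as the loop in the question does, that "before" means earlier in document order across all pages. As with that loop, if no paragraph has the text GHF, every paragraph is returned. With lxml, an XPath axis expression such as //paragraph[.='GHF']/preceding::paragraph should express the same thing directly, but ElementTree's findall does not support axes like preceding::.

```python
import itertools
import xml.etree.ElementTree as ET

# Same document as in the question, inlined so the example is self-contained.
XML = """<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GHF</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>"""

root = ET.fromstring(XML)

# root.iter("paragraph") yields every <paragraph> in document order,
# across all <page> elements; takewhile stops at the first "GHF".
result = list(itertools.takewhile(lambda p: p.text != "GHF",
                                  root.iter("paragraph")))
print([p.text for p in result])  # ['XBV']
```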
Thanks, man! Can I do something like text.contains("something") and text.notContains("something")? – AbtPst
@AbtPst, you can: path = "./pages/page/paragraph[contains(text(),'something')]" / path = "./pages/page/paragraph[not(contains(text(),'something'))]" – falsetru
No, you can't do that with findall (see http://stackoverflow.com/questions/2637760/how-do-i-match-contents-of-an-element-in-xpath-lxml), because def prepare_predicate(next, token) fails. – SIslam
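ElementTree's findall rejects contains() and not(). A minimal stdlib-only workaround is to select the paragraphs with findall and do the substring test in Python. The helper names paragraphs_containing and paragraphs_not_containing below are hypothetical, chosen for this sketch. The XPath expressions in falsetru's comment would instead need lxml (passed to an lxml element's xpath method).

```python
import xml.etree.ElementTree as ET

# Same document as in the question, inlined so the example is self-contained.
XML = """<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GHF</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>"""

root = ET.fromstring(XML)


def paragraphs_containing(root, needle):
    """Hypothetical helper: paragraphs whose text contains `needle`."""
    # `p.text or ""` guards against empty <paragraph/> elements (text is None).
    return [p for p in root.findall("./pages/page/paragraph")
            if needle in (p.text or "")]


def paragraphs_not_containing(root, needle):
    """Hypothetical helper: paragraphs whose text does not contain `needle`."""
    return [p for p in root.findall("./pages/page/paragraph")
            if needle not in (p.text or "")]


print([p.text for p in paragraphs_containing(root, "G")])      # ['GHF']
print([p.text for p in paragraphs_not_containing(root, "G")])  # ['XBV', 'ash', 'lplp']
```

The substring test is case-sensitive, like XPath's contains().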