2017-08-16 109 views
-1

xml.etree.ElementTree for Chinese

Given an XML file containing Chinese characters, I want to use xml.etree to parse it for some processing. The English version works. For example:

>el.xml printf '%s\n' $'<?xml version=\'1.0\' encoding=\'utf8\'?><Color>Grey</Color>' 
>cl.xml printf '%s\n' $'<?xml version=\'1.0\' encoding=\'utf8\'?><Color>灰色</Color>' 

tryParse() { 
    python -c 'import xml.etree.ElementTree as ET; import sys; ET.parse(sys.argv[1])' "$@" 
} 

tryParse el.xml && printf '%s\n\n' "English works" 
tryParse cl.xml && printf '%s\n\n' "Chinese works" 

...which emits as output:

English works 

Traceback (most recent call last): 
    File "<string>", line 1, in <module> 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse 
    tree.parse(source, parser) 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse 
    parser.feed(data) 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed 
    self._raiseerror(v) 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror 
    raise err 
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 44 
+0

You need `ET.fromstring()`, not `ET.parse()`; this code is looking for a **file** with '' in its name. –
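To illustrate the distinction this comment draws, here is a minimal Python 3 sketch: `ET.fromstring()` takes the XML content itself, while `ET.parse()` takes a filename or file object. The sample document and the temporary file are assumptions made for illustration, and the declared encoding is spelled `UTF-8` here rather than the `utf8` used in the question.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sample document (an assumption for illustration). The declared encoding is
# spelled "UTF-8" here, unlike the "utf8" in the question's files.
XML_BYTES = "<?xml version='1.0' encoding='UTF-8'?><Color>灰色</Color>".encode("utf-8")

# ET.fromstring() parses XML content that is already in memory.
root_from_string = ET.fromstring(XML_BYTES)

# ET.parse() expects a filename (or file object), so write the bytes out first.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as handle:
    handle.write(XML_BYTES)
    path = handle.name
try:
    root_from_file = ET.parse(path).getroot()
finally:
    os.remove(path)  # clean up the temporary file

print(root_from_string.tag, root_from_file.tag)
```

Passing the XML text itself to `ET.parse()` would make it try to open a file by that name, which is the mistake the comment points out.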

+0

...That said, with the code given here, *if* we fix the broken variable name, then this is https://stackoverflow.com/questions/21713527/xml-parsing-from-web-response or https://stackoverflow.com/questions/3064247/cant-parse-xml-effectively-using-python, which share the same underlying cause. –

+0

Give me a sec to change the code. – bryansis2010

Answer

1

Use lxml instead:

>>> import lxml.etree as ET 
>>> doc = ET.parse('cl.xml') 
>>> print(doc.getroot().text) 
灰色 
+0

Hi, when I inspect the content of the node, will it also be Chinese characters? – bryansis2010

+0

Absolutely. I've edited to show this. –
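As a rough way to check the node content programmatically, the Python 3 sketch below parses a small document and inspects the text character by character. It is only an illustration under stated assumptions: the sample document is made up, its declared encoding is spelled `UTF-8`, and the stdlib parser is used as a fallback if lxml is not installed.

```python
try:
    import lxml.etree as ET  # third-party parser recommended in the answer
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback with the same basic API

# Sample document (an assumption for illustration), declared as "UTF-8".
XML_BYTES = "<?xml version='1.0' encoding='UTF-8'?><Color>灰色</Color>".encode("utf-8")

root = ET.fromstring(XML_BYTES)
text = root.text

# The node text is a Unicode string, so each item is one character, not one byte.
code_points = ["U+%04X" % ord(ch) for ch in text]

# Check that every character lies in the basic CJK Unified Ideographs block.
is_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)

print(text, len(text), code_points, is_cjk)
```

Here `len(text)` counts characters rather than encoded bytes, which is a quick sign that the parser decoded the content instead of handing back raw UTF-8.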
