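ElementTree.iterparse with streamed and cached request throws ParseError
I have a Flask application that retrieves an XML document from a URL and processes it. I use requests_cache with a redis backend to avoid extra requests, and ElementTree.iterparse to iterate over the streamed content. Here is a sample of my code (the result is the same from both the development server and the interactive interpreter):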
>>> import requests, requests_cache
>>> import xml.etree.ElementTree as ET
>>> requests_cache.install_cache('test', backend='redis', expire_after=300)
>>> url = 'http://myanimelist.net/malappinfo.php?u=doomcat55&status=all&type=anime'
>>> response = requests.get(url, stream=True)
>>> for event, node in ET.iterparse(response.raw):
...     print(node.tag)
Running the code above once throws a ParseError:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1301, in __next__
    self._root = self._parser._close_and_return_root()
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1236, in _close_and_return_root
    root = self._parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
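However, running the exact same code again before the cache expires actually prints the expected result! How can the XML parsing fail only the first time, and how do I fix it?
Edit: In case it helps, I have noticed that running the same code without the cache results in a different ParseError: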
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1289, in __next__
    for event in self._parser.read_events():
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1272, in read_events
    raise event
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1230, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0
Interestingly, if you do `ET.iterparse(StringIO(response.text))` instead, it works every time, but I guess you have a reason for using `.raw` in this case. – alecxe
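A minimal sketch of the buffered approach from the comment above, run against a hard-coded payload so it needs no network access. `SAMPLE_XML` and `iter_tags` are hypothetical names introduced for illustration; `SAMPLE_XML` stands in for `response.content`, and using `io.BytesIO` on bytes is a small variation on the `StringIO(response.text)` call in the comment.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for response.content (the fully downloaded body).
SAMPLE_XML = (
    b"<myanimelist><anime><series_title>Example</series_title>"
    b"</anime></myanimelist>"
)


def iter_tags(xml_bytes):
    """Yield each element's tag as its 'end' event is reported."""
    # Wrapping the bytes in BytesIO gives iterparse a fresh, seekable
    # file object that nothing else has read from.
    for _event, node in ET.iterparse(io.BytesIO(xml_bytes)):
        yield node.tag


tags = list(iter_tags(SAMPLE_XML))
print(tags)
```

The trade-off is that the whole body is held in memory before parsing starts, so this gives up the streaming behaviour that `response.raw` was chosen for.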
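@alecxe Hmm, that seems to suggest to me that the problem is caused by ET trying to parse a document that has not fully loaded... I am sure it is possible to do this: http://stackoverflow.com/questions/18308529/python-requests-package-handling-xml-response – Noah
@alecxe, on the first run the caching has already consumed the data, while not caching means you are passing gzipped data, which ET cannot parse –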
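The two failure modes described in the last comment can be reproduced locally without requests or redis. This is only a sketch under stated assumptions: `XML` is a hypothetical payload, `tags` is a helper introduced here, an already-read `BytesIO` stands in for a `response.raw` that the cache has drained, and `gzip.compress` stands in for a gzip-encoded HTTP body.

```python
import gzip
import io
import xml.etree.ElementTree as ET

XML = b"<root><item/><item/></root>"  # hypothetical payload


def tags(stream):
    """Collect element tags from a readable binary stream."""
    return [node.tag for _event, node in ET.iterparse(stream)]


# Case 1: the stream was already read (as when a cache stores the body),
# so the parser receives no bytes at all.
consumed = io.BytesIO(XML)
consumed.read()
try:
    tags(consumed)
    consumed_error = None
except ET.ParseError as exc:
    consumed_error = exc

# Case 2: the stream still holds gzip-compressed bytes, which are not XML.
compressed = gzip.compress(XML)
try:
    tags(io.BytesIO(compressed))
    gzip_error = None
except ET.ParseError as exc:
    gzip_error = exc

# Decompressing while streaming gives the parser real XML again.
decoded_tags = tags(gzip.GzipFile(fileobj=io.BytesIO(compressed)))

print(consumed_error, gzip_error, decoded_tags, sep="\n")
```

With requests itself, a commonly used idiom is to set `response.raw.decode_content = True` before handing `response.raw` to `ET.iterparse`, so the underlying urllib3 response decompresses as it is read. That should address the gzip case only; it would not help when the cache has already drained the stream, where parsing from the buffered `response.content` is the safer route.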