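Can I supply a URL to lxml.etree.parse on Python 3?
The documentation says I can:
lxml can parse from a local file, an HTTP URL or an FTP URL. It also auto-detects and reads gzip-compressed XML files (.gz).
(from http://lxml.de/parsing.html, under "Parsers")
but a quick experiment seems to imply otherwise: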
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> from urllib.request import urlopen
>>> with urlopen('https://pypi.python.org/simple') as f:
...     tree = etree.parse(f, parser)
...
>>> tree2 = etree.parse('https://pypi.python.org/simple', parser)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 3299, in lxml.etree.parse (src\lxml\lxml.etree.c:72655)
  File "parser.pxi", line 1791, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:106263)
  File "parser.pxi", line 1817, in lxml.etree._parseDocumentFromURL (src\lxml\lxml.etree.c:106564)
  File "parser.pxi", line 1721, in lxml.etree._parseDocFromFile (src\lxml\lxml.etree.c:105561)
  File "parser.pxi", line 1122, in lxml.etree._BaseParser._parseDocFromFile (src\lxml\lxml.etree.c:100456)
  File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:94543)
  File "parser.pxi", line 690, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:96003)
  File "parser.pxi", line 618, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:95015)
OSError: Error reading file 'https://pypi.python.org/simple': failed to load external entity "https://pypi.python.org/simple"
>>>
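I can use the urlopen approach, but the documentation seems to imply that passing the URL directly is somehow better. Also, if the documentation is inaccurate, I'm a bit worried about relying on lxml, especially if I start needing to do more complex things.
What is the correct way to parse HTML with lxml from a known URL? And where should I look to find that documented?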
Update: I get the same error if I use an http URL rather than an https one.
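For reference, the urlopen workaround from the session above can be wrapped in a small helper. This is only a minimal sketch of that approach, not something taken from the lxml documentation: the name parse_html_from_url and its opener parameter are hypothetical, and fake_opener with its example.invalid URL exists purely so the snippet runs without network access.

```python
from io import BytesIO
from urllib.request import urlopen

from lxml import etree


def parse_html_from_url(url, opener=urlopen):
    """Fetch `url` via `opener` and parse the response body as HTML.

    `opener` is a hypothetical hook (not part of lxml); it defaults to
    urllib.request.urlopen and must return a file-like context manager.
    """
    parser = etree.HTMLParser()
    with opener(url) as response:
        # etree.parse accepts an open file-like object, as in the session above.
        return etree.parse(response, parser)


def fake_opener(url):
    """Offline stand-in for urlopen that returns canned HTML (illustration only)."""
    return BytesIO(b"<html><body><a href='lxml/'>lxml</a></body></html>")


tree = parse_html_from_url("https://example.invalid/simple", opener=fake_opener)
links = [a.get("href") for a in tree.getroot().iter("a")]
print(links)
```

Calling parse_html_from_url('https://pypi.python.org/simple') with the default opener would perform the same fetch-then-parse steps as the working part of the session, with the download handled by urllib rather than by lxml itself.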
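It works for HTTP URLs, but not for HTTPS. – isedev 2014-10-02 14:39:34
No, http fails as well, with the same error. Sorry, I should have said so (although not supporting HTTPS makes the ability to use URLs somewhat insecure :-() – 2014-10-02 15:08:57
Try it with "www.google.com", for example; that works for me. – isedev 2014-10-02 15:14:18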