I want to read URLs from a file and print the title of each page:
import lxml.html
file = open('ab.txt','r')
for line in file:
    t = lxml.html.parse(line)
    print t.find(".//title").text
Error:
Traceback (most recent call last):
  File "C:\Python27\site.py", line 4, in <module>
    t = lxml.html.parse(line)
  File "C:\Python27\lib\site-packages\lxml\html\__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64493)
IOError: Error reading file 'http://example.com/5129860
': failed to load HTTP resource
ab.txt contains:
example.com/123
example.com/234
example.com/456
....
What is the problem here?
What is your expected output? Do you want to download the content of each URL and print its title? – 2014-10-10 12:05:34
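
One likely cause, judging from the traceback: the closing quote in the IOError message lands on the line after the URL, which suggests that each line read from the file still carries its trailing newline, so lxml.html.parse is handed 'http://example.com/5129860\n' rather than the bare URL. Below is a minimal sketch of stripping each line before parsing. The helper names read_urls and print_titles are hypothetical, and prefixing "http://" to lines without a scheme is an assumption based on the ab.txt listing above. It is written to run on both Python 2.7 and Python 3.

```python
def read_urls(lines):
    """Yield one cleaned URL for every non-blank line."""
    for line in lines:
        url = line.strip()  # drop the trailing "\n" (and any stray spaces)
        if not url:
            continue  # skip blank lines
        if "://" not in url:
            # Assumption: entries such as "example.com/123" are meant as HTTP URLs.
            url = "http://" + url
        yield url


def print_titles(path):
    """Print the <title> text of every URL listed in the file at `path`."""
    # Imported here so read_urls stays usable even where lxml is not installed.
    import lxml.html

    with open(path) as f:
        for url in read_urls(f):
            try:
                doc = lxml.html.parse(url)
            except IOError as exc:
                # A page that cannot be loaded should not abort the whole run.
                print("%s: could not be loaded (%s)" % (url, exc))
                continue
            title = doc.find(".//title")
            print(title.text if title is not None else "(no title)")
```

Calling print_titles('ab.txt') would then replace the loop in the question. If some pages still fail to load after stripping, the remaining failures are probably network-side (for example an unreachable host) rather than a problem with the file contents.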