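I have a script that (more or less) checks links on websites. It works fine, but as soon as the source URL where a link should be does not return a 200 response, the script exits. I just want it to skip to the next URL, or print some message like "Error", or even better, give me back the HTTP status code. I need a quick solution; if anyone could help me, that would be super awesome :)

In short: the script stops whenever the source URL of a site that contains a link to a specific page does not return status 200.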
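URLs.csv = the list of pages to check
domain.com = the domain to look for on each page: whether the page links to it and, if so, roughly where the link sits.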
import csv
from lxml import html

# The domain whose links should be located; both XPath queries use this value
DOMAIN = 'domain.com'
LINK_XPATH = '//a[contains(@href, "%s")]' % DOMAIN

with open('URLs.csv', 'r') as csvfile:
    # The first column of each CSV row is a page URL to inspect
    urls = [row[0] for row in csv.reader(csvfile)]

for url in urls:
    print url
    # html.parse raises IOError when the page cannot be loaded, which ends the script
    doc = html.parse(url)
    anchors = doc.xpath(LINK_XPATH)
    if anchors:
        # Classify each matching link by the container it sits in
        for anchor_node in anchors:
            if anchor_node.xpath('./ancestor::div[contains(@class, "sidebar")]'):
                print 'Sidebar'
            elif anchor_node.xpath('./parent::div[contains(@class, "widget")]'):
                print 'Sidebar'
            elif anchor_node.xpath('./ancestor::div[contains(@id, "sidebar")]'):
                print 'Sidebar'
            elif anchor_node.xpath('./ancestor::div[contains(@class, "comment")]'):
                print 'Kommentar'
            elif anchor_node.xpath('./ancestor::div[contains(@id, "comment")]'):
                print 'Kommentar'
            elif anchor_node.xpath('./ancestor::div[contains(@class, "foot")]'):
                print "Footer"
            elif anchor_node.xpath('./ancestor::div[contains(@id, "foot")]'):
                print "Footer"
            elif anchor_node.xpath('./ancestor::div[contains(@class, "post")]'):
                print "Contextual"
            else:
                print 'Unidentified Link'
    else:
        # No anchor pointing at DOMAIN was found on this page
        print 'Link is Dead'
Python shell:
Python 2.7.4 (default, Apr 6 2013, 19:55:15) [MSC v.1500 64 bit (AMD64)]
Type "help", "copyright", "credits" or "license" for more information.
[evaluate Linkidentifizierung.py]
http://urlnotworking.com/broken.html
Traceback (most recent call last):
File "C:\Program Files (x86)\Wing IDE 101 4.1\src\debug\tserver\_sandbox.py", line 11, in <module>
File "C:\Python27\Lib\site-packages\lxml\html\__init__.py", line 735, in parse
return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 3197, in lxml.etree.parse (src\lxml\lxml.etree.c:64726)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 1571, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:92363)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 1600, in lxml.etree._parseDocumentFromURL (src\lxml\lxml.etree.c:92647)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 1500, in lxml.etree._parseDocFromFile (src\lxml\lxml.etree.c:91710)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 1047, in lxml.etree._BaseParser._parseDocFromFile (src\lxml\lxml.etree.c:88610)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 577, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:84019)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 676, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:85122)
File "C:\Python27\Lib\site-packages\lxml\etree.pyd", line 614, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:84417)
IOError: Error reading file 'http://urlnotworking.com/broken.html': failed to load HTTP resource
This sounds like a job for [exceptions](http://docs.python.org/3/tutorial/errors.html#handling-exceptions). – ejno
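Following that suggestion, a minimal sketch of a skip-and-report loop. It assumes the failure arrives as the IOError shown in the traceback above (on Python 3, IOError is an alias of OSError, so the same except clause still applies). The function name check_urls and its parse parameter are hypothetical: in the real script you would pass lxml's html.parse, but any callable can stand in, which keeps the loop testable without network access. The code is written to run on both Python 2.7 and 3.

```python
def check_urls(urls, parse):
    """Parse each URL, reporting failures instead of stopping.

    parse: hypothetical injection point, e.g. lxml.html.parse in the real script.
    Returns a dict mapping each URL to its parsed document, or None on failure.
    """
    results = {}
    for url in urls:
        try:
            doc = parse(url)
        except IOError as err:
            # Assumed failure mode: lxml's "failed to load HTTP resource" IOError.
            # Report it and move on to the next URL instead of exiting.
            print('Error: {0} ({1})'.format(url, err))
            results[url] = None
            continue
        results[url] = doc
    return results
```

In the question's script this would be something like `docs = check_urls(urls, html.parse)`, with the XPath classification then run only on entries that are not None. The lxml message in the traceback carries no status code, so this variant can only say that loading failed.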
I had to fix your indentation; could you please confirm that everything is correct? –
Thanks, I'm still pretty new here :) But my problem remains: it works as long as every URL returns a 200 response; if one does not, I get the Python shell output shown above. – eLudium
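For the "even better" variant that reports the HTTP status code, one option is to ask the server for the status with the standard library before handing the URL to lxml. This is a sketch and not lxml's own mechanism: get_status is a hypothetical helper, and it relies on urlopen raising HTTPError (which carries the code in .code) for 4xx/5xx responses, and URLError or another IOError when no response arrives at all. The import fallback lets the same helper run on Python 2.7 (urllib2) and Python 3 (urllib.request).

```python
try:
    from urllib.request import urlopen            # Python 3
    from urllib.error import HTTPError, URLError
except ImportError:
    from urllib2 import urlopen, HTTPError, URLError  # Python 2.7


def get_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        response = urlopen(url, timeout=timeout)
    except HTTPError as err:
        # 4xx/5xx: urlopen raises, but the exception carries the status code.
        # HTTPError subclasses URLError, so it must be caught first.
        return err.code
    except (URLError, IOError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL
        # (e.g. missing "http://"): there is no status code to report
        return None
    try:
        return response.getcode()
    finally:
        response.close()
```

In the question's loop you could call `status = get_status(url)` before `html.parse(url)`, print the URL together with the status, and `continue` when the status is not 200. Note that this requests each page twice, and that urlopen follows redirects, so a redirect to a working page reports the final 200.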