class Crawler1(object):
    def __init__(self):
        'constructor'
        self.visited = []
        self.will_visit = []

    def reset(self):
        'reset the visited links'
        self.visited = []
        self.will_visit = []

    def crawl(self, url, n):
        'crawl to depth n starting at url'
        self.analyze(url)
        if n < 0:
            self.reset()
        elif url in self.visted:
            self.crawl(self.will_visit[-1],n-1)
        else:
            self.visited.append(url)
            self.analyze(url)
            self.visited.append(url)
            self.will_visit.pop(-1)
            self.crawl(self.will_visit[-1],n-1)

    def analyze(self, url):
        'returns the list of URLs found in the page url'
        print("Visiting", url)
        content = urlopen(url).read().decode()
        collector = Collector(url)
        collector.feed(content)
        urls = collector.getLinks()
        for i in urls:
            if i in self.will_visit:
                pass
            else:
                self.will_visit.append(i)
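I want this program to follow a series of links, but only down to depth `n`, as a web crawler class.
I don't know exactly what is wrong with the code, but I'm sure there is plenty. Some hints would be nice.
Expected output if n = 1 and Site1 contains links to Site2 and Site3: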
Visiting [Site1]
Visiting [Site2]
Visiting [Site3]
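How are you running the program, and what behavior have you seen so far? I'm guessing `c = Crawler1(); c.crawl('Site1', 3)`. – Edmund
Exactly that. I get `Visiting Site1` and then the error `AttributeError: 'Crawler1' object has no attribute 'visted'` –
Well, that's probably because you're missing an 'i' in "visted" ;) Any other errors after that? – Edmund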
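
Beyond the typo, a few other things in `crawl` look likely to cause trouble: `analyze` is called before the depth check (and again in the `else` branch), the URL is appended to `visited` twice, `will_visit[-1]` raises `IndexError` once the list is empty, and the recursion follows only one link per call instead of every link on the page. Below is a minimal sketch of one way to restructure it, not necessarily what the assignment expects. It assumes `analyze` is changed to return the list of links; here that role is played by a hypothetical `get_links` callable passed to the constructor, so the traversal can be tried without network access. The class name `Crawler2` and the `fake_web` dictionary are made up for illustration.

```python
class Crawler2:
    """Depth-limited crawler sketch.

    get_links is an assumed callable that takes a URL and returns the
    list of URLs found on that page (the job analyze() has in the question).
    """

    def __init__(self, get_links):
        self.get_links = get_links
        self.visited = []

    def crawl(self, url, n):
        """Visit url, then follow each of its links with depth n - 1."""
        # Stop when the depth budget is used up or the page was already seen.
        if n < 0 or url in self.visited:
            return
        self.visited.append(url)
        print("Visiting", url)
        # Recurse into every link on the page, not just the last one queued.
        for link in self.get_links(url):
            self.crawl(link, n - 1)


# Hypothetical in-memory "web" standing in for real pages.
fake_web = {
    "Site1": ["Site2", "Site3"],
    "Site2": ["Site1"],
    "Site3": [],
}

crawler = Crawler2(lambda url: fake_web.get(url, []))
crawler.crawl("Site1", 1)
```

With the sample data this prints the three `Visiting` lines from the expected output. For real pages, `get_links` could wrap the `urlopen` and `Collector` calls from the question's `analyze` and return `collector.getLinks()`.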