NameError: name 'Rule' is not defined in Python Scrapy
I have the following script for recursively crawling a website:
#!/usr/bin/python
import scrapy
from scrapy.selector import Selector
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner

class GivenSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/",
        # "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        # "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]
    rules = (Rule(LinkExtractor(allow=r'/'), callback=parse, follow=True),)

    def parse(self, response):
        select = Selector(response)
        titles = select.xpath('//a[@class="listinglink"]/text()').extract()
        print ' [*] Start crawling at %s ' % response.url
        for title in titles:
            print '\t %s' % title

#configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()
d = runner.crawl(GivenSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()
When I run it:
$ python spide.py
NameError: name 'Rule' is not defined
You never imported the Rule class? – M4rtini
Well, Rule is not defined in the code, so what is the question?
How do I use Rule to recursively crawl the whole site? – MLSC