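How to access command-line arguments in a CrawlSpider in Scrapy?

I want to pass an argument on the scrapy crawl ... command line and use it in the rule definitions of a spider that extends CrawlSpider, like the following: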
# Legacy Scrapy import paths, matching the SgmlLinkExtractor used below
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(SgmlLinkExtractor(allow=(r'category\.php',), deny=(r'subsection\.php',))),
        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(SgmlLinkExtractor(allow=(r'item\.php',)), callback='parse_item'),
    )
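What I want is for the allow attribute of SgmlLinkExtractor to be specified by a command-line argument. I searched and found that I can get the argument value in the spider's __init__ method, but how can I use a command-line argument in the rule definitions?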