I am iteratively scraping two pages for each ID. The first scraper works for all IDs, but the second works for only one ID. Scrapy scrapes the first page 'N' times, but the other page only a single time in the loop.
import scrapy
from scrapy import Request


class MySpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/viewData']

    def parse(self, response):
        ids = ['1', '2', '3']
        for id in ids:
            # The following request scrapes for all ids
            yield scrapy.FormRequest.from_response(response,
                                                   ...
                                                   callback=self.parse1)
            # The following request scrapes only for the 1st id
            yield Request(url="http://example.com/viewSomeOtherData",
                          callback=self.intermediateMethod)

    def parse1(self, response):
        # Data scraped here using selectors
        ...

    def intermediateMethod(self, response):
        yield scrapy.FormRequest.from_response(response,
                                               ...
                                               callback=self.parse2)

    def parse2(self, response):
        # Some other data scraped here
        ...
I want to scrape two different pages for a single ID.
Scrapy has a duplicate URL filter, and it may be filtering out your request. Try adding 'dont_filter=True' after 'callback='. – Steve
Thank you very much. Adding dont_filter solved my problem. –
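For reference, a minimal sketch of the suggested fix, assuming the request to "http://example.com/viewSomeOtherData" is the one being dropped by the duplicate filter (all names and URLs are taken from the question above):

import scrapy
from scrapy import Request


class MySpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/viewData']

    def parse(self, response):
        ids = ['1', '2', '3']
        for id in ids:
            # ... first FormRequest unchanged from the question ...
            # dont_filter=True bypasses Scrapy's duplicate-request filter,
            # so this identical URL is scheduled once per id instead of
            # being dropped after the first iteration
            yield Request(url="http://example.com/viewSomeOtherData",
                          callback=self.intermediateMethod,
                          dont_filter=True)

    def intermediateMethod(self, response):
        # unchanged from the question
        ...

The flag only disables deduplication for that single request; all other requests still go through Scrapy's normal duplicate filter.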