Scrapy only crawls one page

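Here is my code. The spider does not crawl the URLs, or does not extract them, or something like that. If I put a target item URL in start_urls, Scrapy finds the items but does not crawl any further. If I put the URL of the listings page in start_urls, I get 0 results. :) I hope the text is not confusing.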

from scrapy.spiders import Spider
from testing.items import TestingItem
import scrapy


class MySpider(scrapy.Spider):
    name = 'testing'
    allowed_domains = ['http://somewebsite.com']
    start_urls = ['http://somewebsite.com/listings.php']

    def parse(self, response):
        for href in response.xpath('//h5/a/@href'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_item)

    def parse_item(self, response):
        titles = response.xpath('//*[@class="panel-content user-info"]').extract()
        for title in titles:
            item = TestingItem()
            item["nimi"] = response.xpath('//*[@class="seller-info"]/h3/text()').extract()

            yield item

2017-04-27 Thé Generous

Try removing the 'http://' from allowed_domains.

Nice, thanks mate :). Do you know what I need to add to paginate to the next page? :)

You need to remove the http:// from allowed_domains, so that it contains only the bare domain name (somewebsite.com).
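To see why the prefix matters, here is a simplified sketch of the kind of hostname check Scrapy's offsite filtering performs. It is an approximation written with the standard library only, not Scrapy's actual code, and the helper names host_regex and is_allowed are made up for illustration. Start URLs are typically not subject to this filter, which would explain why the first page is fetched but none of the links found on it are followed.

```python
import re
from urllib.parse import urlparse


def host_regex(allowed_domains):
    """Build a pattern matching the allowed domains and any of their subdomains."""
    escaped = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(r"^(.*\.)?(%s)$" % escaped)


def is_allowed(url, allowed_domains):
    """Return True if the URL's hostname falls under one of the allowed domains."""
    # Only the hostname is compared, never the scheme or the path.
    host = urlparse(url).hostname or ""
    return bool(host_regex(allowed_domains).search(host))


url = "http://somewebsite.com/item.php?id=1"

# An entry with a scheme can never equal a bare hostname, so the request is dropped.
print(is_allowed(url, ["http://somewebsite.com"]))  # False

# A bare domain name matches the hostname (and its subdomains).
print(is_allowed(url, ["somewebsite.com"]))  # True
```

With 'http://somewebsite.com' in the list, every extracted link looks offsite, so the requests yielded from parse() are silently filtered.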

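To answer your comment: for pagination you can use Rules. I'll let you check the documentation for the details. They make it easy to follow pagination links.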

A small example:

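rules = (
    # 'xpath/to/nextpage/button' is a placeholder for the XPath of the "next page" link.
    # The callback is called parse_page here (a hypothetical method name), because
    # CrawlSpider uses parse() internally and rule callbacks should not be named "parse".
    Rule(LinkExtractor(allow=(), restrict_xpaths=('xpath/to/nextpage/button',)),
         callback="parse_page", follow=True),
)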

Hope this helps.

2017-04-27 08:42:22

Nice! That's perfect, man! Thanks a lot! :)
