Scrapy: Follow links to get additional item data?

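I don't have a specific code issue; I'm just not sure how to approach the following problem logistically with the Scrapy framework:

The structure of the data I want to scrape is typically one table row per item. Straightforward enough, right?

Ultimately I want to scrape the Title, Due Date, and Details for each row. Title and Due Date are immediately available on the page...

But the Details themselves aren't in the table. Instead, each row links to a page that contains the details (if that doesn't make sense, here's a table):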

|-----------------------------|------------|
| Title                       | Due Date   |
|-----------------------------|------------|
| Job Title (Clickable Link)  | 1/1/2012   |
| Other Job (Link)            | 3/2/2012   |
|-----------------------------|------------|

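I'm afraid I still don't understand how to logistically pass the item around with callbacks and Requests, even after reading through the CrawlSpider section of the Scrapy documentation.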

2012-02-17 dru

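Please first read the docs to understand what I'm saying.

The answer:

To scrape additional fields that are on other pages: in your parse method, extract the URL of the page with the extra information, create a Request object for that URL and return it from the parse method, and pass the data you have already extracted along via the request's meta parameter.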
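The hand-off described above can be sketched without Scrapy installed. This is a stdlib-only simulation, not Scrapy's API: `Request`, `Response`, `RowParser`, `crawl`, `PAGES` and the markup are all hypothetical stand-ins that only mimic how Scrapy delivers a request's `meta` dict to the response passed to its callback. In a real spider you would yield `scrapy.Request` objects and let the Scrapy engine do the downloading.

```python
from html.parser import HTMLParser

# Hypothetical site: a listing table plus one plain-text detail page per job.
PAGES = {
    "/jobs": (
        "<table>"
        "<tr><td><a href='/jobs/1'>Job Title</a></td><td>1/1/2012</td></tr>"
        "<tr><td><a href='/jobs/2'>Other Job</a></td><td>3/2/2012</td></tr>"
        "</table>"
    ),
    "/jobs/1": "Details for job 1",
    "/jobs/2": "Details for job 2",
}


class Request:
    """Stand-in for scrapy.Request: a URL, a callback and a meta dict."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta if meta is not None else {}


class Response:
    """Stand-in for a Scrapy response; like Scrapy, meta is the request's dict."""

    def __init__(self, request, text):
        self.url = request.url
        self.text = text
        self.meta = request.meta


class RowParser(HTMLParser):
    """Collects (href, title, due_date) for every <tr> that contains a link."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if data.strip():
            self._cells.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "tr" and self._href and len(self._cells) >= 2:
            self.rows.append((self._href, self._cells[0], self._cells[1]))


def parse_listing(response):
    parser = RowParser()
    parser.feed(response.text)
    for href, title, due_date in parser.rows:
        # A fresh item per row, holding the fields already on this page.
        item = {"title": title, "due_date": due_date}
        yield Request(href, callback=parse_details, meta={"item": item})


def parse_details(response):
    item = response.meta["item"]  # the partially filled item from parse_listing
    item["details"] = response.text.strip()
    yield item


def crawl(start_url, start_callback):
    """Tiny engine loop: 'download' each request, then route callback results."""
    queue = [Request(start_url, start_callback)]
    items = []
    while queue:
        request = queue.pop(0)
        response = Response(request, PAGES[request.url])
        for result in request.callback(response):
            if isinstance(result, Request):
                queue.append(result)  # follow-up request: schedule it
            else:
                items.append(result)  # anything else is a finished item
    return items


for scraped in crawl("/jobs", parse_listing):
    print(scraped)
```

Creating the item inside the row loop matters here: each detail request carries its own dict, so the Details text fetched for one row never lands in another row's item.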

how do i merge results from target page to current page in scrapy?

2012-02-18 10:35:16 warvariuc

Is there a basic example of the code somewhere? – fortuneRice 2013-10-22 07:15:06

@fortuneRice, not sure if the examples are up to date: http://stackoverflow.com/questions/11150053 http://stackoverflow.com/questions/13910357/how-can-i-use-multiple-requests-and-pass-items-in-them-in-scrapy-python/13911764#13911764 – warvariuc 2013-10-22 07:26:07

This is the relevant section of the docs: http://doc.scrapy.org/en/latest/topics/spiders.html – tback 2014-03-10 16:37:51

You can also use Python's functools.partial to pass an item, or any other serializable data, to the next Scrapy callback as additional arguments.

Something like:

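import functools

import scrapy
from scrapy import Request


class MySpider(scrapy.Spider):
    name = "myspider"  # hypothetical spider name

    def parse(self, response):
        # ...
        # Process the first response here, populate item and next_url.
        # ...
        # Bind item and someotherarg now; Scrapy supplies the response
        # as the last positional argument when it calls the callback.
        callback = functools.partial(self.parse_next, item, someotherarg)
        return Request(next_url, callback=callback)

    def parse_next(self, item, someotherarg, response):
        # ...
        # Process the second response here.
        # ...
        return item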
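Why the argument order in `parse_next` works can be checked with plain Python: `functools.partial` places the bound arguments first, and whatever the caller passes (for Scrapy, the response) comes last. In this sketch `SimpleNamespace` is a hypothetical stand-in for a response object; only its `url` attribute is used.

```python
import functools
from types import SimpleNamespace


def parse_next(item, someotherarg, response):
    # partial's bound arguments arrive first; the caller's response comes last.
    item["details_url"] = response.url
    item["extra"] = someotherarg
    return item


item = {"title": "Job Title"}
callback = functools.partial(parse_next, item, "some value")

# Scrapy invokes a callback as callback(response); SimpleNamespace stands in
# for the response object here (hypothetical, for illustration only).
fake_response = SimpleNamespace(url="http://www.example.com/jobs/1")
result = callback(fake_response)
print(result)
# {'title': 'Job Title', 'details_url': 'http://www.example.com/jobs/1', 'extra': 'some value'}
```

If you are on Scrapy 1.7 or later, the built-in `cb_kwargs` argument of `Request` passes keyword arguments to the callback without partial. It is also worth checking whether you persist the request queue to disk (`JOBDIR`): as far as I know, Scrapy can only serialize callbacks that are spider methods, so a partial would not survive that.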

2014-02-25 10:43:45

An example from the Scrapy documentation:

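import scrapy


class MyItem(scrapy.Item):
    main_url = scrapy.Field()
    other_url = scrapy.Field()


class MySpider(scrapy.Spider):
    name = "example"  # hypothetical spider name

    def parse_page1(self, response):
        item = MyItem()
        item['main_url'] = response.url
        request = scrapy.Request("http://www.example.com/some_page.html",
                                 callback=self.parse_page2)
        # Carry the partially filled item to the next callback via meta.
        request.meta['item'] = item
        return request

    def parse_page2(self, response):
        # response.meta is the meta dict of the request that produced it.
        item = response.meta['item']
        item['other_url'] = response.url
        return item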

2014-12-10 00:50:48 Chitrasen
