How do I scrape data from multiple pages into the same CSV row?

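I need to scrape data from multiple pages. The spider should first scrape data from the first page, then extract from that page a URL to a second page and get some more data from it.

All of it should end up on the same CSV row.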

This is the first page: https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=bWFya2V0PT1nZW5lcmFsfHxzdD09MjB8fHN0cz09eyIxMCI6IlJlZ2lvbiIsIjIwIjoiTWlkZGxlIEVhc3QifQ%3D%3D

An example is the first row of the table, e.g. Catalog, Model, Production and Series.

This is the second page, which holds for example Series, Engine and Production Date: https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=bWFya2V0PT1nZW5lcmFsfHxzdD09MzB8fHN0cz09eyIxMCI6IlJlZ2lvbiIsIjIwIjoiTWlkZGxlIEVhc3QiLCIzMCI6IjRSVU5ORVIgNjcxMzYwIn18fGNhdGFsb2c9PTY3MTM2MHx8cmVjPT1CMw%3D%3D

Everything should be placed on the same CSV row, as in the screenshot:

This is my code:

import scrapy
from scrapy.http import Request

from properties.items import PropertiesItem


class BasicSpider(scrapy.Spider):
    name = "manual"

    # This is the page from which I will reach the Middle East listing.
    start_urls = ["https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en"]

    def parse(self, response):
        # First page
        next_selector = "https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=" + response.xpath('//*[@id="rows"]/tr[2]/@onclick').re(r"HM\.set\('([^']+)'")[0]
        yield Request(next_selector, callback=self.parse_item)

    def parse_item(self, response):
        for tr in response.xpath("/html/body/table[2]/tr/td/table/tr")[1:]:
            item = PropertiesItem()

            item['Series'] = tr.xpath("td[1]/text()").extract()
            item['Engine'] = tr.xpath("td[2]/text()").extract()
            # URL of the second page for this row, taken from the row's own onclick (not requested yet)
            second_selector = "https://www.catalogs.ssg.asia/toyota/?fromchanged=true&lang=en&l=" + tr.xpath('@onclick').re(r"HM\.set\('([^']+)'")[0]

            yield item

    def parse_item_2(self, response):
        item = PropertiesItem()
        item['Building_Condition'] = response.xpath('/html/body/table[2]/tr/td/table/tr[2]/td[1]/text()').extract()
        yield item

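I need to write some code in parse_item that goes on to parse_item_2, processes the second page, and puts the results on the same CSV row. How can I do this?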


2017-03-25 Hat hout

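If you want to build a single item from data that lives at different URLs, use the meta attribute to pass the item from one Request object to the next. In the last callback you yield the finished item, so it is written out as a single row.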

def parse_item(self, response):
    for tr in response.xpath("/html/body/table[2]/tr/td/table/tr")[1:]:
        [...]
        second_selector = [...]
        meta = {'item': item}
        yield Request(second_selector, meta=meta, callback=self.parse_item_2)

def parse_item_2(self, response):
    item = PropertiesItem(response.meta['item'])
    item['Building_Condition'] = response.xpath('/html/body/table[2]/tr/td/table/tr[2]/td[1]/text()').extract()
    yield item
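
To make the flow of the item easier to follow, here is a minimal sketch that runs without Scrapy. The Request and Response classes below are toy stand-ins, not the real Scrapy classes, and the PAGES data, the crawl and to_csv helpers, and the Production_Date field name are all invented for illustration. It only shows the idea: the first callback does not yield the item, it attaches it to the next request, and the second callback completes it and yields it once.

```python
import csv
import io


# Toy stand-ins for Scrapy's Request/Response (NOT the real classes):
# just enough to show how `meta` travels from one callback to the next.
class Request:
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}


class Response:
    def __init__(self, url, data, meta):
        self.url = url
        self.data = data
        self.meta = meta


# Hypothetical page contents keyed by URL (invented sample data).
PAGES = {
    "list": [
        {"Series": "S1", "Engine": "E1", "detail": "detail-1"},
        {"Series": "S2", "Engine": "E2", "detail": "detail-2"},
    ],
    "detail-1": {"Production_Date": "D1"},
    "detail-2": {"Production_Date": "D2"},
}


def parse_item(response):
    for row in response.data:
        item = {"Series": row["Series"], "Engine": row["Engine"]}
        # Do not yield the item yet; hand it to the next callback via meta.
        yield Request(row["detail"], callback=parse_item_2, meta={"item": item})


def parse_item_2(response):
    item = dict(response.meta["item"])  # copy of the partially filled item
    item["Production_Date"] = response.data["Production_Date"]
    yield item  # complete now: one item, one CSV row


def crawl(start_url, callback):
    """Tiny scheduler: follow Requests, collect everything else as items."""
    queue = [Request(start_url, callback)]
    items = []
    while queue:
        request = queue.pop(0)
        response = Response(request.url, PAGES[request.url], request.meta)
        for result in request.callback(response):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items


def to_csv(items, fieldnames):
    """Write the items to an in-memory CSV and return it as text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


items = crawl("list", parse_item)
csv_text = to_csv(items, ["Series", "Engine", "Production_Date"])
print(csv_text)
```

Because parse_item never yields the half-filled item itself, each table row produces exactly one output row containing the fields from both pages.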


2017-03-29 14:49:42 Fran
