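How do you handle multiple greyed-out dropdown forms with scrapy FormRequest?

So I am trying to scrape some car information from Gasbuddy.com, but I am having some trouble with the scrapy code.
Here is what I have so far; let me know what I am doing wrong: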

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.contrib.loader import XPathItemLoader
    from scrapy.http import Request
    from scrapy.http import FormRequest

    class gasBuddy(BaseSpider):
        name = "gasBuddy"
        allowed_domains = ["http://www.gasbuddy.com"]
        start_urls = [
            "http://www.gasbuddy.com/Trip_Calculator.aspx",
        ]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            #for years in hxs.select('//select[@id="ddlYear"]/option/text()'):
                #print years
            FormRequest(url="http://www.gasbuddy.com/Trip_Calculator.aspx",
                        formdata={'Year': '%s'%("2011")},
                        callback=self.make('2011'))

        def make (years, self, response):
            # this is where we loop through all of the car makes and send the response to model
            hxs = HtmlXPathSelector(response)
            for makes in hxs.select('//select[@id="ddlMake"]/option/text()').extract():
                FormRequest(url="http://www.gasbuddy.com/Trip_Calculator.aspx",
                            formdata={'Year': '%s', 'Make': '%s'%(years, makes)},
                            callback=self.model(years, makes))

        def model (years, makes, self, response):
            # this is where we loop through all of the car models and get all of the data associated with each one
            hxs = HtmlXPathSelector(response)
            for models in hxs.select('//select[@id="ddlModel"]/option/text()'):
                FormRequest(url="http://www.gasbuddy.com/Trip_Calculator.aspx",
                            formdata={'Year': '%s', 'Make': '%s', 'Model': '%s'%(years, makes, models)},
                            callback=self.model(years, makes))
            print hxs.select('//td[@id="tdCityMpg"]/text()')

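The basic idea of this code is to select one form field, issue a FormRequest with a callback to another function, and keep looping like that until I reach the last dropdown, at which point I start reading the information for each car. But I keep getting errors. One of them is that gasBuddy has no attribute 'encoding' (I have no idea what that means). I am also not sure whether you can pass parameters to a callback function.
Any help would be greatly appreciated.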
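
On the question of passing parameters to a callback: writing `callback=self.make('2011')` calls `make` immediately and hands over its return value, whereas a callback needs to be a callable that is invoked later with the response. One way to bind extra arguments ahead of time is `functools.partial`. The following is a minimal, stdlib-only sketch of the year -> make -> model chain under that approach. `FakeRequest`, `TripSpider`, `run`, and the dropdown values are all hypothetical stand-ins invented for illustration so that the example runs without Scrapy; they are not Scrapy's API and not the site's real data.

```python
from functools import partial


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: holds form data and a callback."""

    def __init__(self, formdata, callback):
        self.formdata = formdata
        self.callback = callback  # must be a callable, not the result of a call


class TripSpider:
    """Sketch of the year -> make -> model chain; dropdown values are invented."""

    # Pretend server: maps submitted form data to the next dropdown's options.
    MAKES = {"2011": ["Acura", "Audi"]}
    MODELS = {("2011", "Acura"): ["MDX", "TL"], ("2011", "Audi"): ["A4"]}

    def parse(self, response):
        year = "2011"
        # partial binds `year` now; the engine supplies `response` later.
        yield FakeRequest({"Year": year}, partial(self.make, year))

    def make(self, year, response):
        for make in response:
            # Put the values in the dict directly; a trailing '%s' % (a, b)
            # would apply to the last string only.
            formdata = {"Year": year, "Make": make}
            yield FakeRequest(formdata, partial(self.model, year, make))

    def model(self, year, make, response):
        for model in response:
            yield (year, make, model)  # final items instead of further requests


def run(spider):
    """Hypothetical mini engine: processes requests in FIFO order, collects items."""
    items = []
    pending = list(spider.parse(None))
    while pending:
        current = pending.pop(0)
        if isinstance(current, FakeRequest):
            data = current.formdata
            if "Make" in data:
                response = TripSpider.MODELS[(data["Year"], data["Make"])]
            else:
                response = TripSpider.MAKES[data["Year"]]
            # Call the stored callback with the response, as an engine would.
            pending.extend(current.callback(response))
        else:
            items.append(current)
    return items


results = run(TripSpider())
```

Here `results` ends up as a list of `(year, make, model)` tuples. Note that `self` stays the first parameter of every method and that each callback yields its requests rather than just constructing them. In Scrapy itself the callbacks would yield real `FormRequest` objects, and extra values are also commonly carried on the request's `meta` dict (or `cb_kwargs` in newer Scrapy releases) instead of using `partial`.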
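Thank you so much for such an in-depth explanation. It was very helpful. You are my hero. :)
@AlexW.H.B. I am trying to do something similar; could you share how you solved this? I am new to Python, but if you share your code I could probably follow it, since I come from a PHP background.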