2014-01-18 166 views

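How do I get the values of combo boxes and text boxes from AJAX?

I want to write a parser for m-ati.su using Scrapy. As a first step, I need to get the values of the combo boxes and text boxes named "From" and "To" for different cities. I looked at the request in Firebug and wrote: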

from scrapy.spider import BaseSpider
from scrapy.http import FormRequest

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        # POST the autocomplete query as form-encoded data
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                          callback=self.ati_from,
                          formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'})

    def ati_from(self, response):
        # Save the raw response body to a file
        with open('results.txt', 'wb') as f:
            f.write(response.body)

This request gives me a "500 Internal Server Error". What am I doing wrong? Sorry for my poor English. Thanks.

Answers

I think you may need to add an X-Requested-With: XMLHttpRequest header to your POST request, so you can try this:

def parse(self, response):
    yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                      callback=self.ati_from,
                      formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'},
                      headers={"X-Requested-With": "XMLHttpRequest"})

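Edit: I tried running the spider and came up with this:

(The request body is JSON-encoded when I inspect it with Firefox, so I used Request and forced the "POST" method. The response I got was encoded in "windows-1251".)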

from scrapy.spider import BaseSpider
from scrapy.http import Request
import json

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        # Send the query as a JSON body instead of form-encoded data
        yield Request('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                      callback=self.ati_from,
                      method="POST",
                      body=json.dumps({
                          'prefixText': 'moscow',
                          'count': '10',
                          'contextKey': 'All_0$Rus'
                      }),
                      headers={
                          "X-Requested-With": "XMLHttpRequest",
                          "Accept": "application/json, text/javascript, */*; q=0.01",
                          "Content-Type": "application/json; charset=utf-8",
                          "Pragma": "no-cache",
                          "Cache-Control": "no-cache",
                      })

    def ati_from(self, response):
        # The response body is windows-1251 encoded JSON
        jsondata = response.body
        print json.loads(jsondata, encoding="windows-1251")
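
To make the difference between the two attempts concrete, here is a minimal standard-library sketch (Python 3, no Scrapy required) of the two request bodies and of decoding a windows-1251 response. FormRequest builds a URL-encoded body from `formdata`, while the spider above sends JSON. The `decode_json_body` helper and the sample response bytes are made up for illustration; they are not real output from m-ati.su.

```python
import json
from urllib.parse import urlencode

# Same payload as in the spider above.
payload = {"prefixText": "moscow", "count": "10", "contextKey": "All_0$Rus"}

# Form-encoded body: the shape FormRequest builds from formdata.
form_body = urlencode(payload)

# JSON body: the shape sent with Content-Type: application/json.
json_body = json.dumps(payload)


def decode_json_body(raw_bytes, encoding="windows-1251"):
    """Hypothetical helper: decode raw bytes with a charset, then parse JSON."""
    return json.loads(raw_bytes.decode(encoding))


# Assumption: invented sample bytes, only to exercise the decoding step.
sample_response = '{"d": ["Москва"]}'.encode("windows-1251")
decoded = decode_json_body(sample_response)

print(form_body)
print(json_body)
print(decoded)
```

Decoding the bytes explicitly before calling `json.loads` also works on Python versions where `json.loads` no longer accepts an `encoding` argument.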

But according to [doc.scrapy](http://doc.scrapy.org/zh/latest/topics/request-response.html#formrequest-objects), FormRequest has no headers parameter. – yavalvas

"The FormRequest class extends the base Request", so you can use the "headers" argument. Have you tried it? –

Ah, yes. I tried it, and the error occurred again. – yavalvas