
How do I use Scrapy to crawl the same URL while posting different data?

I want to crawl a website by POSTing different page numbers, but I only get the data from the first page and then the spider finishes. I suspect that, because the requests go to the same URL, they are being filtered out by Scrapy.

Here is my code:

class ZhejiangCrawl(Spider):
    name = 'ZhejiangCrawl'
    root_url = 'http://www.zjsfgkw.cn/Execute/CreditCompany'
    start_page = 1
    current_page = start_page
    end_page = 24974
    post_data = {'PageNo': str(current_page), 'PageSize': '5', 'ReallyName': '', 'CredentialsNumber': '', 'AH': '',
                 'ZXFY': '', 'StartLARQ': '', 'EndLARQ': ''}
    headers = HEADER
    cookies = COOKIES

    def start_requests(self):
        return [FormRequest(self.root_url, headers=self.headers, cookies=self.cookies, formdata=self.post_data,
                            dont_filter=True, callback=self.parse)]

    def parse(self, response):
        if self.current_page < self.end_page:
            self.current_page += 1
            self.post_data['PageNo'] = str(self.current_page)
            yield [FormRequest(self.root_url, headers=self.headers, cookies=self.cookies, dont_filter=True,
                               formdata=self.post_data, callback=self.parse)]

        jsonstr = json.loads(response.body)
        for item_dict in jsonstr['informationmodels']:
            item = ZhejiangcrawlItem()
            item['name'] = item_dict['ReallyName']
            item['cardNum'] = item_dict['CredentialsNumber']
            item['performance'] = item_dict['ZXJE']
            item['unperformance'] = item_dict['WZXJE']
            item['gistUnit'] = item_dict['ZXFY']
            item['address'] = item_dict['Address']
            item['gistId'] = item_dict['ZXYJ']
            item['caseCode'] = item_dict['AH']
            item['regDate'] = item_dict['LARQ']
            item['exposureDate'] = item_dict['BGRQ']
            item['gistReason'] = item_dict['ZXAY']
            yield item

How can I fix this?

Answer


If you think the requests are being filtered out by the DupeFilter, add dont_filter=True to your FormRequests.

Another thing to note is that there is no reason to wrap what you yield/return from a callback in a list.
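As a minimal sketch of what that change might look like (reusing HEADER, COOKIES and ZhejiangcrawlItem from your code, and keeping your field mapping abbreviated), parse should yield the FormRequest itself rather than a one-element list:

    def parse(self, response):
        # Schedule the next page by POSTing the same URL with an updated PageNo.
        if self.current_page < self.end_page:
            self.current_page += 1
            self.post_data['PageNo'] = str(self.current_page)
            # Yield the request directly, not a list; dont_filter=True keeps the
            # duplicate filter from dropping repeated requests to the same URL.
            yield FormRequest(self.root_url, headers=self.headers, cookies=self.cookies,
                              formdata=self.post_data, dont_filter=True, callback=self.parse)

        # Parse the JSON body of the current page and emit one item per record.
        jsonstr = json.loads(response.body)
        for item_dict in jsonstr['informationmodels']:
            item = ZhejiangcrawlItem()
            item['name'] = item_dict['ReallyName']
            item['caseCode'] = item_dict['AH']
            # ... remaining field assignments as in your original code ...
            yield item

start_requests can keep returning a list, since Scrapy accepts any iterable of Requests there; the problem is only with yielding a list from a parse callback, which Scrapy rejects instead of scheduling.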