2015-08-17 77 views
2

This is my first attempt at creating a spider, so please bear with me if I have not done it correctly. Here is the link to the site I am trying to extract data from: http://www.4icu.org/in/. I want the entire list of universities displayed on the page, but when I run the spider below I get back an empty JSON file (the Scrapy spider does not return any results). Here is my items.py:

    import scrapy

    class CollegesItem(scrapy.Item):
        # define the fields for your item here like:
        link = scrapy.Field()

And here is the spider, colleges.py:

    import scrapy
    from scrapy.spider import Spider
    from scrapy.http import Request

    class CollegesItem(scrapy.Item):
        # define the fields for your item here like:
        link = scrapy.Field()

    class CollegesSpider(Spider):
        name = 'colleges'
        allowed_domains = ["4icu.org"]
        start_urls = ('http://www.4icu.org/in/',)

        def parse(self, response):
            return Request(
                url="http://www.4icu.org/in/",
                callback=self.parse_fixtures
            )

        def parse_fixtures(self, response):
            sel = response.selector
            for div in sel.css("col span_2_of_2>div>tbody>tr"):
                item = Fixture()
                item['university.name'] = tr.xpath('td[@class="i"]/span/a/text()').extract()
                yield item
+0

Wow, first you have to take a look at your code: there are several problems in it. And because you do not get any exception when running the spider, you can be sure that you never reach the `parse_fixtures` method, or at least the `for` loop. – GHajba

Answers

1

As the comments on the question point out, there are some problems with your code.

First of all, you do not need two methods: in your parse method you request the same URL that is already in start_urls.

To get some information from the site, try the following code:

    def parse(self, response): 
        for tr in response.xpath('//div[@class="section group"][5]/div[@class="col span_2_of_2"][1]/table//tr'): 
            if tr.xpath(".//td[@class='i']"): 
                name = tr.xpath('./td[1]/a/text()').extract()[0] 
                location = tr.xpath('./td[2]//text()').extract()[0] 
                print name, location 

Then adjust it as needed to fill your item (or items).

As you can see, the extra tbody that your browser displays inside the table does not exist in the HTML that Scrapy scrapes. This means you should always question what you see in your browser.
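The tbody difference can be verified without hitting the site at all, using lxml (the HTML parser underneath Scrapy's selectors). This is a minimal sketch with a made-up table fragment, not the real 4icu.org markup:

```python
from lxml import html

# a raw table fragment, the way the server sends it (no tbody)
doc = html.fromstring("<table><tr><td class='i'>IIT Bombay</td></tr></table>")

# the browser-inspired path through tbody matches nothing...
print(doc.xpath('//table/tbody/tr/td/text()'))   # []

# ...while a tbody-free path (or a tolerant // step) works
print(doc.xpath('//table//tr/td/text()'))        # ['IIT Bombay']
```

Browsers insert the tbody element when building the DOM, so an XPath copied out of the developer tools often contains a step that is simply absent from the downloaded source.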

+0

Thank you for the guidance, it fetches the data. Below are the modified code and the results. –

0

Here is the command to run the spider:

>>scrapy crawl colleges -o mait.json 
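The -o flag hands every yielded item to Scrapy's feed exporter, which writes them to mait.json as a JSON list of objects (note that -o appends to an existing file, so remove old output between runs). Reading the feed back is plain json; this sketch uses sample data in the same shape rather than the real output file:

```python
import json

# sample in the shape of the exported feed (made-up data, not the real file)
feed = '[{"name": "Anna University", "location": "Chennai"}]'
items = json.loads(feed)
print(items[0]["name"], items[0]["location"])  # Anna University Chennai
```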

Following is the working code:

    import scrapy 
    from scrapy.spider import Spider 
    from scrapy.http import Request 

    class CollegesItem(scrapy.Item): 
        # define the fields for your item here like: 
        name = scrapy.Field() 
        location = scrapy.Field() 

    class CollegesSpider(Spider): 
        name = 'colleges' 
        allowed_domains = ["4icu.org"] 
        start_urls = ('http://www.4icu.org/in/',) 

        def parse(self, response): 
            for tr in response.xpath('//div[@class="section group"][5]/div[@class="col span_2_of_2"][1]/table//tr'): 
                if tr.xpath(".//td[@class='i']"): 
                    item = CollegesItem() 
                    item['name'] = tr.xpath('./td[1]/a/text()').extract()[0] 
                    item['location'] = tr.xpath('./td[2]//text()').extract()[0] 
                    yield item 

Here is a snippet of the results:

    [{"name": "Indian Institute of Technology Bombay", "location": "Mumbai"}, 
    {"name": "Indian Institute of Technology Madras", "location": "Chennai"}, 
    {"name": "University of Delhi", "location": "Delhi"}, 
    {"name": "Indian Institute of Technology Kanpur", "location": "Kanpur"}, 
    {"name": "Anna University", "location": "Chennai"}, 
    {"name": "Indian Institute of Technology Delhi", "location": "New Delhi"}, 
    {"name": "Manipal University", "location": "Manipal ..."}, 
    {"name": "Indian Institute of Technology Kharagpur", "location": "Kharagpur"}, 
    {"name": "Indian Institute of Science", "location": "Bangalore"}, 
    {"name": "Panjab University", "location": "Chandigarh"}, 
    {"name": "National Institute of Technology, Tiruchirappalli", "location": "Tiruchirappalli"}, .........