2017-03-06

I want to store the data from a list given on a website's page. When I run the crawl command, I cannot store the data scraped by Scrapy in JSON or CSV format.

response.css('title::text').extract_first()  and 
response.css("article div#section-2 li::text").extract() 

each show the expected output when run individually in the Scrapy shell. Below is my code, which is not storing the data in JSON or CSV format:

import scrapy 

class QuotesSpider(scrapy.Spider): 
    name = "medical" 

    start_urls = ['https://medlineplus.gov/ency/article/000178.html/'] 


    def parse(self, response): 
        yield { 
            'topic': response.css('title::text').extract_first(), 
            'symptoms': response.css("article div#section-2 li::text").extract() 
        } 

I tried to run it using

scrapy crawl medical -o medical.json 

Answer


You need to fix the URL in this code: it is https://medlineplus.gov/ency/article/000178.htm, not https://medlineplus.gov/ency/article/000178.html/

And, more importantly, you need to define an Item class and yield/return it from the spider's parse() callback:

import scrapy 


class MyItem(scrapy.Item): 
    topic = scrapy.Field() 
    symptoms = scrapy.Field() 


class QuotesSpider(scrapy.Spider): 
    name = "medical" 

    allowed_domains = ['medlineplus.gov'] 
    start_urls = ['https://medlineplus.gov/ency/article/000178.htm'] 

    def parse(self, response): 
        item = MyItem() 

        item["topic"] = response.css('title::text').extract_first() 
        item["symptoms"] = response.css("article div#section-2 li::text").extract() 

        yield item