2016-02-05 138 views

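Scrapy: printing to a JSON file

I want to run a spider against craigslist and use Scrapy to save the results to a JSON file. My spider shows the results in the console, but my .json file is empty. The command I am using is: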

scrapy runspider detroit.py -o detroit.json

Can someone shed a little light on this? Thanks!

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://detroit.craigslist.org/search/sof"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        for titles in titles:
            title = titles.select("a/text()").extract()[0]
            link = titles.select("a/@href").extract()[0]
            print title, link

Answer


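That is because you are only printing the results. Scrapy writes to the output file only what the callback returns or yields, so you need to instantiate items and return them: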

def parse(self, response):
    # Each matching <a> element becomes one item in the output feed.
    for elm in response.xpath("//span[@class='pl']//a"):
        item = CraigslistSampleItem()
        item["title"] = elm.xpath("text()").extract_first()
        # Attributes are selected with "@" in XPath.
        item["link"] = elm.xpath("@href").extract_first()
        yield item

Thanks. That was it! Scrapy is very cool. – jpavlov