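Scrapy: printing to a JSON file

I want to run a spider against craigslist and use Scrapy to save the results to a JSON file. My spider shows the results in the console, but my .json file is empty. The command I use is: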
scrapy runspider detroit.py -o detroit.json
Can anyone shed some light on this? Thanks!
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://detroit.craigslist.org/search/sof"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        for titles in titles:
            title = titles.select("a/text()").extract()[0]
            link = titles.select("a/@href").extract()[0]
            print title, link
Thanks. That was it! Scrapy is very cool. – jpavlov