Scrapy exports invalid JSON

My parse method looks like this:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select("//tr/td")
    items = []
    for titles in titles:
        item = MyItem()
        item['title'] = titles.select('h3/a/text()').extract()
        items.append(item)
    return items

Why does it output JSON like this:

[{"title": ["random title #1"]}, 
{"title": ["random title #2"]}]

2013-08-24 deekay

That is valid JSON. Where are you getting this output? Post all of the scraper output. – Blender

I run it from the command line: scrapy crawl myspider -o items.json -t json. I guess I don't understand where the [] comes from. It should be a plain-text item. – deekay

@agf: Scrapy unpacks lists and generators into individual items. – Blender

titles.select('h3/a/text()').extract() returns a list, so the field value is a list. Scrapy doesn't make any assumptions about the structure of your items.
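To see where the brackets come from, here is a minimal sketch using only the standard json module. The plain dicts are stand-ins for the Scrapy items, and this is not Scrapy's own exporter, just the same serialization rule:

```python
import json

# Stand-in for what extract() returns: always a list of matched strings.
extracted = ["random title #1"]

# Storing the list itself serializes as a JSON array.
as_list = json.dumps({"title": extracted})

# Storing only the first element serializes as a plain JSON string.
as_text = json.dumps({"title": extracted[0]})

print(as_list)  # {"title": ["random title #1"]}
print(as_text)  # {"title": "random title #1"}
```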

The quick fix is to take only the first result:

item['title'] = titles.select('h3/a/text()').extract()[0]

A better solution is to use an item loader with TakeFirst() as the output processor:

from scrapy.contrib.loader import XPathItemLoader 
from scrapy.contrib.loader.processor import TakeFirst, MapCompose 

class YourItemLoader(XPathItemLoader): 
    default_item_class = YourItemClass 

    default_input_processor = MapCompose(unicode.strip) 
    default_output_processor = TakeFirst() 

    # title_in = MapCompose(unicode.strip)
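To make the two processors less opaque, here is a rough pure-Python approximation of what they do. The names map_compose and take_first are hypothetical stand-ins written for illustration, not Scrapy's implementation, and str.strip plays the role that unicode.strip plays in the Python 2 code above:

```python
def map_compose(*functions):
    """Rough stand-in for MapCompose: apply each function to every value in turn."""
    def process(values):
        for func in functions:
            next_values = []
            for value in values:
                result = func(value)
                if result is None:
                    continue  # None results are dropped
                if isinstance(result, (list, tuple)):
                    next_values.extend(result)  # list results are flattened
                else:
                    next_values.append(result)
            values = next_values
        return values
    return process


def take_first(values):
    """Rough stand-in for TakeFirst: first value that is neither None nor ''."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Input processor: strip whitespace from every extracted string.
strip_all = map_compose(str.strip)
cleaned = strip_all(["  random title #1 ", "\n"])

# Output processor: collapse the list to a single value.
title = take_first(cleaned)
print(title)  # random title #1
```

With the output processor in place the field holds a single string, so the exported JSON no longer wraps it in a list.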

and load the item like this:

def parse(self, response):
    hxs = HtmlXPathSelector(response)

    for title in hxs.select("//tr/td"):
        loader = YourItemLoader(selector=title, response=response)
        loader.add_xpath('title', 'h3/a/text()')

        yield loader.load_item()

2013-08-24 06:44:01 Blender

It works! Thanks! – deekay

As a simpler alternative, you can write a helper function like this:

def extractor(xpathselector, selector):
    """
    Helper function that extracts info from an xpathselector object
    using the selector constraints.
    """
    val = xpathselector.select(selector).extract()
    return val[0] if val else None

and call it like this:

item['title'] = extractor(titles, 'h3/a/text()')
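Unlike extract()[0], which raises IndexError when the XPath matches nothing, this helper returns None. A small sketch of that behaviour, using a hypothetical FakeSelector class as a stand-in so it runs without Scrapy installed:

```python
class FakeSelector:
    """Hypothetical stand-in for an XPath selector; not part of Scrapy."""

    def __init__(self, results):
        self._results = results

    def select(self, xpath):
        # A real selector would evaluate the XPath; this stub ignores it.
        return self

    def extract(self):
        return self._results


def extractor(xpathselector, selector):
    """Return the first extracted value, or None when nothing matched."""
    val = xpathselector.select(selector).extract()
    return val[0] if val else None


found = extractor(FakeSelector(["random title #1"]), 'h3/a/text()')
missing = extractor(FakeSelector([]), 'h3/a/text()')

print(found)    # random title #1
print(missing)  # None
```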

2013-09-26 00:06:28 Medeiros
