Scrapy: dumping to a JSON file

I have a problem with the encoding of the website I want to parse (a web-site in Russian).

Here is the code that extracts the information I need:

# -*- coding: utf-8 -*-
from scrapy.spider import Spider
from scrapy.selector import Selector
from flats.items import FlatsItem


class DmozSpider(Spider):
    name = "dmoz"
    start_urls = ['http://rieltor.ua/flats-sale/?ncrnd=6510']

    def parse(self, response):
        sel = Selector(response)
        flats = sel.xpath('//*[@id="content"]')
        flats_stored_info = []
        flat_item = FlatsItem()
        for flat in flats:
            flat_item['square'] = [s.encode("utf-8") for s in sel.xpath('//div/strong[@class="param"][1]/text()').extract()]
            flat_item['rooms_floor_floors'] = [s.encode("utf-8") for s in sel.xpath('//div/strong[@class="param"][2]/text()').extract()]
            flat_item['address'] = [s.encode("utf-8") for s in flat.xpath('//*[@id="content"]//h2/a/text()').extract()]
            flat_item['price'] = [s.encode("utf-8") for s in flat.xpath('//div[@class="cost"]/strong/text()').extract()]
            flat_item['subway'] = [s.encode("utf-8") for s in flat.xpath('//span[@class="flag flag-location"]/a/text()').extract()]
            flats_stored_info.append(flat_item)
        return flats_stored_info

This is how I dump it to a JSON file:

scrapy crawl dmoz -o items.json -t json

The problem is this: when I change the code above to print the extracted information to the console, like so:

flat_item['square'] = sel.xpath('//div/strong[@class="param"][1]/text()').extract()
for bla in flat_item['square']:
    print bla

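the script displays the Russian text correctly.

But when I dump the scraped information using the first version of the script (the one that encodes to UTF-8), it is written to the JSON file like this: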

[{"square": ["2-\u043a\u043e\u043c\u043d., 16 \u044d\u0442\u0430\u0436 16-\u044d\u0442. \u0434\u043e\u043c", "1-\u043a\u043e\u043c\u043d.,

How can I dump the information into the JSON file in Russian? Thanks for any advice.

Source

2014-05-02 mr.M

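Can you show how you are dumping the JSON? Spasibo. – alecxe

See the edited post. –

It is encoded correctly; the json library simply escapes non-ASCII characters by default.

You can load the data and use it (data copied from your example):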
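To illustrate the point, here is a minimal sketch (Python 3 syntax, unlike the Python 2 snippets in this thread) of how `json.dumps` behaves with and without `ensure_ascii=False`. The variable names are made up for the example; the sample string is the first value from the question's output.

```python
import json

# Sample value taken from the question's output (hypothetical variable name).
square = "2-комн., 16 этаж 16-эт. дом"

# Default behaviour: ensure_ascii=True, so every non-ASCII character
# is written as a \uXXXX escape.
escaped = json.dumps(square)

# With ensure_ascii=False the Cyrillic characters are kept as they are.
readable = json.dumps(square, ensure_ascii=False)

print(escaped)
print(readable)

# Both representations decode back to the same string.
assert json.loads(escaped) == json.loads(readable) == square
```

Either form is valid JSON, so any JSON parser reading the file gets the same Russian text back.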

>>> import json 
>>> print json.loads('"2-\u043a\u043e\u043c\u043d., 16 \u044d\u0442\u0430\u0436 16-\u044d\u0442. \u0434\u043e\u043c"') 
2-комн., 16 этаж 16-эт. дом
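If the file itself should contain readable Cyrillic rather than `\u` escapes, newer Scrapy releases (1.2 and later, as far as I know; the setting did not exist when this question was asked) provide a `FEED_EXPORT_ENCODING` setting. A sketch, assuming it goes into the `settings.py` of the project from the question:

```python
# settings.py of the Scrapy project (assumed location).
# Assumption: Scrapy 1.2 or newer, where this setting is available.
# With an explicit encoding, the JSON feed is written without \uXXXX escapes.
FEED_EXPORT_ENCODING = "utf-8"
```

With this in place, the `.encode("utf-8")` calls in the spider should not be needed for the export; that is an expectation, not something tested against the site in the question.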

Source

2014-05-04 19:16:19
