Scrapy: parsing JSON output

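I am using Scrapy to crawl a website. Some pages use AJAX, so I capture the AJAX requests to get the actual data. So far so good. The output of these AJAX requests is JSON. Now I want to parse the JSON, but Scrapy only provides HtmlXPathSelector. Has anyone managed to convert JSON output to HTML and parse it with HtmlXPathSelector?

Thanks a lot in advance.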

Source

2013-04-09 pep

You don't want to convert JSON to HTML. Can you give us an example of the JSON response? – 2013-04-09 22:06:50

import json 

response = json.loads(jsonResponse)

The code above decodes the JSON you receive. After that, you should be able to process it any way you want.
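
As a minimal sketch of what that processing can look like: once decoded, the response is ordinary Python dicts and lists, so no selector is needed. The payload shape and the field names (`items`, `name`, `count`) are assumptions made up for illustration and are not taken from the question.

```python
import json

# Hypothetical AJAX response body; the real structure depends on the site.
json_response = '{"count": 2, "items": [{"name": "foo"}, {"name": "bar"}]}'


def parse_names(body):
    """Decode a JSON body and collect the 'name' of every item."""
    data = json.loads(body)
    # .get() with a default avoids a KeyError when a key is missing
    return [item.get("name") for item in data.get("items", [])]


names = parse_names(json_response)
print(names)
```

In a Scrapy callback the same function could be called with the response body, for example `response.text` in recent Scrapy versions.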

Source

2013-04-09 19:20:21

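Slightly more complex, but it still works (replace jsonResponse with the JSON you get from the Ajax request).

This is for the case where you want to work with XPaths on JSON output.

Disclaimer: this may not be the optimal solution. +1 if someone improves this approach.

Install the dicttoxml package (pip recommended).

- Download the output using Scrapy's traditional Request module

In the spider: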

import json

import dicttoxml
from scrapy import Request
from scrapy.selector import Selector

# Inside a spider callback: request the AJAX URL and handle it in parse_resp
request = Request(link, callback=self.parse_resp)
yield request

def parse_resp(self, response):
    # Load the contents using Python's json module
    # (the variable must not be named "json", or it shadows the module)
    json_dict = json.loads(response.text)
    # Transform the contents into XML (returned as bytes) using dicttoxml
    xml = dicttoxml.dicttoxml(json_dict)
    # Build an XML selector from the converted document and start using XPaths
    selector = Selector(text=xml.decode("utf-8"), type="xml")
    # Example XPath; adjust it to the structure of your converted document
    data = selector.xpath(".//*[@id='count']/text()").extract()
    return data

I do it this way because I maintain all the XPaths for all my spiders in one place (a config file).
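
The idea of driving extraction from centrally stored XPaths can be sketched roughly as follows. This sketch does not use dicttoxml or Scrapy's selectors: it swaps in a small hand-written converter and the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath. The `XPATHS` mapping, the helper names, and the sample payload are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical central config: field name -> XPath (ElementTree's subset).
XPATHS = {"count": ".//count", "names": ".//items/item/name"}


def to_xml(value, tag="root"):
    """Recursively convert decoded JSON (dict, list or scalar) to an Element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(child, key))
    elif isinstance(value, list):
        # List entries have no key, so each one becomes an <item> element
        for child in value:
            elem.append(to_xml(child, "item"))
    else:
        elem.text = str(value)
    return elem


def extract(body, xpaths=XPATHS):
    """Decode JSON, convert it to XML, and apply every configured XPath."""
    root = to_xml(json.loads(body))
    return {name: [e.text for e in root.findall(path)]
            for name, path in xpaths.items()}


result = extract('{"count": 2, "items": [{"name": "foo"}, {"name": "bar"}]}')
print(result)
```

Every value comes back as a list of strings, because the types are lost once the data is turned into XML text nodes. That is one cost of this approach compared with reading the decoded dict directly.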

Source

2014-10-31 07:26:57 Sravan
