Scraping nested JSON data with Scrapy?

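I'm trying to write a web application that retrieves information from the Sony PlayStation Store. I found the JSON file that contains the data I want, but how can I use Scrapy to store only certain elements of that JSON file?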

Here is part of the JSON data:

{
    "age_limit": 0,
    "attributes": {
        "facets": {
            "platform": [
                {"name": "PS4™", "count": 96, "key": "ps4"},
                {"name": "PS3™", "count": 5, "key": "ps3"},
                {"name": "PS Vita", "count": 7, "key": "vita"}
            ]
        }
    }
}
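Once decoded with json.loads, this excerpt is just nested dicts with a list at the bottom. A minimal sketch, assuming the full response has the same shape as the excerpt; the helper name `platform_facets` is hypothetical. It walks attributes, facets, platform with `dict.get`, so a response that lacks one of those levels yields an empty list instead of raising KeyError:

```python
import json

# The excerpt from the question, as raw JSON text (stand-in for response.text)
SAMPLE = """
{
    "age_limit": 0,
    "attributes": {
        "facets": {
            "platform": [
                {"name": "PS4™", "count": 96, "key": "ps4"},
                {"name": "PS3™", "count": 5, "key": "ps3"},
                {"name": "PS Vita", "count": 7, "key": "vita"}
            ]
        }
    }
}
"""


def platform_facets(data):
    """Return the list at attributes -> facets -> platform, or [] if a level is missing."""
    return data.get("attributes", {}).get("facets", {}).get("platform", [])


data = json.loads(SAMPLE)
for platform in platform_facets(data):
    # Each entry is a plain dict with "name", "count" and "key"
    print(platform["name"], platform["count"])
```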

I only want the "count" value for the entry whose "name" is PS4. How do I get it in Scrapy? Here is my Scrapy code so far:

import json

import scrapy
from crossbuy.items import PS4Vita


class PS4VitaSpider(scrapy.Spider):
    name = "ps4vita"  # Name of the spider, used when crawling
    allowed_domains = ["store.playstation.com"]  # Domains the spider is allowed to visit
    # Scrapy reads start_urls (a list), not start_url
    start_urls = ["https://store.playstation.com/chihiro-api/viewfinder/US/en/999/STORE-MSF77008-9_PS4PSVCBBUNDLE?size=30&gkb=1&geoCountry=US"]

    def parse(self, response):
        # Decode the response body; the Response object itself is not a string
        jsonresponse = json.loads(response.text)

        pass  # To be changed later

Thanks!

2016-04-01 user3183717

Can't you just access the {"name": "PS4™"} entry the normal way, e.g. '[p["count"] for p in jsonresponse["attributes"]["facets"]["platform"] if p["name"] == "PS4™"]'? – Anzel
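Run against the sample data, the comprehension in that comment returns a list of every matching count, i.e. [96]. If a single value is wanted, `next()` with a default is a common idiom. The dict below is a hand-copied stand-in for json.loads(response.text), assumed to match the excerpt in the question:

```python
# Hand-copied stand-in for json.loads(response.text) inside parse()
jsonresponse = {
    "age_limit": 0,
    "attributes": {
        "facets": {
            "platform": [
                {"name": "PS4™", "count": 96, "key": "ps4"},
                {"name": "PS3™", "count": 5, "key": "ps3"},
                {"name": "PS Vita", "count": 7, "key": "vita"},
            ]
        }
    },
}

platforms = jsonresponse["attributes"]["facets"]["platform"]

# The comprehension from the comment: counts for every exact name match
ps4_counts = [p["count"] for p in platforms if p["name"] == "PS4™"]

# next() yields the first match, or the default (None) when nothing matches
ps4_count = next((p["count"] for p in platforms if p["name"] == "PS4™"), None)

print(ps4_counts)  # [96]
print(ps4_count)   # 96
```

Note that an exact comparison on "name" needs the ™ character to match, so a plain "PS4" would find nothing.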

... 
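def parse(self, response):
    # response.body holds the raw JSON bytes; json.loads accepts bytes on Python 3.6+
    jsonresponse = json.loads(response.body)
    my_count = None
    for platform in jsonresponse['attributes']['facets']['platform']:
        # Substring test, so "PS4™" matches "PS4"
        if 'PS4' in platform['name']:
            my_count = platform['count']

    yield dict(count=my_count)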
...
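Two details of the loop above: `'PS4' in platform['name']` is a substring test, and since the loop never breaks, `my_count` ends up holding the last match rather than the first. A sketch that matches on the "key" field instead, assuming (from the excerpt only) that keys such as "ps4" are plain ASCII and unique per platform, and that returns on the first hit; `count_for_platform` is a hypothetical helper name:

```python
def count_for_platform(data, platform_key):
    """Return the count of the first platform facet whose "key" equals platform_key, else None."""
    for platform in data["attributes"]["facets"]["platform"]:
        if platform.get("key") == platform_key:
            return platform["count"]
    return None


# Hand-copied stand-in for json.loads(response.body)
data = {
    "attributes": {
        "facets": {
            "platform": [
                {"name": "PS4™", "count": 96, "key": "ps4"},
                {"name": "PS3™", "count": 5, "key": "ps3"},
                {"name": "PS Vita", "count": 7, "key": "vita"},
            ]
        }
    }
}

print(count_for_platform(data, "ps4"))   # 96
print(count_for_platform(data, "xbox"))  # None
```

Inside the spider, parse() could then do `yield {"count": count_for_platform(json.loads(response.text), "ps4")}`. Scrapy 2.2 and later also provide `response.json()`, which performs the decoding step for you.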

2016-04-01 22:25:05 eLRuLL

Just access the JSON data as you would a Python dictionary:

# To get a list of the counts: 
counts = [x['count'] for x in jsonresponse['attributes']['facets']['platform']]
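The list above drops the platform names, so you have to know each count's position. If you need more than one platform, a dict comprehension keyed on each facet's "key" field lets you look counts up directly. A sketch on the same sample data (the name `counts_by_key` is just illustrative):

```python
# Hand-copied stand-in for json.loads(response.text)
jsonresponse = {
    "attributes": {
        "facets": {
            "platform": [
                {"name": "PS4™", "count": 96, "key": "ps4"},
                {"name": "PS3™", "count": 5, "key": "ps3"},
                {"name": "PS Vita", "count": 7, "key": "vita"},
            ]
        }
    }
}

# Map each platform's short key to its count
counts_by_key = {p["key"]: p["count"] for p in jsonresponse["attributes"]["facets"]["platform"]}

print(counts_by_key)         # {'ps4': 96, 'ps3': 5, 'vita': 7}
print(counts_by_key["ps4"])  # 96
```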

2016-04-01 22:30:11
