2016-04-01 64 views

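Scraping nested JSON data with Scrapy?

I am trying to write a web application that retrieves information from the Sony PlayStation Store. I have found the JSON file containing the data I want, but how can I use Scrapy to store only certain elements of that JSON file?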

Here is part of the JSON data:

{
    "age_limit": 0,
    "attributes": {
        "facets": {
            "platform": [
                {"name": "PS4™", "count": 96, "key": "ps4"},
                {"name": "PS3™", "count": 5, "key": "ps3"},
                {"name": "PS Vita", "count": 7, "key": "vita"}
            ]
        }
    }
}

I only want the "count" value for the entry whose "name" is PS4. How do I get this in Scrapy? This is my Scrapy code so far:

import json

import scrapy
from crossbuy.items import PS4Vita


class PS4VitaSpider(scrapy.Spider):
    name = "ps4vita"  # Name of the spider, to be used when crawling
    allowed_domains = ["store.playstation.com"]  # Where the spider is allowed to go
    # Scrapy reads a list named start_urls, not a single start_url string
    start_urls = [
        "https://store.playstation.com/chihiro-api/viewfinder/US/en/999/STORE-MSF77008-9_PS4PSVCBBUNDLE?size=30&gkb=1&geoCountry=US"
    ]

    def parse(self, response):
        # json.loads needs the response body, not the Response object itself
        jsonresponse = json.loads(response.body)

        pass  # To be changed later

Thanks!


Can't you just access the {"name": "PS4™"} entry the normal way, e.g. `[p["count"] for p in jsonresponse["attributes"]["facets"]["platform"] if p["name"] == "PS4™"]`? – Anzel

Answers

... 
def parse(self, response):
    jsonresponse = json.loads(response.body)
    my_count = None
    for platform in jsonresponse['attributes']['facets']['platform']:
        if 'PS4' in platform['name']:
            my_count = platform['count']

    yield dict(count=my_count)
... 
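
The loop above can be tried without running a spider. The following is a minimal sketch, assuming the response body has the shape shown in the question. `SAMPLE_BODY` and `extract_count` are hypothetical names introduced here for illustration and are not part of Scrapy. Unlike the loop above, which keeps the last match, this version returns the first match.

```python
import json

# Hypothetical sample body, copied from the JSON excerpt in the question.
SAMPLE_BODY = """
{
    "age_limit": 0,
    "attributes": {
        "facets": {
            "platform": [
                {"name": "PS4™", "count": 96, "key": "ps4"},
                {"name": "PS3™", "count": 5, "key": "ps3"},
                {"name": "PS Vita", "count": 7, "key": "vita"}
            ]
        }
    }
}
"""


def extract_count(body, wanted="PS4"):
    """Return the count of the first platform whose name contains `wanted`.

    Returns None when no platform matches.
    """
    data = json.loads(body)
    for platform in data["attributes"]["facets"]["platform"]:
        if wanted in platform["name"]:
            return platform["count"]
    return None


ps4_count = extract_count(SAMPLE_BODY)
print(ps4_count)
```

Inside `parse`, the same helper could presumably be called as `extract_count(response.body)`, since `json.loads` accepts bytes as well as text on Python 3.6 and later.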

Just access the JSON data as you would a Python dictionary:

# To get a list of the counts: 
counts = [x['count'] for x in jsonresponse['attributes']['facets']['platform']]
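
When only one platform's count is needed, the same idea can be extended to a lookup table. This is a small sketch, assuming the `platform` list from the question's JSON has already been parsed; `platforms` and `counts_by_key` are illustrative names, not anything Scrapy provides.

```python
# Parsed "platform" list, as it appears in the question's JSON excerpt.
platforms = [
    {"name": "PS4™", "count": 96, "key": "ps4"},
    {"name": "PS3™", "count": 5, "key": "ps3"},
    {"name": "PS Vita", "count": 7, "key": "vita"},
]

# List of all counts, in document order.
counts = [x["count"] for x in platforms]

# Map each platform's short key to its count for direct lookup.
counts_by_key = {x["key"]: x["count"] for x in platforms}

print(counts)                # [96, 5, 7]
print(counts_by_key["ps4"])  # 96
```

Looking up by the ASCII `key` field avoids having to match the trademark symbol in `name`.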