2014-09-06 113 views
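exceptions.ValueError: Expecting property name: line 1 column 3 (char 2)

I have some Scrapy code that uses a regular expression to search a website's page source for some non-standard markup that contains my data in the form of a dictionary. When this data is found, it is printed to the screen.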

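The table that shows this data to the user has several tabs. When the user moves between tabs, an XHR request refreshes the underlying data. The second part of the code tries to print the dictionary that is returned when the user moves from the 'Overall' tab to the 'Home' tab on the following page: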

http://www.whoscored.com/Teams/32/

The code is here:

from scrapy.contrib.spiders import CrawlSpider, Rule 
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
from scrapy.selector import Selector 
from scrapy.item import Item 
from scrapy.spider import BaseSpider 
from scrapy import log 
from scrapy.cmdline import execute 
from scrapy.utils.markup import remove_tags 
import time 
import re 
import json 
import requests 


class ExampleSpider(CrawlSpider): 
    name = "goal2" 
    allowed_domains = ["whoscored.com"] 
    start_urls = ["http://www.whoscored.com"] 
    download_delay = 5 

    rules = [Rule(SgmlLinkExtractor(allow=('\Teams'),deny=(),), follow=False, callback='parse_item')] 

    def parse_item(self, response): 

     match1 = re.search(re.escape("DataStore.prime('stage-player-stat', defaultTeamPlayerStatsConfigParams.defaultParams , ") \
        + '(\[.*\])' + re.escape(");"), response.body) #regex to match initial data item


     if match1 is not None: 
      playerdata1 = match1.group(1) #if match1 isn't None, take the dictionary embedded in the source code of the page

      print '**********Players by team (Summary - Overall):**********' 
      print '-' * 170 
      for player in json.loads(playerdata1): 

       print ("{TeamId},{PlayerId},{Name}".decode().format(**player)) 


      #submit xhr request to obtain the dictionary that contains the 'Home' data, rather than the 'Overall' data embedded in the source code. 
      url = 'http://www.whoscored.com/stageplayerstatfeed' 
      params = { 
      'field': '1', 
      'isAscending': 'false', 
      'orderBy': 'Rating', 
      'playerId': '-1', 
      'stageId': '9155', 
      'teamId': '32' 
      } 
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36', 
      'X-Requested-With': 'XMLHttpRequest', 
      'Host': 'www.whoscored.com', 
      'Referer': 'http://www.whoscored.com/Teams/32/'} 

      response = requests.get(url, params=params, headers=headers) 

      fixtures = response.json() 
      print '**********Players by team (Summary - Home):**********' 
      print '-' * 170 

      for player in json.loads(fixtures): #print 'Home' dictionary here: 

       print ("{TeamId},{PlayerId},{Name}".decode().format(**player)) 

execute(['scrapy','crawl','goal2']) 

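This code throws an error saying that a string or buffer was expected. When I tried converting the variable fixtures to a string before using it in the statement for player in json.loads(fixtures):, I got an error saying: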

File "C:\Python27\lib\json\__init__.py", line 338, in loads 
    return _default_decoder.decode(s) 
    File "C:\Python27\lib\json\decoder.py", line 366, in decode 
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
    File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode 
    obj, end = self.scan_once(s, idx) 
exceptions.ValueError: Expecting property name: line 1 column 3 (char 2) 

I assume the error relates to the statement .decode().format(**player)), but I am not sure what needs to change.

Can anyone help?

Thanks

'fixtures' is already a Python object. Why are you passing the elements to 'json.loads()' **again**? – 2014-09-06 13:32:32

1 Answer

You are trying to decode objects that have already been decoded. That is what response.json() already took care of.
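To see where the two error messages come from, here is a small offline sketch. The payload is a hypothetical stand-in for the body of the XHR response, and try_loads is a helper made up for this illustration. Passing the decoded list to json.loads() fails because it is not a string, and passing str() of that list fails because Python's repr is not JSON: it uses single quotes (and, on Python 2, a u prefix), so the parser gives up at char 2.

```python
import json

# Hypothetical payload standing in for the body of the XHR response.
raw_body = '[{"TeamId": 32, "PlayerId": 3859, "Name": "Wayne Rooney"}]'

# Roughly what response.json() hands back: a list of dicts.
fixtures = json.loads(raw_body)


def try_loads(value):
    """Return the exception json.loads() raises for value, or None."""
    try:
        json.loads(value)
    except (TypeError, ValueError) as exc:
        return exc
    return None


# A list is not a JSON document, so decoding it again raises TypeError.
first_error = try_loads(fixtures)

# str() gives the Python repr, which is not valid JSON, so this raises
# ValueError (json.JSONDecodeError on Python 3 is a ValueError subclass).
second_error = try_loads(str(fixtures))
```

The exact wording of the messages differs between Python versions, so the sketch only inspects the exception types.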

Just loop over the fixtures list without passing its elements to json.loads():

for player in fixtures: 

You can remove the .decode() call and use a u'...' unicode string literal instead:

print u"{TeamId},{PlayerId},{Name}".format(**player) 

In Python 2, print is a statement, not a function, unless you use from __future__ import print_function at the top of your module.
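As a minimal sketch of the function form, assuming a single hand-written player dictionary (the values are copied from the sample output, not fetched from the site), the same formatting line works unchanged on Python 2 and Python 3 once the import is in place:

```python
from __future__ import print_function

# Hand-written sample record; in the spider this would come from fixtures.
player = {"TeamId": 32, "PlayerId": 25363, "Name": u"Juan Mata"}

# Format with a unicode literal instead of calling .decode() on a byte string.
line = u"{TeamId},{PlayerId},{Name}".format(**player)

# With the __future__ import, print is a function on Python 2 as well.
print(line)
```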

For your sample URL, headers and parameters, this produces:

>>> fixtures = response.json() 
>>> for player in fixtures: 
...  print u"{TeamId},{PlayerId},{Name}".format(**player) 
... 
32,81726,Phil Jones 
32,137795,Tyler Blackett 
32,8166,Ashley Young 
32,18296,Antonio Valencia 
32,22079,Jonny Evans 
32,23110,Ángel Di María 
32,25363,Juan Mata 
32,71345,Chris Smalling 
32,5835,Darren Fletcher 
32,107941,Michael Keane 
32,79554,David de Gea 
32,69956,Tom Cleverley 
32,3859,Wayne Rooney 
32,21723,Anderson 
32,4564,Robin van Persie 
32,39308,Danny Welbeck 
32,130334,Adnan Januzaj 
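
Putting the pieces together, the second half of parse_item could be reduced to something like the sketch below. The names player_lines and FakeResponse are hypothetical: FakeResponse only imitates the .json() method of a requests response so that the example runs without touching the network, and the two records are copied from the output above.

```python
from __future__ import print_function


def player_lines(response):
    """Build one 'TeamId,PlayerId,Name' line per player."""
    # response.json() already returns a list of dicts; no json.loads() needed.
    fixtures = response.json()
    return [u"{TeamId},{PlayerId},{Name}".format(**player)
            for player in fixtures]


class FakeResponse(object):
    """Hypothetical stand-in for a requests response, for offline use."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


sample = FakeResponse([
    {"TeamId": 32, "PlayerId": 81726, "Name": u"Phil Jones"},
    {"TeamId": 32, "PlayerId": 3859, "Name": u"Wayne Rooney"},
])

lines = player_lines(sample)
for line in lines:
    print(line)
```

In the spider itself, the object returned by requests.get(url, params=params, headers=headers) would take the place of sample.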