2012-11-12

Parsing a JSON file with Python 2.x

I'm currently trying to parse a text file that contains many Facebook chat snippets. The snippets are stored as follows:

{"t":"msg","c":"p_100002239013747","s":14,"ms":[{"msg":{"text":"2what is the best restauran 
t in hong kong? ","time":1303115825598,"clientTime":1303115824391,"msgID":"1862585188"},"from":10000 
2239013747,"to":635527479,"from_name":"David Robinson","from_first_name":"David","from_gender":1,"to_name":"Jason Yeung","to_first_name":"Jason","to_gender":2,"type":"msg"}]} 

I've tried several ways to parse/open the JSON file, but to no avail. Here is what I've tried so far:

import json 

data = [] 
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string: 
    for line in json_string: 
        data.append(json.loads(line))

Error:

Traceback (most recent call last): 
    File "C:/Users/Amy/Desktop/facebookparser.py", line 6, in <module> 
    data.append(json.loads(line)) 
    File "C:\Program Files\Python27\lib\json\__init__.py", line 326, in loads 
    return _default_decoder.decode(s) 
    File "C:\Program Files\Python27\lib\json\decoder.py", line 366, in decode 
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
    File "C:\Program Files\Python27\lib\json\decoder.py", line 382, in raw_decode 
    obj, end = self.scan_once(s, idx) 
ValueError: Invalid control character at: line 1 column 91 (char 91) 

I also tried:

import json 

with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_file: 
    data = json.load(json_file) 

...but I get exactly the same error as above.

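Any suggestions? I've searched previous posts here and tried the solutions offered, but none of them worked. I understand that I need to treat the data as a dictionary, where, for example, 'time' is a key and '1303115825598' is its value, but if I can't even load the JSON file into memory, I have no way of parsing it.

Where am I going wrong? Thanks.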

Answer


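Your data contains raw newlines in the middle of values, and JSON doesn't allow unescaped newlines inside strings. You'll have to stitch the lines back together again: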

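import json  # needed for json.loads below

data = []
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string:
    partial = ''
    for line in json_string:
        # Accumulate the line without its trailing newline
        partial += line.rstrip('\n')
        try:
            data.append(json.loads(partial))
            partial = ''  # Decoded a full entry; start the next one
        except ValueError:
            continue  # Not yet a complete JSON value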

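The code collects lines into partial, minus the newline, and tries to decode the result as JSON. If that succeeds, partial is reset to the empty string to handle the next entry. If it fails, we loop to the next line and append it, until a complete JSON value decodes.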
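To see the stitching in action without touching a file, here is a minimal sketch of the same idea wrapped in a generator. The function name iter_json_values and the shortened sample record are made up for illustration; the sample only mimics the question's data, with one break inside a string and one inside a number.

```python
import json


def iter_json_values(lines):
    """Yield each JSON value that is spread over one or more text lines."""
    partial = ''
    for line in lines:
        # Drop the line ending so the broken pieces join back up
        partial += line.rstrip('\r\n')
        try:
            value = json.loads(partial)
        except ValueError:
            continue  # Not yet a complete JSON value; keep accumulating
        partial = ''
        yield value


# Hypothetical sample: one record broken across three lines,
# once mid-string and once mid-number.
sample_lines = [
    '{"t":"msg","ms":[{"msg":{"text":"what is the best restauran\n',
    't in hong kong?","time":1303115825598},"from":10000\n',
    '2239013747,"from_name":"David Robinson"}]}\n',
]

records = list(iter_json_values(sample_lines))
first = records[0]["ms"][0]  # decoded JSON objects are plain dicts
print(first["msg"]["time"])
print(first["from"])
print(first["msg"]["text"])
```

Once a record decodes, it is an ordinary dictionary, so keys such as "time" can be read directly. One caveat of this approach: if an accumulated prefix happens to be valid JSON on its own (for example, a file of bare top-level numbers), it would be emitted early, so this is only suitable when each entry is an object or array like the ones in the question.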


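Thanks Martijn. I copied the JSON excerpt from the internet, which is why it was malformed. I've turned the file into one continuous string and it now reads correctly. Thanks again – thefragileomen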