2014-03-31 162 views

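pandas read_json not working on MultiIndex

I am trying to read in a dataframe created via df.to_json() using pd.read_json, but I get a ValueError. I think it may have to do with the fact that the index is a MultiIndex, but I am not sure how to deal with that.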

The original dataframe of 55k rows is called psi, and I created test.json via:

psi.head().to_json('test.json') 

Here is the output of print psi.head().to_string(), if you would like to use that.

When I do this on this small set of data (5 rows), I get a ValueError:

! wget --no-check-certificate https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json 
import pandas as pd 
pd.read_json('test.json') 

Here is the full stack trace:

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-14-1de2f0e65268> in <module>() 
     1 get_ipython().system(u' wget https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json')
     2 import pandas as pd 
----> 3 pd.read_json('test.json') 

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit) 
    196   obj = FrameParser(json, orient, dtype, convert_axes, convert_dates, 
    197       keep_default_dates, numpy, precise_float, 
--> 198       date_unit).parse() 
    199 
    200  if typ == 'series' or obj is None: 

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self) 
    264 
    265   else: 
--> 266    self._parse_no_numpy() 
    267 
    268   if self.obj is None: 

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self) 
    481   if orient == "columns": 
    482    self.obj = DataFrame(
--> 483     loads(json, precise_float=self.precise_float), dtype=None) 
    484   elif orient == "split": 
    485    decoded = dict((str(k), v) 

ValueError: No ':' found when decoding object value 

> /home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.py(483)_parse_no_numpy() 
    482    self.obj = DataFrame(
--> 483     loads(json, precise_float=self.precise_float), dtype=None) 
    484   elif orient == "split": 

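However, when I do this on the entire dataframe (55k rows), I get an invalid pointer error and the IPython kernel dies. Any ideas?

EDIT: added how the json was generated in the first place.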

+0

That is not valid JSON. I would imagine the problem is with how it was created. Do you have sample code that created it? – BrenBarn
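
One way to check the validity claim independently of pandas is to hand the text to Python's standard-library json parser. This is only a minimal sketch: the helper name is_valid_json and both sample strings are made up for illustration, and the malformed string is not the actual content of test.json.

```python
import json


def is_valid_json(text):
    """Return True if the standard-library parser accepts `text`."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Hypothetical samples: the first is well formed, the second has a key
# with no ':' separating it from its value.
print(is_valid_json('{"psi": {"0": 0.5}}'))
print(is_valid_json('{"psi" {"0": 0.5}}'))
```

To test a file on disk, read it into a string first and pass that string to the helper.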

+1

Not implemented, see: https://github.com/pydata/pandas/issues/4889 – Jeff

+0

@Jeff: It still seems bad if 'to_json' generates invalid JSON. Is that what is happening, or is there some other bug? – BrenBarn
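
Given that round-tripping a MultiIndex through to_json and read_json is reported above as not implemented, one possible workaround is to move the index levels into ordinary columns before writing and rebuild the index after reading. The sketch below is not taken from this thread. It assumes Python 3 and a reasonably recent pandas, and the frame, level names ("gene", "sample") and column name ("psi") are hypothetical stand-ins for the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for psi.head(): made-up level and column names.
index = pd.MultiIndex.from_tuples(
    [("geneA", 1), ("geneA", 2), ("geneB", 1)],
    names=["gene", "sample"],
)
df = pd.DataFrame({"psi": [0.5, 0.25, 0.75]}, index=index)

# Move the index levels into ordinary columns so the frame has a flat index.
flat = df.reset_index()
json_text = flat.to_json()

# Read the JSON back and rebuild the MultiIndex from the level columns.
restored = pd.read_json(io.StringIO(json_text)).set_index(["gene", "sample"])

print(restored)
```

The same pattern should work with a file path in place of the in-memory string. Whether it scales to the full 55k-row frame has not been checked here.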

Answers