2016-09-20 88 views
2

Converting a dictionary to a pandas DataFrame

My data looks like this:

{u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 

I want to convert it to a pandas DataFrame. However, when I try

df = pd.DataFrame(response.items()) 

I get a DataFrame with two columns, the first holding the keys and the second holding the value for each key:

      0      1 
0 "57e01311817bc367c030b390" {"ad_since": 2016, "indoor_swimming_pool": "No... 
1 "57e01311817bc367c030b3a8" {"ad_since": 2012, "indoor_swimming_pool": "No... 

How can I get one column for each key: "ad_since", "indoor_swimming_pool", "seaside", "handicapped_access"? I would also like to keep the first column, or use the id as the index.

+0

Try read_json: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.read_json.html –

+0

Did you try using 'pd.DataFrame(response.items())'? For me, it does not work. – jezrael

+0

@jezrael Thanks for your comment, I have edited my post. – mitsi

Answers

1

You need to convert the values from type str to dict with .apply(literal_eval) or .apply(json.loads), and then use DataFrame.from_records:

import pandas as pd 
from ast import literal_eval 

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
      u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 

df = pd.DataFrame.from_dict(response, orient='index') 

print (type(df.iloc[0,0])) 
<class 'str'> 

df.iloc[:,0] = df.iloc[:,0].apply(literal_eval) 

print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index)) 
          ad_since handicapped_access indoor_swimming_pool \ 
"57e01311817bc367c030b3a8"  2012    Yes     No 
"57e01311817bc367c030b390"  2016    Yes     No 

          seaside 
"57e01311817bc367c030b3a8"  No 
"57e01311817bc367c030b390"  No 

Alternatively, with json.loads:

import pandas as pd
import json 

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
      u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 


df = pd.DataFrame.from_dict(response, orient='index') 
df.iloc[:,0] = df.iloc[:,0].apply(json.loads) 


print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index)) 
          ad_since handicapped_access indoor_swimming_pool \ 
"57e01311817bc367c030b3a8"  2012    Yes     No 
"57e01311817bc367c030b390"  2016    Yes     No 

          seaside 
"57e01311817bc367c030b3a8"  No 
"57e01311817bc367c030b390"  No 

+0

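With the first method (using 'literal_eval') on the whole dataset, I get the error 'ValueError: malformed string', possibly because of special characters. The second method with 'json.loads' works perfectly, thanks. – mitsi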
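A note on the ValueError mentioned in the comment above: the actual cause in the full dataset is not shown in this thread, so the following is only one plausible explanation. `literal_eval` accepts Python literals only, so a value that is valid JSON but not valid Python (for example `false` or `null`) is rejected, while `json.loads` handles it. The record below is hypothetical and made up for illustration; it is not taken from the original data.

```python
import json
from ast import literal_eval

# Hypothetical record (an assumption, not from the original dataset):
# same shape as the question's values, but with JSON-only literals.
raw = '{"ad_since": 2016, "seaside": false, "rating": null}'

# json.loads maps JSON literals onto Python objects (false -> False, null -> None).
parsed = json.loads(raw)
print(parsed)

# literal_eval only understands Python literals, so `false` and `null` are rejected.
try:
    literal_eval(raw)
    literal_eval_ok = True
except (ValueError, SyntaxError) as exc:
    literal_eval_ok = False
    print("literal_eval failed:", exc)
```

If the values really are JSON, `json.loads` is the safer choice for that reason.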

+0

Glad I could help. – jezrael
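The approach in the answer above can probably be condensed into a single pass: parse each JSON string once and hand the resulting dict of dicts to `DataFrame.from_dict` with `orient='index'`. This is a minimal sketch, not the answerer's code. Stripping the literal double quotes from the keys and naming the index `id` are assumptions made here for readability; drop those steps if the quotes are meaningful in your data.

```python
import json

import pandas as pd

response = {
    '"57e01311817bc367c030b390"': '{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    '"57e01311817bc367c030b3a8"': '{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# Parse every JSON string exactly once.
# Assumption: the double quotes embedded in the keys are unwanted, so strip them.
records = {key.strip('"'): json.loads(value) for key, value in response.items()}

# orient='index' turns the outer keys into row labels and the inner keys into columns.
df = pd.DataFrame.from_dict(records, orient="index")
df.index.name = "id"  # assumed label for the index

print(df)
```

To get the id back as an ordinary first column instead of the index, `df.reset_index()` should do it.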

1

Since the values are strings, you can use the json module and a list comprehension:

In [20]: d =  {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'} 

In [21]: import json 

In [22]: pd.DataFrame(dict([(k, [json.loads(e)[k] for e in d.values()]) for k in json.loads(list(d.values())[0])]), index=list(d.keys()))
Out[22]: 
          ad_since handicapped_access indoor_swimming_pool \ 
"57e01311817bc367c030b390"  2016    Yes     No 
"57e01311817bc367c030b3a8"  2012    Yes     No 

         seaside 
"57e01311817bc367c030b390"  No 
"57e01311817bc367c030b3a8"  No