2015-10-15 89 views
10

Creating a pandas DataFrame from a dictionary of dictionaries

I have a dictionary of dictionaries of the following form:

{'user':{movie:rating} } 

For example,

{'Jill': {'Avenger: Age of Ultron': 7.0,
          'Django Unchained': 6.5,
          'Gone Girl': 9.0,
          'Kill the Messenger': 8.0},
 'Toby': {'Avenger: Age of Ultron': 8.5,
          'Django Unchained': 9.0,
          'Zoolander': 2.0}}

I'd like to convert this dictionary into a pandas DataFrame, with the first column holding the user name and the other columns holding the movie ratings, i.e.

user Gone_Girl Horrible_Bosses_2 Django_Unchained Zoolander etc. \ 

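However, some users did not rate some of the movies, so those movies are not included in the values() for that user's key(). In those cases it would be fine to just fill in NaN.

As of now, I iterate over the keys, fill a list, and then use this list to create a DataFrame: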

import pandas as pd

data = []
for key in movie_user_preferences:
    try:
        data.append((key,
                     movie_user_preferences[key]['Gone Girl'],
                     movie_user_preferences[key]['Horrible Bosses 2'],
                     movie_user_preferences[key]['Django Unchained'],
                     movie_user_preferences[key]['Zoolander'],
                     movie_user_preferences[key]['Avenger: Age of Ultron'],
                     movie_user_preferences[key]['Kill the Messenger']))
    # if any movie is missing for this user, skip the user entirely
    except KeyError:
        pass
df = pd.DataFrame(data=data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained', 'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])

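But this only gives me a DataFrame of the users who rated every movie in the set.

My goal is, first, to build the data list by iterating over the movie labels (rather than the brute-force approach shown above) and, second, to create a DataFrame that includes all users and puts null values in the cells that have no movie rating.

Answers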

17

You can pass the dict of dicts directly to the DataFrame constructor:

In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}} 

In [12]: pd.DataFrame(d) 
Out[12]: 
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0

Or use the from_dict method:

In [13]: pd.DataFrame.from_dict(d) 
Out[13]: 
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0

In [14]: pd.DataFrame.from_dict(d, orient='index') 
Out[14]: 
      Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
Jill               6.5          9                   8                     7.0        NaN
Toby               9.0        NaN                 NaN                     8.5          2
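
For anyone who wants to run this outside an IPython session, here is a minimal script-style sketch of the same idea. The variable names by_user, transposed and missing_per_user are only illustrative, and the data is the two-user sample from the question. Row and column order may differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Sample data copied from the question.
d = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Users as rows, movies as columns; movies a user did not rate become NaN.
by_user = pd.DataFrame.from_dict(d, orient='index')

# Transposing the plain constructor result also puts users on the rows.
transposed = pd.DataFrame(d).T

# Count how many movies each user has not rated.
missing_per_user = by_user.isna().sum(axis=1)
print(by_user)
print(missing_per_user)
```

The isna() count is just a quick way to confirm that the gaps were filled with real NaN values rather than dropped.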
+0

Is there a way to make the user names a separate column instead of the index? – Feynman27

+3

pd.DataFrame.from_dict(d, orient='index').reset_index() –

+0

Nice. Thanks! – Feynman27
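
A small sketch of what the reset_index() suggestion produces, using the sample data from the question. When the index has no name, reset_index() puts the old index into a column called 'index'; renaming it to 'user' is my own addition to match the layout asked for in the question, not something the comment specifies.

```python
import pandas as pd

# Sample data copied from the question.
d = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# reset_index() moves the user names out of the index into a column
# named 'index'; the rename to 'user' is an illustrative extra step.
df = (pd.DataFrame.from_dict(d, orient='index')
        .reset_index()
        .rename(columns={'index': 'user'}))
print(df)
```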

0

This brute-force approach also seems to work, but in my opinion iterating over the movie labels would still be more robust.

data = []
for key in movie_user_preferences:
    prefs = movie_user_preferences[key]
    # Note: 'NaN' below is a string placeholder, not float('nan'),
    # so a DataFrame column built from it will have object dtype.
    data.append((key,
                 prefs['Gone Girl'] if 'Gone Girl' in prefs else 'NaN',
                 prefs['Horrible Bosses 2'] if 'Horrible Bosses 2' in prefs else 'NaN',
                 prefs['Django Unchained'] if 'Django Unchained' in prefs else 'NaN',
                 prefs['Zoolander'] if 'Zoolander' in prefs else 'NaN',
                 prefs['Avenger: Age of Ultron'] if 'Avenger: Age of Ultron' in prefs else 'NaN',
                 prefs['Kill the Messenger'] if 'Kill the Messenger' in prefs else 'NaN'))


      user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander  \
0      Sam          6                  3               7.5          7
1      Max         10                  6               7.0         10
2   Robert        NaN                  5               7.0          9
3     Toby        NaN                NaN               9.0          2
4    Julia        6.5                NaN               6.0        6.5
5  William          7                  4               8.0          4
6     Jill          9                NaN               6.5        NaN

   Avenger_Age_of_Ultron  Kill_the_Messenger
0                   10.0                 5.5
1                    7.0                   5
2                    8.0                   9
3                    8.5                 NaN
4                   10.0                   6
5                    6.0                 6.5
6                    7.0                   8
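
For the "iterate over the movie labels" variant mentioned at the top of this answer, one possible sketch is below. It assumes the same dict-of-dicts shape as in the question (only the two sample users are included here), collects the movie titles from the data instead of hard-coding them, and uses np.nan rather than the string 'NaN' so the rating columns stay numeric. The names movies and rows are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the real dict would hold more users.
movie_user_preferences = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Collect every movie title rated by at least one user
# (sorted only to get a stable column order).
movies = sorted({title
                 for ratings in movie_user_preferences.values()
                 for title in ratings})

rows = []
for user, ratings in movie_user_preferences.items():
    # dict.get falls back to np.nan (a float) when the user has no rating.
    rows.append([user] + [ratings.get(title, np.nan) for title in movies])

df = pd.DataFrame(rows, columns=['user'] + movies)
print(df)
```

Because the missing entries are real floats, methods such as df.mean(numeric_only=True) or df.fillna(...) treat them as missing values, which a 'NaN' string would not allow.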