2015-10-16 75 views
3

import pandas as pd 
import numpy as np 
import random 

labels = ["c1","c2","c3"] 
c1 = ["one","one","one","two","two","three","three","three","three"] 
c2 = [random.random() for i in range(len(c1))] 
c3 = ["alpha","beta","gamma","alpha","gamma","alpha","beta","gamma","zeta"] 
DF = pd.DataFrame(np.array([c1,c2,c3])).T 
DF.columns = labels 

Pandas: most efficient way to make a dictionary of dictionaries from DataFrame columns. The DataFrame looks like this:

 c1    c2  c3 
0 one 0.440958516531 alpha 
1 one 0.476439953723 beta 
2 one 0.254235673552 gamma 
3 two 0.882724336464 alpha 
4 two 0.79817899139 gamma 
5 three 0.677464637887 alpha 
6 three 0.292927670096 beta 
7 three 0.0971956881825 gamma 
8 three 0.993934915508 zeta 

The only way I can think of to build the dictionary is:

D_greek_value = {} 

for greek in set(DF["c3"]):
    D_c1_c2 = {}
    for i in range(DF.shape[0]):
        row = DF.iloc[i,:]
        if row[2] == greek:
            D_c1_c2[row[0]] = row[1]
    D_greek_value[greek] = D_c1_c2
D_greek_value 

The resulting dictionary looks like this:

{'alpha': {'one': '0.67919712421', 
    'three': '0.67171020684', 
    'two': '0.571150669821'}, 
'beta': {'one': '0.895090207979', 'three': '0.489490074662'}, 
'gamma': {'one': '0.964777504708', 
    'three': '0.134397632659', 
    'two': '0.10302290374'}, 
'zeta': {'three': '0.0204226923557'}} 

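I don't want to rely on c1 coming in blocks ("one" being together every time). I'm doing this on a CSV that is a few hundred MB, and I feel like I'm going about it the wrong way. If you have any ideas, please help!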

Answers

4

IIUC, you can take advantage of groupby to handle most of the work:

>>> import pprint
>>> result = df.groupby("c3")[["c1","c2"]].apply(lambda x: dict(x.values)).to_dict()
>>> pprint.pprint(result) 
{'alpha': {'one': 0.440958516531, 
      'three': 0.677464637887, 
      'two': 0.8827243364640001}, 
'beta': {'one': 0.47643995372299996, 'three': 0.29292767009599996}, 
'gamma': {'one': 0.254235673552, 
      'three': 0.0971956881825, 
      'two': 0.79817899139}, 
'zeta': {'three': 0.993934915508}} 

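Some explanation. First we group by c3 and select the columns c1 and c2. This gives us the groups that we want to turn into dictionaries: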

>>> grouped = df.groupby("c3")[["c1", "c2"]] 
>>> grouped.apply(lambda x: print(x,"\n","--")) # just for display purposes 
     c1     c2 
0 one 0.679926178687387 
3 two 0.11495090934413166 
5 three 0.7458197179794177 
-- 
     c1     c2 
0 one 0.679926178687387 
3 two 0.11495090934413166 
5 three 0.7458197179794177 
-- 
     c1     c2 
1 one 0.12943266757277916 
6 three 0.28944292691097817 
-- 
     c1     c2 
2 one 0.36642834809341274 
4 two 0.5690944224514624 
7 three 0.7018221838129789 
-- 
     c1     c2 
8 three 0.7195852795555373 
-- 

Given one of these subframes, say the next-to-last one, we need to figure out a way to turn it into a dictionary. For example:

>>> d3 
     c1  c2 
2 one 0.366428 
4 two 0.569094 
7 three 0.701822 

If we try dict or to_dict, we don't get what we want, because the index and the column labels get in the way:

>>> dict(d3) 
{'c1': 2  one 
4  two 
7 three 
Name: c1, dtype: object, 'c2': 2 0.366428 
4 0.569094 
7 0.701822 
Name: c2, dtype: float64} 
>>> d3.to_dict() 
{'c1': {2: 'one', 4: 'two', 7: 'three'}, 'c2': {2: 0.36642834809341279, 4: 0.56909442245146236, 7: 0.70182218381297889}} 

But we can ignore these by dropping down to the underlying data with .values, which can then be passed to dict:

>>> d3.values 
array([['one', 0.3664283480934128], 
     ['two', 0.5690944224514624], 
     ['three', 0.7018221838129789]], dtype=object) 
>>> dict(d3.values) 
{'three': 0.7018221838129789, 'one': 0.3664283480934128, 'two': 0.5690944224514624} 
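
A minimal, self-contained sketch of this step. The frame below is a hypothetical stand-in for d3, with the c2 values copied from the rounded display above; it is only meant to show why dict works on the raw array:

```python
import pandas as pd

# Hypothetical stand-in for the sub-frame d3 shown above (values rounded).
d3 = pd.DataFrame(
    {"c1": ["one", "two", "three"], "c2": [0.366428, 0.569094, 0.701822]},
    index=[2, 4, 7],
)

# .values drops the index and the column labels, leaving rows of [key, value].
pairs = d3.values

# dict() accepts any iterable of 2-item sequences, so each row becomes one entry.
as_dict = dict(pairs)
print(as_dict)
```

Because each row of the array has exactly two items, dict treats the first as the key and the second as the value. With more than two columns selected this would raise an error instead.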

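So if we apply this, we get a Series with the c3 keys as the index and the dictionaries we want as the values, which we can turn into a dictionary using .to_dict():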

>>> result = df.groupby("c3")[["c1", "c2"]].apply(lambda x: dict(x.values)) 
>>> result 
c3 
alpha {'three': '0.7458197179794177', 'one': '0.6799... 
beta  {'one': '0.12943266757277916', 'three': '0.289... 
gamma {'three': '0.7018221838129789', 'one': '0.3664... 
zeta      {'three': '0.7195852795555373'} 
dtype: object 
>>> result.to_dict() 
{'zeta': {'three': '0.7195852795555373'}, 'gamma': {'three': '0.7018221838129789', 'one': '0.36642834809341274', 'two': '0.5690944224514624'}, 'beta': {'one': '0.12943266757277916', 'three': '0.28944292691097817'}, 'alpha': {'three': '0.7458197179794177', 'one': '0.679926178687387', 'two': '0.11495090934413166'}} 
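
If the step from .apply to .to_dict() feels opaque, the same nested dictionary can probably be built with an explicit loop over the groups. This is a variation for illustration, not the code from the answer above, and the small frame is a made-up example that only shares the question's column layout:

```python
import pandas as pd

# Small hypothetical frame with the same column layout as the question.
df = pd.DataFrame({
    "c1": ["one", "one", "two", "three"],
    "c2": [0.1, 0.2, 0.3, 0.4],
    "c3": ["alpha", "beta", "alpha", "alpha"],
})

# Iterating over a groupby yields (key, sub-frame) pairs.
# zip pairs up c1 and c2 row by row within each group.
result = {
    greek: dict(zip(group["c1"], group["c2"]))
    for greek, group in df.groupby("c3")
}
print(result)
```

Here the outer comprehension plays the role of .apply(...).to_dict(), and dict(zip(...)) plays the role of dict(x.values).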
+1

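Very nice. I wonder whether this is faster than what I posted. I'd expect groupby to be very fast, but the lambda may slow it down. I'm too lazy to time it, though. –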

+2

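@StevenRumbalski: Me too. :-) I tried to see whether I could get the same result using vectorized operations, but bounced off; someone else may come up with something cleverer. But I think you've already put your finger on the big problem (too many iterations), and everything beyond that is minor. – DSM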

+0

@DSM I know how to use lambda functions for sorting, but what exactly is happening from ".apply" to ".to_dict()"? –

3

You are iterating over the DataFrame once for every unique Greek letter. It is better to iterate just once.

Since you want a dictionary of dictionaries, you can use a collections.defaultdict with dict as the default constructor for the nested dictionaries:

from collections import defaultdict 

result = defaultdict(dict) 
for dx, num_word, val, greek in DF.itertuples(): 
    result[greek][num_word] = val 

Or use a plain dictionary and a setdefault call to create the nested dictionaries:

result = {} 
for dx, num_word, val, greek in DF.itertuples(): 
    result.setdefault(greek, {})[num_word] = val
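
Nobody in the thread actually timed the two approaches. A rough way to compare them is sketched below, assuming synthetic random data (the size n, the seed, and the helper names single_pass and with_groupby are all made up for illustration). No timings are claimed here; the numbers depend on the pandas version and the data:

```python
import random
import timeit
from collections import defaultdict

import pandas as pd

# Synthetic data with the question's column layout (hypothetical size and seed).
random.seed(0)
n = 2000
df = pd.DataFrame({
    "c1": [random.choice(["one", "two", "three"]) for _ in range(n)],
    "c2": [random.random() for _ in range(n)],
    "c3": [random.choice(["alpha", "beta", "gamma", "zeta"]) for _ in range(n)],
})

def single_pass(frame):
    # One pass over the rows; index=False skips the row label.
    result = defaultdict(dict)
    for num_word, val, greek in frame.itertuples(index=False):
        result[greek][num_word] = val
    return dict(result)

def with_groupby(frame):
    # The groupby/apply approach from the other answer.
    grouped = frame.groupby("c3")[["c1", "c2"]]
    return grouped.apply(lambda x: dict(x.values)).to_dict()

# Both keep the last value seen for a repeated (c3, c1) pair, so they should agree.
same = single_pass(df) == with_groupby(df)
print("results match:", same)

# Time a few repetitions of each; interpret the numbers with care.
print("single pass:", timeit.timeit(lambda: single_pass(df), number=3))
print("groupby:    ", timeit.timeit(lambda: with_groupby(df), number=3))
```

Building the frame from a dictionary of columns, rather than from np.array([c1, c2, c3]).T as in the question, keeps c2 as floats instead of coercing every column to strings.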