Pandas: most efficient way to make a dictionary of dictionaries from DataFrame columns
import pandas as pd
import numpy as np
import random
labels = ["c1","c2","c3"]
c1 = ["one","one","one","two","two","three","three","three","three"]
c2 = [random.random() for i in range(len(c1))]
c3 = ["alpha","beta","gamma","alpha","gamma","alpha","beta","gamma","zeta"]
DF = pd.DataFrame(np.array([c1,c2,c3])).T
DF.columns = labels
The DataFrame looks like this:
      c1               c2     c3
0    one   0.440958516531  alpha
1    one   0.476439953723   beta
2    one   0.254235673552  gamma
3    two   0.882724336464  alpha
4    two    0.79817899139  gamma
5  three   0.677464637887  alpha
6  three   0.292927670096   beta
7  three  0.0971956881825  gamma
8  three   0.993934915508   zeta
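Side note on the setup: stacking the lists with `np.array([c1, c2, c3])` puts everything into a single string array, which is why the values in the dictionary further down print as quoted strings. A minimal sketch, assuming the goal is to keep `c2` numeric, builds the frame from a dict of columns instead (the literal values are copied from the printout above only so the example is deterministic):

```python
import pandas as pd

# Same data as the printout above, with c2 kept as real floats.
c1 = ["one", "one", "one", "two", "two", "three", "three", "three", "three"]
c2 = [0.440958516531, 0.476439953723, 0.254235673552, 0.882724336464,
      0.79817899139, 0.677464637887, 0.292927670096, 0.0971956881825,
      0.993934915508]
c3 = ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"]

# A dict of columns lets each column keep its own dtype, whereas
# np.array([c1, c2, c3]).T coerces all three columns to strings.
DF = pd.DataFrame({"c1": c1, "c2": c2, "c3": c3})
print(DF.dtypes)
```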
The only way I can think of to build the dictionary is:
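D_greek_value = {}
for greek in set(DF["c3"]):
    D_c1_c2 = {}
    # Scan every row once per distinct Greek letter.
    for i in range(DF.shape[0]):
        row = DF.iloc[i, :]
        # Index by column label; positional row[2] on a labeled Series is deprecated in recent pandas.
        if row["c3"] == greek:
            D_c1_c2[row["c1"]] = row["c2"]
    D_greek_value[greek] = D_c1_c2
D_greek_value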
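The loop above rescans the whole frame once per distinct `c3` value, so its cost grows roughly as (number of Greek letters) × (number of rows), and each `iloc` row lookup is slow on its own. A minimal single-pass sketch, swapping in `collections.defaultdict` and a `zip` over the three columns (the function name `nested_dict_single_pass` is hypothetical, and the `c2` values are rounded from the printout above):

```python
from collections import defaultdict

import pandas as pd

def nested_dict_single_pass(df):
    """Build {c3: {c1: c2}} in one pass over the rows.

    c1 values do not need to be contiguous. As in the original loop,
    a repeated (c3, c1) pair keeps the last c2 value seen.
    """
    result = defaultdict(dict)
    # Zipping the column values avoids building a Series per row (as iloc does).
    for k1, v, k3 in zip(df["c1"], df["c2"], df["c3"]):
        result[k3][k1] = v
    return dict(result)

DF = pd.DataFrame({
    "c1": ["one", "one", "one", "two", "two", "three", "three", "three", "three"],
    "c2": [0.44, 0.48, 0.25, 0.88, 0.80, 0.68, 0.29, 0.10, 0.99],
    "c3": ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"],
})
print(nested_dict_single_pass(DF))
```

Because each row is merged by key, shuffling the rows gives the same result here (no (c3, c1) pair repeats).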
The resulting dictionary looks like this:
{'alpha': {'one': '0.67919712421',
'three': '0.67171020684',
'two': '0.571150669821'},
'beta': {'one': '0.895090207979', 'three': '0.489490074662'},
'gamma': {'one': '0.964777504708',
'three': '0.134397632659',
'two': '0.10302290374'},
'zeta': {'three': '0.0204226923557'}}
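I'd rather not rely on c1 coming in contiguous chunks (i.e., all the "one" rows being together every time). I'm running this on a CSV of a few hundred MB, and I feel like I'm doing it wrong. Any ideas would be appreciated!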
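Since the real input is a CSV of a few hundred MB, one option is to stream it with `pd.read_csv(..., chunksize=...)` and fold each chunk into the same nested dict; because every row is merged by key, nothing depends on the "one" rows arriving together. A minimal sketch, assuming the file has the header `c1,c2,c3` (the function name is hypothetical, an in-memory `io.StringIO` stands in for the real file, and the tiny `chunksize=4` is only for illustration):

```python
import io
from collections import defaultdict

import pandas as pd

def nested_dict_from_csv(path_or_buf, chunksize=100_000):
    """Stream a CSV with columns c1, c2, c3 into {c3: {c1: c2}}.

    A later row with the same (c3, c1) pair overwrites an earlier one.
    """
    result = defaultdict(dict)
    for chunk in pd.read_csv(path_or_buf, chunksize=chunksize):
        for k1, v, k3 in zip(chunk["c1"], chunk["c2"], chunk["c3"]):
            result[k3][k1] = v
    return dict(result)

# Rows deliberately out of order: the "one" rows are not contiguous.
csv_text = """c1,c2,c3
one,0.44,alpha
two,0.88,alpha
one,0.48,beta
three,0.68,alpha
one,0.25,gamma
three,0.99,zeta
"""
print(nested_dict_from_csv(io.StringIO(csv_text), chunksize=4))
```

Memory then scales with the number of distinct (c3, c1) pairs rather than with the file size.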
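Very nice. I wonder whether this is faster than what I posted. I'd expect 'groupby' to be very fast, but the lambda might slow it down. I'm too lazy to time it, though. –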
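The comment above wonders whether a groupby-plus-lambda approach beats a plain loop but skips timing it. A rough `timeit` sketch on synthetic data compares the question's rescan loop, a single pass, and `groupby(...).apply(lambda ...)`. The data generator `make_frame`, the row count, and all function names are hypothetical (neither answer's exact code appears here), and the timings will vary by machine and pandas version:

```python
import random
import timeit
from collections import defaultdict

import pandas as pd

def make_frame(n_rows, seed=0):
    # Hypothetical synthetic data shaped like the question's DF.
    rng = random.Random(seed)
    return pd.DataFrame({
        "c1": [rng.choice(["one", "two", "three"]) for _ in range(n_rows)],
        "c2": [rng.random() for _ in range(n_rows)],
        "c3": [rng.choice(["alpha", "beta", "gamma", "zeta"]) for _ in range(n_rows)],
    })

def rescan_loop(df):
    # The question's approach: one full scan per distinct c3 value.
    out = {}
    for greek in set(df["c3"]):
        inner = {}
        for i in range(df.shape[0]):
            row = df.iloc[i]
            if row["c3"] == greek:
                inner[row["c1"]] = row["c2"]
        out[greek] = inner
    return out

def single_pass(df):
    # One pass over zipped columns; last value wins for repeated pairs.
    out = defaultdict(dict)
    for k1, v, k3 in zip(df["c1"], df["c2"], df["c3"]):
        out[k3][k1] = v
    return dict(out)

def groupby_apply(df):
    # One lambda call per c3 group; rows keep their original order within a group.
    return (df.groupby("c3")[["c1", "c2"]]
              .apply(lambda g: dict(zip(g["c1"], g["c2"])))
              .to_dict())

df = make_frame(1_000)
for fn in (rescan_loop, single_pass, groupby_apply):
    secs = timeit.timeit(lambda: fn(df), number=1)
    print(f"{fn.__name__:>14}: {secs:.4f} s")
```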
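@StevenRumbalski: Same here. :-) I tried to see whether I could get the same result with vectorized operations but bounced off; someone else may come up with something cleverer. Still, I think you've put your finger on the big problem (too many iterations), and everything beyond that is minor. – DSM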
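On the vectorized idea: one reshaping approach (swapped in here, not taken from either answer) is `DataFrame.pivot`, which lays `c2` out with one row per `c1` and one column per `c3`; each column then becomes an inner dict once the NaN holes for missing combinations are dropped. A minimal sketch, assuming every (c1, c3) pair occurs at most once, since `pivot` raises `ValueError` on duplicates, and assuming `c2` has no genuine NaN values (they would be dropped too):

```python
import pandas as pd

DF = pd.DataFrame({
    "c1": ["one", "one", "one", "two", "two", "three", "three", "three", "three"],
    "c2": [0.44, 0.48, 0.25, 0.88, 0.80, 0.68, 0.29, 0.10, 0.99],
    "c3": ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"],
})

# One row per c1, one column per c3; combinations that never occur become NaN.
wide = DF.pivot(index="c1", columns="c3", values="c2")

# Each column is a Series indexed by c1; drop the NaN holes and convert it.
nested = {greek: col.dropna().to_dict() for greek, col in wide.items()}
print(nested)
```

If duplicate pairs are possible, `pivot_table` with an explicit `aggfunc` would be the usual alternative, at the cost of choosing how to combine them.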
@DSM I know how to sort with a lambda function, but what exactly happens between ".apply" and ".to_dict()"? –
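A step-by-step sketch of that chain, assuming the answer under discussion groups by `c3` and applies a lambda that zips `c1` with `c2` (that answer's exact code is not shown here): `apply` calls the lambda once per group, each call returns a plain `dict`, and because those results are not Series or DataFrames, pandas collects them into a Series of dicts indexed by the group keys. `Series.to_dict()` then maps each index label to its value, which yields the outer dictionary.

```python
import pandas as pd

DF = pd.DataFrame({
    "c1": ["one", "one", "two", "three"],
    "c2": [0.44, 0.48, 0.88, 0.99],
    "c3": ["alpha", "beta", "alpha", "zeta"],
})

# Step 1: group rows by c3; selecting c1 and c2 keeps the grouping
# column out of what the lambda receives.
grouped = DF.groupby("c3")[["c1", "c2"]]

# Step 2: the lambda sees one sub-DataFrame per c3 value and returns {c1: c2}.
# apply stacks those dicts into a Series indexed by c3.
per_group = grouped.apply(lambda g: dict(zip(g["c1"], g["c2"])))
print(per_group)

# Step 3: Series.to_dict() turns {index label: value} into a plain dict,
# giving a dictionary of dictionaries.
nested = per_group.to_dict()
print(nested)
```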