Fastest way to store data from a Pandas DataFrame

I was looking at "Fastest way to iterate through a pandas dataframe?", and I'm not sure whether it applies to my case. I want to build a dictionary of samples and their features from a DataFrame.

# DF_gex is a DataFrame

D_sample_Data = {}

class Sample:
    def __init__(self, D_key_value):
        self.D_key_value = D_key_value

for i in range(DF_gex.shape[0]):
    D_key_value = {}
    sample = DF_gex.index[i]
    for j in range(DF_gex.shape[1]):
        key = DF_gex.columns[j]
        value = DF_gex.iloc[i, j]
        D_key_value[key] = value
    # Create the instance here; setting an attribute on a missing key would raise KeyError
    D_sample_Data[sample] = Sample(D_key_value)

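I basically have a class, called Sample in this case, and each instance of it stores a dictionary (D_key_value). Right now I iterate over every row and every column.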

Is there a faster way to do this? I know pandas is built on NumPy arrays, which have special features for indexing. Could one of those methods be used here?

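In the end, I will have a dictionary object D_sample_Data, where I enter a sample name and get back a class instance. That instance will hold a dictionary object that is unique to that sample.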

2015-10-15 O.rka

Can you update the question with the kind of output you are looking for?

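@AnandSKumar I added the type of output. It is basically a dictionary where each key in D_sample_Data leads to a class instance, and that instance holds some dictionaries and other objects. This is the simplest example I could think of.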

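If you just want a dictionary of dictionaries, where the keys of the outer dictionary are the index labels, the keys of the inner dictionaries are the columns, and each value is the corresponding value at that index and column (or a dictionary of class instances that wrap such dictionaries), then you do not need loops. You can simply use the DataFrame.to_dict() method. Example -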

resultdict = df.T.to_dict()

Or, from Pandas version 0.17.0 onwards, you can also use the keyword argument orient='index'. Example -

resultdict = df.to_dict(orient='index')

Demo -

In [73]: df
Out[73]:
   Col1  Col2  Col3
a     1     2     3
b     4     5     6
c     7     8     9

In [74]: df.T.to_dict()
Out[74]:
{'a': {'Col1': 1, 'Col2': 2, 'Col3': 3},
 'b': {'Col1': 4, 'Col2': 5, 'Col3': 6},
 'c': {'Col1': 7, 'Col2': 8, 'Col3': 9}}
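
The demo above can be reproduced as a self-contained script. This is only a minimal sketch: the frame is rebuilt from the values shown in the demo output, the names via_transpose and via_orient are made up for illustration, and orient='index' assumes Pandas 0.17.0 or later.

```python
import pandas as pd

# Rebuild the small example frame (values taken from the demo output above).
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["a", "b", "c"],
    columns=["Col1", "Col2", "Col3"],
)

# Two ways to get {index label: {column: value}}.
via_transpose = df.T.to_dict()            # transpose, then the default column-wise dict
via_orient = df.to_dict(orient="index")   # row-wise dict directly

# For this all-integer frame both approaches should agree.
print(via_transpose == via_orient)
print(via_orient["b"]["Col2"])
```

One thing to keep in mind: if the columns have mixed dtypes, transposing may upcast the values to a common type, so the two calls are not guaranteed to return identical value types in that case.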

If you want the values of the outer dictionary to be of type class Sample, though I highly doubt that would be useful, then you can do -

class Sample:
    def __init__(self, D_key_value):
        self.D_key_value = D_key_value

resultdict = df.T.to_dict()

resultdict = {k: Sample(v) for k, v in resultdict.items()}
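
To tie this back to the names used in the question, here is a rough end-to-end sketch. The contents of DF_gex (the sample names and the gene_A / gene_B columns) are hypothetical stand-ins, since the real data is not shown, and orient='index' again assumes Pandas 0.17.0 or later.

```python
import pandas as pd


class Sample:
    def __init__(self, D_key_value):
        self.D_key_value = D_key_value


# Hypothetical stand-in for DF_gex: rows are samples, columns are features.
DF_gex = pd.DataFrame(
    [[0.5, 1.5], [2.5, 3.5]],
    index=["sample_1", "sample_2"],
    columns=["gene_A", "gene_B"],
)

# Build one inner dict per row without explicit row/column loops,
# then wrap each inner dict in a Sample instance.
D_sample_Data = {
    name: Sample(values)
    for name, values in DF_gex.to_dict(orient="index").items()
}

# Look up a sample by name, then a feature by column label.
print(D_sample_Data["sample_2"].D_key_value["gene_A"])
```

The only Python-level loop left is the dictionary comprehension over samples; the per-cell work is handed to to_dict().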

2015-10-15 20:04:07

I just updated my pandas to 0.17. Thanks, that is exactly what I needed!
