pandas - Quickly create a column of dictionaries from row groups

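I am trying to quickly build dictionaries from groups of rows, where each dictionary holds the key-value pairs taken from two other columns for the rows that share a group value. For example:

My data: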

                cheese  x  y
1  0000000000000005559  1  2
2  0000000000000005559  2  2
3  0000000000000004058  3  5
4  0000000000000004058  4  5
5  0000000000000004058  5  5
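
For anyone who wants to reproduce this, the sample above could be rebuilt roughly as follows. This is only a sketch: it assumes the cheese values are stored as strings (the leading zeros suggest so, but the real dtype is not stated) and that the index runs from 1 to 5 as shown.

```python
import pandas as pd

# Rows copied from the sample table; cheese is kept as str so the
# leading zeros survive (an assumption about the real dtype).
df = pd.DataFrame(
    {
        "cheese": ["0000000000000005559"] * 2 + ["0000000000000004058"] * 3,
        "x": [1, 2, 3, 4, 5],
        "y": [2, 2, 5, 5, 5],
    },
    index=[1, 2, 3, 4, 5],
)
print(df)
```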

Desired output:

cheese 
0000000000000005559    {1: 2, 2: 2} 
0000000000000004058  {3: 5, 4: 5, 5: 5} 
0000000000000007157    {6: 7, 7: 7} 
0000000000000000815 {8: 10, 9: 10, 10: 10} 
0000000000000009160   {11: 12, 12: 12}

I can do this with an overly complicated lambda and apply, but it is slow on large DataFrames (on the order of a million unique groups). How can I do this quickly?

2017-10-05 guy

Use defaultdict.
This should be fairly fast:

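from collections import defaultdict

import pandas as pd

# One inner dict per group key, created on first access
d = defaultdict(dict)

# Plain Python lists iterate faster than pandas Series
es = df.epoch.values.tolist()
xs = df.x.values.tolist()
ys = df.y.values.tolist()

# Fill {epoch: {x: y}} in a single pass over the rows
for e, x, y in zip(es, xs, ys):
    d[e][x] = y

pd.Series(d)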

1505339100449045559   {1: 2, 2: 2} 
1505339102148504058 {3: 5, 4: 5, 5: 5} 
dtype: object
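
The loop above can be wrapped into a self-contained sketch like the one below. The function name group_to_dicts and its parameters are hypothetical, chosen only for illustration. The sample frame calls the grouping column epoch, as the answer's code does, rather than cheese as in the question's table.

```python
from collections import defaultdict

import pandas as pd


def group_to_dicts(df, key, k, v):
    """Return a Series mapping each `key` value to a {k: v} dict."""
    out = defaultdict(dict)
    # tolist() converts the NumPy values to plain Python objects once, up front
    for g, a, b in zip(df[key].tolist(), df[k].tolist(), df[v].tolist()):
        out[g][a] = b  # a later row overwrites an earlier one with the same (g, a)
    return pd.Series(out)


# Sample rows matching the output shown in the answer
df = pd.DataFrame(
    {
        "epoch": [1505339100449045559] * 2 + [1505339102148504058] * 3,
        "x": [1, 2, 3, 4, 5],
        "y": [2, 2, 5, 5, 5],
    }
)
result = group_to_dicts(df, "epoch", "x", "y")
print(result)
```

Note that if a group contains the same x value twice, only the last y is kept, since dictionary keys are unique.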

2017-10-05 15:01:53 piRSquared

This is fast. Sir, could you add timings? – Dark

I tested it and got (in seconds): Out[84]: 6.7047929763793945 – guy

Use groupby with apply:

In [1544]: df.groupby('epoch').apply(lambda x: dict(x[['x', 'y']].values)) 
Out[1544]: 
epoch 
1505339100449045559   {1: 2, 2: 2} 
1505339102148504058 {3: 5, 4: 5, 5: 5} 
dtype: object

The same as: df.groupby('epoch')[['x', 'y']].apply(lambda x: dict(x.values))

And Bharath's version: df.groupby('epoch').apply(lambda x: dict(zip(x['x'], x['y'])))

Timings

In [1585]: ndf = pd.concat([df]*1000, ignore_index=True) 

In [1587]: %timeit ndf.groupby('epoch').apply(lambda x: dict(zip(x['x'], x['y']))) 
100 loops, best of 3: 3.65 ms per loop 

In [1586]: %timeit ndf.groupby('epoch')[['x', 'y']].apply(lambda x: dict(x.values)) 
100 loops, best of 3: 14.9 ms per loop 

In [1588]: %timeit ndf.groupby('epoch').apply(lambda x: dict(x[['x', 'y']].values)) 
100 loops, best of 3: 15.3 ms per loop
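
Note that pd.concat([df]*1000) repeats the same two epoch values, so the benchmark above only has two groups, while the question is about a very large number of unique groups. The sketch below is one way to compare the approaches on data with many distinct groups. The group count, group size, and function names are arbitrary assumptions, and no particular result is implied; timings will vary by machine and pandas version.

```python
import timeit
from collections import defaultdict

import numpy as np
import pandas as pd

# Synthetic frame with many distinct groups; sizes are arbitrary assumptions
n_groups, rows_per_group = 2_000, 3
n = n_groups * rows_per_group
big = pd.DataFrame(
    {
        "epoch": np.repeat(np.arange(n_groups), rows_per_group),
        "x": np.arange(n),
        "y": np.arange(n) % 7,
    }
)


def with_defaultdict(df):
    # Single pass over plain Python lists
    d = defaultdict(dict)
    for e, x, y in zip(df["epoch"].tolist(), df["x"].tolist(), df["y"].tolist()):
        d[e][x] = y
    return pd.Series(d)


def with_groupby_apply(df):
    # Select the value columns first so the lambda only sees x and y
    return df.groupby("epoch")[["x", "y"]].apply(lambda g: dict(zip(g["x"], g["y"])))


# Both approaches should build the same dictionaries
assert with_defaultdict(big).to_dict() == with_groupby_apply(big).to_dict()

for fn in (with_defaultdict, with_groupby_apply):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```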

2017-10-05 14:21:07 Zero

This is basically what I had, and it really isn't a fast approach. I'll post the results in a second. – guy
