Grouping data to produce a list against each unique ID in pandas/Python

Hi, I am currently using pandas/Python and have a DataFrame along the following lines:

21627 red 
21627 green 
21627 red 
21627 blue 
21627 purple 
21628 yellow 
21628 red 
21628 green 
21629 red 
21629 red

which I would like to reduce to:

21627 red, green, blue, purple 
21628 yellow, red, green 
21629 red

What is the best way to do this (collapsing the values in each list to unique values)?

Also, what if I wanted to keep the duplicates:

21627 red, green, red, blue, purple 
21628 yellow, red, green 
21629 red, red

Could you tell me the best way to achieve this?

Thanks in advance for your help.

Source

2013-08-22 user7289

There is an answer [here](http://stackoverflow.com/questions/11391969/how-to-group-pandas-dataframe-entries-by-date-in-a-non-unique-column) and a tutorial [here](http://wesmckinney.com/blog/?p=125) – lucasg

If you really want to do this, you could use groupby with apply:

In [11]: df.groupby('id').apply(lambda x: list(set(x['colours']))) 
Out[11]: 
id 
21627    [blue, purple, green, red]
21628          [green, red, yellow]
21629                         [red]
dtype: object 

In [12]: df.groupby('id').apply(lambda x: list(x['colours'])) 
Out[12]: 
id 
21627    [red, green, red, blue, purple]
21628               [yellow, red, green]
21629                         [red, red]
dtype: object
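Because `set` does not preserve order, the unique lists above may come out in a different order from the one shown in the question. A minimal sketch of an order-preserving variant follows, assuming the columns are named `id` and `colours` as in the answer's code; the sample frame is rebuilt from the question's data, and the variable names are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question; the column names 'id' and
# 'colours' are assumed from the answer's code.
df = pd.DataFrame({
    "id": [21627, 21627, 21627, 21627, 21627,
           21628, 21628, 21628, 21629, 21629],
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

grouped = df.groupby("id")["colours"]

# Unique values per id, in order of first appearance
# (Series.unique keeps the order in which values occur).
unique_lists = grouped.apply(lambda s: list(s.unique()))

# All values per id, duplicates kept.
all_lists = grouped.apply(list)

# Comma-separated strings, matching the layout asked for in the question.
unique_strings = grouped.agg(lambda s: ", ".join(s.unique()))

print(unique_lists)
print(all_lists)
print(unique_strings)
```

If a plain mapping is more convenient (for example, to feed keywords per ID into an indexer), `unique_strings.to_dict()` should give a dictionary keyed by `id`.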

However, DataFrames containing lists are not particularly efficient.

A pivot table gives you a more useful DataFrame:

In [21]: df.pivot_table(index='id', columns='colours', aggfunc='size', fill_value=0)
Out[21]:
colours  blue  green  purple  red  yellow
id
21627       1      1       1    2       0
21628       0      1       0    1       1
21629       0      0       0    2       0
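A similar count table can probably be built with `pd.crosstab`, which tabulates how often each pair of values occurs. This is a sketch rather than part of the original answer; the sample frame is rebuilt from the question, with the column names `id` and `colours` assumed from the answer's code.

```python
import pandas as pd

# Sample data reconstructed from the question (column names are assumed).
df = pd.DataFrame({
    "id": [21627, 21627, 21627, 21627, 21627,
           21628, 21628, 21628, 21629, 21629],
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

# Rows are ids, columns are colours, cells are occurrence counts;
# combinations that never occur are filled with 0.
counts = pd.crosstab(df["id"], df["colours"])

print(counts)
```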

My favourite function, get_dummies, also lets you do this, though not as elegantly or efficiently (I'll keep this original, if crazy, suggestion):

In [22]: pd.get_dummies(df.set_index('id')['colours']).reset_index().groupby('id').sum() 
Out[22]: 
       blue  green  purple  red  yellow
id
21627     1      1       1    2       0
21628     0      1       0    1       1
21629     0      0       0    2       0

Source

2013-08-22 13:33:23

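Maybe add these kinds of recipes to the cookbook – Jeff

Hey Andy, thanks. I plan to use the list against each ID as a table to be indexed in a search engine, hence wanting a list of keywords against each ID – user7289

Here is another way, although @Andy's is a bit more intuitive: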

In [24]: df.groupby('id').apply(
       lambda x: x['color'].value_counts()).unstack().fillna(0) 
Out[24]: 
       blue  green  purple  red  yellow
id
21627     1      1       1    2       0
21628     0      1       0    1       1
21629     0      0       0    2       0

Source
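The same idea can be sketched without `apply`, by calling `value_counts` on the grouped column directly and unstacking with a fill value. This assumes the column is named `colours` (as in the other answer) rather than `color`, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data reconstructed from the question (column names are assumed).
df = pd.DataFrame({
    "id": [21627, 21627, 21627, 21627, 21627,
           21628, 21628, 21628, 21629, 21629],
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

# Count each colour within each id, then pivot the colour level into
# columns; fill_value=0 fills the combinations that never occur.
counts = df.groupby("id")["colours"].value_counts().unstack(fill_value=0)

print(counts)
```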

2013-08-22 14:03:04 Jeff

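That's not how you spell colour :p. 'value_counts' makes it more intuitive, but I think 'pivot_table' is the way to do it. –

I agree 'pivot_table' is better (this is basically what it does internally anyway); I always thought 'colour' was the colloquial spelling (deprecated) :) – Jeff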
