Pandas: how to get the unique values of a column that contains lists of values?

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})
df.sort_values(by='col', inplace=True)

df 
Out[62]: 
    col     name 
0 A [one two, three four] 
2 A      [] 
4 A    [one two] 
1 B     [one] 
3 B      [] 
5 B    [three]

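I would like to get a column that keeps track of all the unique strings contained in name for each group of col.

That is, the expected output is: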

df 
Out[62]: 
    col     name unique_list 
0 A [one two, three four] [one two, three four] 
2 A      [] [one two, three four] 
4 A    [one two] [one two, three four] 
1 B     [one] [one, three] 
3 B      [] [one, three] 
5 B    [three] [one, three]

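Indeed, for group A, say, you can see that the unique set of strings contained in [one two, three four], [] and [one two] is [one two, three four].

I am able to get the corresponding number of unique values using the approach from "Pandas : how to get the unique number of values in cells when cells contain lists?":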

df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique()))) 


df 
Out[65]: 
    col     name count_unique 
0 A [one two, three four]   2 
2 A      []   2 
4 A    [one two]   2 
1 B     [one]   2 
3 B      []   2 
5 B    [three]   2

but replacing nunique with unique in the code above fails.

Any ideas? Thanks!

Source

2016-09-14 ℕʘʘḆḽḘ

Here is a solution:

import numpy as np
df['unique_list'] = df.col.map(df.groupby('col')['name'].sum().apply(np.unique))
df

Source

2016-09-14 22:47:11 piRSquared

Interesting. A 'sum' of strings?!

@Noobie It's worse than that. It's a sum over lists. It produces one concatenated list, and I apply np.unique to that concatenated list. – piRSquared
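The two steps described in that comment can be sketched outside pandas. This is only an illustration on the group A lists from the question, using the built-in sum with an empty-list start value as a stand-in for what the groupby sum does to list cells; the variable names are made up for the example.

```python
import numpy as np

# The three list cells of group A from the question's example.
group_a = [['one two', 'three four'], [], ['one two']]

# "Summing" lists with an empty-list start value concatenates them.
concatenated = sum(group_a, [])

# np.unique returns the sorted distinct elements as an array.
unique_values = np.unique(concatenated)

print(concatenated)
print(unique_values.tolist())
```

Because np.unique sorts its result, the order of the strings in the final list may differ from their order of first appearance.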

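Hehehe. I just tried it, but it seems your nice solution fails when there are missing values in col. In that case I get 'TypeError: can only concatenate list (not "int") to list'. Replacing the missing values with 'fillna('')' or 'fillna('[]')' does not work. Any ideas?

Try: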
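One possible way around this, sketched under the assumption that the problem comes from some name cells holding a missing value (NaN) instead of a list: skip every cell that is not a list while collecting the strings. Note that fillna('[]') inserts the two-character string '[]', not an empty list, so it cannot be concatenated with lists either. The helper unique_strings and the sample data below are hypothetical, not part of the thread.

```python
import numpy as np
import pandas as pd

# Same data as the question, but with one list replaced by NaN (assumed scenario).
df = pd.DataFrame({
    'name': [['one two', 'three four'], ['one'], np.nan, [], ['one two'], ['three']],
    'col': ['A', 'B', 'A', 'B', 'A', 'B'],
})

def unique_strings(cells):
    """Return the sorted distinct strings found in list cells, ignoring non-list cells."""
    seen = set()
    for cell in cells:
        if isinstance(cell, list):  # NaN and other non-list values are skipped
            seen.update(cell)
    return sorted(seen)

# One list of unique strings per group, then mapped back onto every row.
per_group = df.groupby('col')['name'].apply(unique_strings)
df['unique_list'] = df['col'].map(per_group)
print(df)
```

If the missing values are in col itself, groupby drops those rows by default, so they would end up with NaN in unique_list.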

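from functools import reduce  # reduce is no longer a builtin on Python 3

# Concatenate the lists of each group, then deduplicate with a set.
uniq_df = df.groupby('col')['name'].apply(lambda x: list(set(reduce(lambda y, z: y + z, x)))).reset_index()
uniq_df.columns = ['col', 'uniq_list']
pd.merge(df, uniq_df, on='col', how='left')

The desired output: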

col     name    uniq_list 
0 A [one two, three four] [one two, three four] 
1 A      [] [one two, three four] 
2 A    [one two] [one two, three four] 
3 B     [one]   [three, one] 
4 B      []   [three, one] 
5 B    [three]   [three, one]
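For comparison, pandas 0.25 and later (released after this thread) provide explode, which turns each list element into its own row. A minimal sketch of the same result with it, assuming such a version is available; the intermediate names exploded and per_group are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
    'col': ['A', 'B', 'A', 'B', 'A', 'B'],
})

# One row per string; empty lists become NaN and are dropped.
exploded = df[['col', 'name']].explode('name').dropna(subset=['name'])

# Distinct strings per group, in order of first appearance, as plain lists.
per_group = exploded.groupby('col')['name'].unique().apply(list)

# Broadcast each group's list back to the original rows.
df['unique_list'] = df['col'].map(per_group)
print(df)
```

A group whose cells are all empty lists disappears after the dropna step, so its rows would get NaN rather than an empty list.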

Source

2016-09-14 22:08:02 Abdou

Thanks @abdou! Let me try it.
