Sort a str Series by frequency

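Given a pandas Series of type str, I want to sort the result of str.split by frequency, i.e. reorder the comma-separated items in each string by how often each item occurs across the whole Series.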

For example, given the Series

s = pd.Series(['abc,def,ghi','ghi,abc'])

I want to get

s2 = pd.Series(['abc,ghi,def','abc,ghi'])

as the result (within each string, 'abc' and 'ghi' come before 'def' because they each have frequency 2, while 'def' has frequency 1).

Essentially, I am asking for a combination of "Pandas sort list of str.split()" and "Pandas count frequencies within str series".
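The counting half of the problem can be sketched on its own. This assumes pandas 0.25 or later, where Series.explode exists; it is one way to get global token frequencies, not necessarily the approach the asker had in mind.

```python
import pandas as pd

s = pd.Series(['abc,def,ghi', 'ghi,abc'])

# Split each string into a list of tokens, flatten the lists so there is
# one row per token, then count how often each token occurs overall.
tokens = s.str.split(',').explode()
freq = tokens.value_counts()

print(freq.sort_index())
```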

How can I do this?

Posted by David, 2016-05-02

Try this:

In [71]: freq = pd.Series(s.str.split(',').sum()).value_counts() 

In [72]: s.str.split(',').apply(lambda x: ','.join(sorted(x, key=freq.get, reverse=True))) 
Out[72]: 
0    abc,ghi,def
1        ghi,abc
dtype: object

Explanation:

In [73]: freq 
Out[73]: 
ghi 2 
abc 2 
def 1 
dtype: int64 

In [75]: sorted(['abc','def','ghi'], key=freq.get, reverse=True) 
Out[75]: ['abc', 'ghi', 'def']
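Packaged as a reusable function, the same sort-by-frequency idea might look like the sketch below. The function name sort_tokens_by_frequency and its sep parameter are hypothetical, and the tokens are flattened with a list comprehension instead of .sum(), which avoids repeatedly concatenating lists on large Series.

```python
import pandas as pd


def sort_tokens_by_frequency(s: pd.Series, sep: str = ',') -> pd.Series:
    """Reorder the sep-separated tokens in each string by overall frequency."""
    parts = s.str.split(sep)
    # Flatten all token lists into one Series and count each token globally.
    freq = pd.Series([tok for row in parts for tok in row]).value_counts()
    # Most frequent first; sorted() is stable, so ties keep their input order.
    return parts.apply(lambda x: sep.join(sorted(x, key=freq.get, reverse=True)))


s = pd.Series(['abc,def,ghi', 'ghi,abc'])
print(sort_tokens_by_frequency(s).tolist())  # ['abc,ghi,def', 'ghi,abc']
```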

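PS: abc and ghi have the same weight (both occur twice), so frequency alone does not decide their order. Because Python's sorted() is stable, even with reverse=True, tied items keep the order in which they appeared in the original string, which is why row 1 comes out as ghi,abc rather than abc,ghi.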
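If the output has to match s2 from the question exactly, one option is to break ties alphabetically with a tuple sort key. Alphabetical order as the tie rule is an assumption on my part; the question only specifies ordering by frequency. This sketch also assumes pandas 0.25 or later for Series.explode.

```python
import pandas as pd

s = pd.Series(['abc,def,ghi', 'ghi,abc'])
parts = s.str.split(',')
freq = parts.explode().value_counts()

# Sort key: higher frequency first (negated count), then alphabetical
# order as a deterministic tie-breaker for tokens with equal counts.
s2 = parts.apply(lambda x: ','.join(sorted(x, key=lambda tok: (-freq[tok], tok))))
print(s2.tolist())  # ['abc,ghi,def', 'abc,ghi']
```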

Posted by MaxU, 2016-05-02 18:07:14
