Q

How do I get all unique words in a DataFrame?

2016-07-24

I have a DataFrame containing a list of products and their corresponding reviews:

+-----------+-----------------------------------------+
| product   | review                                  |
+-----------+-----------------------------------------+
| product_a | This is good for a casual lunch         |
+-----------+-----------------------------------------+
| product_b | Ellie is one of the best-known baristas |
+-----------+-----------------------------------------+
| product_c | The tour guide told us the secrets      |
+-----------+-----------------------------------------+

How can I get all the unique words in the DataFrame?

I wrote a function:

from collections import Counter

def count_words(text):
    try:
        text = text.lower()
        words = text.split()
        count_words = Counter(words)
    except AttributeError:
        # Non-string values (e.g. NaN) have no .lower() method
        count_words = {'': 0}
    return count_words

and applied it to the DataFrame, but that only gives me the word counts for each row.

reviews['words_count'] = reviews['review'].apply(count_words)

2016-07-24 Luis Ramon Ramirez Rodriguez

Can you post a sample of your DataFrame?

A

Answers

2

Starting with this:

dfx
               review
0      United Kingdom
1  The United Kingdom
2     Dublin, Ireland
3    Mardan, Pakistan

To get all the unique words in the 'review' column:

list(dfx['review'].str.split(' ', expand=True).stack().unique()) 

    ['United', 'Kingdom', 'The', 'Dublin,', 'Ireland', 'Mardan,', 'Pakistan']
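
Note that splitting on a single space keeps punctuation attached ('Dublin,') and treats 'The' and 'the' as different words. If that matters for your data, something along these lines might help. This is only a sketch: it assumes a word can be approximated by the regex `\w+`, `unique_words` is a hypothetical helper name (not part of pandas), and `Series.explode` needs pandas 0.25 or later.

```python
import pandas as pd

# Sample data copied from the answer above.
dfx = pd.DataFrame({'review': ['United Kingdom', 'The United Kingdom',
                               'Dublin, Ireland', 'Mardan, Pakistan']})

def unique_words(series):
    """Return distinct lowercase words in a Series of strings,
    in order of first appearance (hypothetical helper)."""
    words = (
        series.dropna()          # skip missing reviews
        .str.lower()             # make matching case-insensitive
        .str.findall(r"\w+")     # list of word tokens per row; punctuation is dropped
        .explode()               # one token per row
        .dropna()                # rows without any token explode to NaN
    )
    return words.unique().tolist()

result = unique_words(dfx['review'])
print(result)
```

Whether to lowercase or strip punctuation depends on what counts as "the same word" in your reviews, so treat the regex as a starting point.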

To get the word counts for the 'review' column:

dfx['review'].str.split(' ', expand=True).stack().value_counts() 


United      2
Kingdom     2
Mardan,     1
The         1
Ireland     1
Dublin,     1
Pakistan    1
dtype: int64
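
If you would rather keep the per-row `Counter` approach from the question, one possible way to get totals for the whole column is to merge the per-row counters into a single one. A minimal sketch under stated assumptions: the `reviews` frame below is made-up sample data (two rows borrowed from `dfx` plus a missing value), and `count_words` is a simplified variant of the function in the question, not the original.

```python
from collections import Counter

import pandas as pd

# Made-up sample data for illustration; the None stands in for a missing review.
reviews = pd.DataFrame({'review': ['United Kingdom', 'The United Kingdom', None]})

def count_words(text):
    """Count lowercase, whitespace-separated words in one review."""
    if not isinstance(text, str):
        # Missing values (None/NaN) contribute no words.
        return Counter()
    return Counter(text.lower().split())

# Merge the per-row counters into one Counter for the whole column.
total = Counter()
for text in reviews['review']:
    total.update(count_words(text))

unique = sorted(total)  # the keys of the merged Counter are the unique words
print(total.most_common())
print(unique)
```

The keys of `total` are the unique words and the values are their counts across all rows, so one pass covers both parts of the question.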

2016-07-25 00:37:02 Merlin
