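Counting individual words in a pandas DataFrame

I am trying to count the individual words in one column of my DataFrame. It looks like this. In reality, the text is tweets.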

text 
this is some text that I want to count 
That's all I wan't 
It is unicode text

From other Stack Overflow questions, I found that I can use the following approach:

Count most frequent 100 words from sentences in Dataframe Pandas

Count distinct words from a Pandas Data Frame

My DataFrame is called result, and this is my code:

from collections import Counter 
result2 = Counter(" ".join(result['text'].values.tolist()).split(" ")).items() 
result2

I got the following error:

TypeError         Traceback (most recent call last) 
<ipython-input-6-2f018a9f912d> in <module>() 
     1 from collections import Counter 
----> 2 result2 = Counter(" ".join(result['text'].values.tolist()).split(" ")).items() 
     3 result2 
TypeError: sequence item 25831: expected str instance, float found

The dtype of the text column is object, which as far as I understand is correct for Unicode text data.


2015-10-20 Lam

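If there are float values in your DataFrame, what do you want to do with them? Do you want to count them? –

Since these texts should all be tweets, I want to count them too. If this column also contains float values, does that mean some of the tweets I collected are just numbers? (It makes me curious which ones are floats.) – Lam

That is possible. –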
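One way to find out which rows are floats is to look at the Python type of each value. The sketch below uses a small made-up `result` frame (the sample rows are an assumption, not the asker's data). In an object column, a missing value is stored as NaN, which is itself a float, so empty tweets are a likely culprit alongside genuinely numeric ones.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `result` frame;
# the NaN row mimics a tweet whose text is missing.
result = pd.DataFrame(
    {"text": pd.Series(["this is some text", np.nan, "It is unicode text"], dtype=object)}
)

# Count how many values of each Python type the column holds.
type_counts = result["text"].map(type).value_counts()
print(type_counts)

# Select the rows whose value is a float so they can be inspected directly.
float_rows = result[result["text"].map(lambda v: isinstance(v, float))]
print(float_rows)
```

If `float_rows` shows only NaN, the column has missing tweets rather than tweets made of numbers.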

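The problem occurs because some of the values in your Series (result['text']) are of type float. If you want them included in the ' '.join(), you need to convert the floats to strings before passing them to str.join(). You can use Series.astype() to convert all values to strings. Also, you do not really need .tolist(); you can simply pass the Series itself to str.join(). Example -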

result2 = Counter(" ".join(result['text'].astype(str)).split(" ")).items()

Demo -

In [60]: df = pd.DataFrame([['blah'],['asd'],[10.1]],columns=['A']) 

In [61]: df 
Out[61]: 
     A 
0 blah 
1 asd 
2 10.1 

In [62]: ' '.join(df['A']) 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-62-77e78c2ee142> in <module>() 
----> 1 ' '.join(df['A']) 

TypeError: sequence item 2: expected str instance, float found 

In [63]: ' '.join(df['A'].astype(str)) 
Out[63]: 'blah asd 10.1'
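One thing to watch for: if the floats are actually missing values (NaN), turning every value into a string keeps them in the joined text as the token "nan", which then gets counted as a word. The sketch below uses made-up sample data and applies `str()` per value with `Series.map` to show the effect, then compares it with dropping missing rows first via `Series.dropna()`. Whether to drop or keep those rows is a judgment call that depends on the data.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical sample; the NaN row stands in for a missing tweet.
result = pd.DataFrame(
    {"text": pd.Series(["some text", np.nan, "more text"], dtype=object)}
)

# Converting every value with str() keeps the missing row as the literal token "nan".
counts_all = Counter(" ".join(result["text"].map(str)).split())

# Dropping missing rows first leaves only real words.
counts_clean = Counter(" ".join(result["text"].dropna()).split())

print(counts_all)
print(counts_clean)
```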


2015-10-20 16:27:25

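Thanks, this seems to work. The output is now a dictionary; would it make sense to move it back into a pandas DataFrame, or to keep working inside the df somehow? – Lam

That depends on what you plan to do with it. But my guess is that if you plan to do some kind of analysis, a DataFrame would be faster. –

A generic answer to a generic question :D When I have a specific problem, I will ask a new question. Thank you for your help! – Lam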
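For reference, moving the counts back into pandas could look like the sketch below. The column names `word` and `count` are arbitrary choices for illustration, and the input string is made up.

```python
from collections import Counter

import pandas as pd

# Made-up text standing in for the joined tweets.
counts = Counter("this is some text this is".split())

# most_common() returns (word, count) pairs sorted by count, highest first,
# which maps directly onto a two-column DataFrame.
freq = pd.DataFrame(counts.most_common(), columns=["word", "count"])
print(freq)
```

From here the usual DataFrame tools (filtering, sorting, merging) apply to the word frequencies.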

In the end I went with the following code:

pd.set_option('display.max_rows', 100) 
words = pd.Series(' '.join(result['text'].astype(str)).lower().split(" ")).value_counts()[:100] 
words

The problem itself was solved by Anand S Kumar, though.
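A possible variation, offered as a sketch rather than a replacement: `split(" ")` splits only on single spaces, so consecutive spaces produce empty-string "words" and newlines inside a tweet are not treated as separators. Splitting on any whitespace with the `.str` accessor and `Series.explode()` (available in pandas 0.25 and later) avoids both. The sample data here is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a double space, a newline, and a missing row.
result = pd.DataFrame(
    {"text": pd.Series(["This is  text", "this is\nmore", np.nan], dtype=object)}
)

words = (
    result["text"]
    .dropna()          # skip missing tweets instead of counting "nan"
    .str.lower()       # normalise case before counting
    .str.split()       # no argument: split on any run of whitespace
    .explode()         # one row per word
    .value_counts()    # frequency of each word, highest first
    .head(100)         # keep the 100 most frequent words
)
print(words)
```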


2015-10-20 17:03:33 Lam
