Counting word frequencies in a DataFrame

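I am trying to create a DataFrame in which the first column ("Value") holds a multi-word string in each row, and all the other columns are labeled with the unique words found across all the strings in "Value". I want to fill this DataFrame with the frequency of each unique word (column) in each string (row). In a sense, I am building a simple TDM (term-document matrix).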

import itertools
import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']
col_list = [word.lower().split(" ") for word in rows]
set_col = set(list(itertools.chain.from_iterable(col_list)))

columns = set_col
ncols = len(set_col)

testDF = pd.DataFrame(columns = set_col)
testDF.insert(0, "Value", " ")

testDF["Value"] = rows
testDF.fillna(0, inplace=True)

irow = 0

for tweet in testDF["Value"]:

    for word in tweet.split(" "):
        for col in xrange(1, ncols):

            # this is the line that raises the KeyError shown below
            if word == testDF.columns[col]: testDF[irow, col] += 1

    irow += 1

testDF.head()

However, I get an error:

KeyError         Traceback (most recent call last) 
<ipython-input-64-9a991295ccd9> in <module>() 
    23   for col in xrange(1, ncols): 
    24 
---> 25    if word == testDF.columns[col]: testDF[irow, col] += 1 
    26 
    27  irow += 1 

C:\Users\Tony\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key) 
    1795    return self._getitem_multilevel(key) 
    1796   else: 
-> 1797    return self._getitem_column(key) 
    1798 
    1799  def _getitem_column(self, key): 

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)() 

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)() 

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)() 

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)() 

KeyError: (0, 9)

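I don't know what is wrong, so I would appreciate your help. Also, if there is a cleaner way to do this (other than textmining, which I am having installation problems with), it would be great to learn!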

Source

2015-10-23 Toly

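I am not 100% sure what your whole program is trying to do, but if by the following -

testDF[irow, col]

you meant to index the cell in the DataFrame with irow as the row index and col as the column, you cannot use a plain subscript. Writing testDF[irow, col] makes pandas look for a single column whose label is the tuple (irow, col), which is why you get KeyError: (0, 9). You should use .iloc or similar instead. Example -

if word == testDF.columns[col]: testDF.iloc[irow, col] += 1

Use .iloc if you intend irow to be the 0-based positional index. If irow is the actual index label of the DataFrame, you can use .loc instead of .iloc (in that case the column must also be given by its label rather than by its position).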
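To make the difference concrete, here is a minimal sketch on a small made-up frame (the frame `df` and its two word columns are assumptions for illustration, not the asker's full data). It shows the tuple subscript failing, followed by the position-based and label-based alternatives.

```python
import pandas as pd

# Small hypothetical frame, just to illustrate the three access styles
df = pd.DataFrame({
    "Value": ["we want peace", "we went home"],
    "want": [0, 0],
    "home": [0, 0],
})

# df[0, 1] is read as "the column labelled (0, 1)", which does not exist
try:
    df[0, 1]
    raised = False
except KeyError:
    raised = True

# Position-based: row position 0, column position 1 (the "want" column)
df.iloc[0, 1] += 1

# Label-based: row label 1, column label "home"
df.loc[1, "home"] += 1
```

Separately, as far as I can tell the loop in the question has a second, quieter problem: the word columns sit at positions 1 through ncols, while xrange(1, ncols) stops at ncols - 1, so the last word column would never be updated. xrange(1, ncols + 1) would cover all of them.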

Source

2015-10-23 05:50:01

.iloc worked like a breeze! I am still new to Python and keep forgetting that accessing DataFrame elements is different from accessing pd.arrays :) – Toly
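On the question's second point (a cleaner way without textmining), one possible approach is to count the words of each row with collections.Counter and let pandas line the counts up into columns. This is only a sketch under the assumption that splitting on whitespace is an adequate tokenizer for this data; the name `counts` is made up for the example.

```python
from collections import Counter

import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']

# One Counter per row; pandas turns the union of all words into columns
counts = pd.DataFrame([Counter(row.lower().split()) for row in rows])

# Words missing from a row come out as NaN, so replace them with 0
counts = counts.fillna(0).astype(int)

# Optional: sort the word columns alphabetically for a stable layout
counts = counts.sort_index(axis=1)

# Put the original strings in front, as in the question
counts.insert(0, "Value", rows)
```

This avoids the explicit triple loop and the cell-by-cell updates entirely, which also sidesteps the indexing issue discussed in the answer.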
