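I am trying to create a DataFrame in which the first column ("Value") holds a multi-word string in each row, and every other column is labeled with one of the unique words drawn from all the strings in "Value". I want to fill this DataFrame with word frequencies by checking each string (row) against every unique word (column). In effect, I am building a simple term-document matrix (TDM) that counts word frequencies in a DataFrame.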
import itertools
import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']
# Collect the unique words across all rows; these become the column labels
col_list = [word.lower().split(" ") for word in rows]
set_col = set(list(itertools.chain.from_iterable(col_list)))
columns = set_col
ncols = len(set_col)
# Empty frame with one column per word, plus "Value" in front
testDF = pd.DataFrame(columns = set_col)
testDF.insert(0, "Value", " ")
testDF["Value"] = rows
testDF.fillna(0, inplace=True)
# Count each word of each row in its matching column
irow = 0
for tweet in testDF["Value"]:
    for word in tweet.split(" "):
        for col in xrange(1, ncols):
            if word == testDF.columns[col]: testDF[irow, col] += 1
    irow += 1
testDF.head()
However, I get an error:
KeyError Traceback (most recent call last)
<ipython-input-64-9a991295ccd9> in <module>()
23 for col in xrange(1, ncols):
24
---> 25 if word == testDF.columns[col]: testDF[irow, col] += 1
26
27 irow += 1
C:\Users\Tony\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1795 return self._getitem_multilevel(key)
1796 else:
-> 1797 return self._getitem_column(key)
1798
1799 def _getitem_column(self, key):
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)()
KeyError: (0, 9)
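I don't know what is wrong, so I would appreciate your help. Also, if there is a cleaner way to do this (other than textmining, which I could not install), it would be great to learn it!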
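A minimal sketch of what goes wrong and one possible fix, assuming Python 3 and a recent pandas (so `range` stands in for `xrange`). `testDF[irow, col]` asks pandas for a column whose label is the tuple `(irow, col)`, which is why the traceback ends in `KeyError: (0, 9)`; positional access goes through `.iloc`. Separately, `xrange(1, ncols)` stops one short of the last word column, because "Value" occupies position 0. The name `vocab` and the sorted column order are assumptions added here for a deterministic layout, not part of the original code.

```python
import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']

# Sorted vocabulary (an assumption, for a stable column order; a set is unordered).
vocab = sorted({word for row in rows for word in row.lower().split()})

# Start with all-zero integer counts, then put the text column in front.
testDF = pd.DataFrame(0, index=range(len(rows)), columns=vocab)
testDF.insert(0, "Value", rows)

for irow, tweet in enumerate(testDF["Value"]):
    for word in tweet.lower().split():
        # Column 0 is "Value"; the word columns are 1 .. len(vocab) inclusive.
        for col in range(1, len(testDF.columns)):
            if word == testDF.columns[col]:
                # .iloc takes (row position, column position).
                testDF.iloc[irow, col] += 1

print(testDF)
```

The triple loop is kept only to stay close to the original; it still scans every column for every word.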
.iloc works like a breeze! I am still new to Python and keep forgetting that accessing DataFrame elements is different from accessing pd.arrays :) – Toly
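As for the cleaner way asked about in the question, one option that avoids explicit index bookkeeping is to count the words of each row with `collections.Counter` and let pandas line up the columns. This is only a sketch under the same assumptions (Python 3, recent pandas); the names `counts` and `tdm` are made up for illustration.

```python
from collections import Counter

import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']

# One Counter per row; pandas turns the list of dict-like objects into columns
# and leaves NaN wherever a word does not occur in that row.
counts = pd.DataFrame([Counter(row.lower().split()) for row in rows])

# Replace the NaNs with 0, restore integer counts, and sort the word columns.
counts = counts.fillna(0).astype(int).sort_index(axis=1)

# Put the original text in front as the "Value" column.
tdm = counts.copy()
tdm.insert(0, "Value", rows)

print(tdm)
```

Because the counting happens per row, repeated words within one string are tallied correctly without any column scan.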