2017-05-15 30 views
How to convert a DataFrame with string columns to a csr_matrix

I'm working on a PMI problem, and so far I have a DataFrame like this:

import pandas as pd

w = ['by', 'step', 'by', 'the', 'is', 'step', 'is', 'by', 'is']
c = ['step', 'what', 'is', 'what', 'the', 'the', 'step', 'the', 'what']
ppmi = [1, 3, 12, 3, 123, 1, 321, 1, 23]
df = pd.DataFrame({'w': w, 'c': c, 'ppmi': ppmi})

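I want to convert this DataFrame into a sparse matrix. Because w and c are lists of strings, calling csr_matrix((ppmi, (w, c))) raises TypeError: cannot perform reduce with flexible type. Is there another way to convert this DataFrame?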

Comment: I don't think `scipy`'s `csr_matrix` supports mixed types, so I'm not sure what you're expecting... You might consider a `pandas` [sparse data structure](http://pandas.pydata.org/pandas-docs/version/0.15.2/sparse.html).

Answer

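Maybe you could try using coo_matrix, with the integer codes of a MultiIndex built from w and c as the row and column indices: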

import pandas as pd
import scipy.sparse as sps

w = ['by', 'step', 'by', 'the', 'is', 'step', 'is', 'by', 'is']
c = ['step', 'what', 'is', 'what', 'the', 'the', 'step', 'the', 'what']
ppmi = [1, 3, 12, 3, 123, 1, 321, 1, 23]
df = pd.DataFrame({'w': w, 'c': c, 'ppmi': ppmi})

# Index by (w, c); the MultiIndex stores each string as an integer code
df.set_index(['w', 'c'], inplace=True)

# Note: index.labels was renamed to index.codes in pandas 0.24
rows, cols = df.index.codes[0], df.index.codes[1]
mat = sps.coo_matrix((df['ppmi'], (rows, cols)))
print(mat.todense())

Output:

[[ 12   1   1   0]
 [  0 321 123  23]
 [  0   0   1   3]
 [  0   0   0   3]]
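
If you specifically need a csr_matrix and also want to keep track of which string each row and column stands for, one possible variant is sketched below. It is not part of the answer above: it swaps the MultiIndex for pd.factorize, and the names row_codes, row_labels, col_codes and col_labels are only illustrative. It assumes pandas 0.24 or later (for to_numpy).

```python
import pandas as pd
from scipy.sparse import csr_matrix

w = ['by', 'step', 'by', 'the', 'is', 'step', 'is', 'by', 'is']
c = ['step', 'what', 'is', 'what', 'the', 'the', 'step', 'the', 'what']
ppmi = [1, 3, 12, 3, 123, 1, 321, 1, 23]
df = pd.DataFrame({'w': w, 'c': c, 'ppmi': ppmi})

# factorize maps each distinct string to an integer code;
# sort=True orders the labels alphabetically (illustrative names).
row_codes, row_labels = pd.factorize(df['w'], sort=True)
col_codes, col_labels = pd.factorize(df['c'], sort=True)

# Build the CSR matrix from (data, (row_indices, col_indices)).
mat = csr_matrix(
    (df['ppmi'].to_numpy(), (row_codes, col_codes)),
    shape=(len(row_labels), len(col_labels)),
)

# Look up a value by its string labels via the saved label indexes.
r = row_labels.get_loc('is')
k = col_labels.get_loc('step')
value = mat[r, k]
```

Keeping row_labels and col_labels around lets you translate between the strings and the matrix positions later. Be aware that if the same (w, c) pair appears more than once, the sparse constructor adds the values together.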