TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray' while trying to do PCA

I want to do PCA on a sparse matrix, but I run into this error:

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Here is my code:

import sys
import csv
from sklearn.decomposition import PCA

data_sentiment = []
y = []
data2 = []
csv.field_size_limit(sys.maxint)
with open('/Users/jasondou/Google Drive/data/competition_1/speech_vectors.csv') as infile:
    reader = csv.reader(infile, delimiter=',', quotechar='|')
    n = 0
    for row in reader:
        # sample = row.split(',')
        n += 1
        if n % 1000 == 0:
            print n
        data_sentiment.append(row[:25000])

pca = PCA(n_components=3)
pca.fit(data_sentiment)
PCA(copy=True, n_components=3, whiten=False)
print(pca.explained_variance_ratio_)
y = pca.transform(data_sentiment)

The input data is speech_vector.csv, a 2740 * 50000 matrix, available here

Here is the full error traceback:

Traceback (most recent call last): 
    File "test.py", line 45, in <module> 
    y = pca.transform(data_sentiment) 
    File "/Users/jasondou/anaconda/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 397, in transform 
    X = X - self.mean_ 
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

I don't quite understand what self.mean_ refers to here.

2015-04-15 Jason

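It would be useful to know **which** line the error occurs on. Also, your code in its current form makes no sense, since you are passing an empty list to 'pca.fit' – EdChum

I suspect this is happening somewhere else (e.g. inside 'pca.fit()' or 'pca.transform()'); I don't see any subtraction in this top-level code that could raise this error directly. – Kevin

When you say *"I don't quite understand what self.mean_ refers to here"*, I'm not sure what you mean –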

You are not parsing the CSV file correctly. Each row that your reader returns will be a list of strings, like this:

row = ['0.0', '1.0', '2.0', '3.0', '4.0']

Your data_sentiment will therefore be a list-of-lists-of-strings, for example:

data_sentiment = [row, row, row]

When you pass this directly to pca.fit(), it is converted internally to a numpy array, which also contains strings:

X = np.array(data_sentiment) 
print(repr(X)) 
# array([['0.0', '1.0', '2.0', '3.0', '4.0'], 
#  ['0.0', '1.0', '2.0', '3.0', '4.0'], 
#  ['0.0', '1.0', '2.0', '3.0', '4.0']], 
#  dtype='|S3')

numpy has no rule for subtracting one array of strings from another array of strings:

X - X 
# TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
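
To make the failure concrete, here is a minimal sketch. It assumes Python 3 and a recent NumPy, so the string dtype and the exact error text differ from the Python 2 output above. The names rows, X_str, X_num and mean are illustrative only. The mean subtraction mirrors the `X = X - self.mean_` line in the traceback, where mean_ is the per-column mean that PCA estimates during fit.

```python
import numpy as np

# A list of lists of strings, as csv.reader would produce it.
rows = [['0.0', '1.0', '2.0'],
        ['3.0', '4.0', '5.0']]

X_str = np.array(rows)      # string dtype, not float
print(X_str.dtype.kind)     # 'U' on Python 3 ('S' on Python 2)

# Subtraction is not defined for string arrays, so NumPy raises a TypeError.
try:
    X_str - X_str
    subtraction_failed = False
except TypeError:
    subtraction_failed = True

# After converting to float, centering on the column means works.
X_num = X_str.astype(float)
mean = X_num.mean(axis=0)   # per-column mean, analogous to pca.mean_
X_centered = X_num - mean
print(X_centered)
```

So the subtraction in the traceback is only a symptom; the real problem is that the array never held numbers in the first place.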

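This error would have been easy to spot if you had bothered to show us some of the contents of data_sentiment in your question, as I asked you to.

What you need to do is convert your strings to floats, for example: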

data_sentiment.append([float(s) for s in row[:25000]])
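
Put into the context of your reading loop, that could look roughly like the sketch below. It is written for Python 3 and uses an in-memory io.StringIO with made-up contents in place of your real file; n_columns is a hypothetical stand-in for the 25000 cut-off in your code.

```python
import csv
import io

import numpy as np

# Stand-in for the real file; the contents are made up for illustration.
sample = io.StringIO("0.0,1.0,2.0,3.0\n4.0,5.0,6.0,7.0\n")

n_columns = 3  # hypothetical cut-off, playing the role of 25000

data_sentiment = []
reader = csv.reader(sample, delimiter=',', quotechar='|')
for row in reader:
    # Convert each field from str to float before storing the row.
    data_sentiment.append([float(s) for s in row[:n_columns]])

X = np.array(data_sentiment)
print(X.dtype, X.shape)
```

With a float array like X, the subtraction inside PCA has numbers to work on.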

An even simpler way is to use np.loadtxt to parse the CSV file:

data_sentiment = np.loadtxt('/path/to/file.csv', delimiter=',')
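
As a small self-contained sketch of that call (again with a made-up in-memory sample instead of your file), np.loadtxt parses every field as a float by default, and its usecols argument can restrict the read to the leading columns, similar to what row[:25000] does in your loop. The variable names here are illustrative.

```python
import io

import numpy as np

# Stand-in for the real file; the contents are made up for illustration.
sample = io.StringIO("0.0,1.0,2.0,3.0\n4.0,5.0,6.0,7.0\n")

# Every field is parsed as float by default.
full = np.loadtxt(sample, delimiter=',')

# Rewind and read only the first three columns.
sample.seek(0)
first_cols = np.loadtxt(sample, delimiter=',', usecols=list(range(3)))

print(full.shape, first_cols.shape)
```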

If you have pandas installed, then pandas.read_csv will probably be faster than np.loadtxt for large arrays such as this.
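
A minimal sketch of the pandas route, assuming the file has no header row (your loop treats every line as data, so I am guessing that is the case) and using a made-up in-memory sample in place of the real file:

```python
import io

import pandas as pd

# Stand-in for the real file; the contents are made up for illustration.
sample = io.StringIO("0.0,1.0,2.0,3.0\n4.0,5.0,6.0,7.0\n")

# header=None: treat the first line as data rather than column names.
df = pd.read_csv(sample, header=None)

# Convert to a float NumPy array that can be handed to PCA.
X = df.to_numpy(dtype=float)
print(X.shape)
```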

2015-04-15 23:35:16

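Thank you very much, Ali_m! – Jason

If my answer solved your problem, then you should accept it (click the tick mark next to my answer) –

Thanks for your guidance! Nice to meet you! – Jason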
