ExtraTreesClassifier with sparse training data?

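I am trying to use ExtraTreesClassifier with sparse data, which the documentation says is supported, but I get a TypeError at run time asking for dense data. This is scikit-learn 0.17.1. Here is the relevant quote from the documentation: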

Parameters: X : array-like or sparse matrix of shape = [n_samples, n_features]

The code is quite simple:

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.ensemble import ExtraTreesClassifier

# Three samples with two features each, and one class label per sample
features = np.array([[1, 0], [0, 1], [3, 4]])
sparse_features = csr_matrix(features)
labels = np.array([0, 1, 0])

classifier = ExtraTreesClassifier()
# This call raises the TypeError; fitting on the dense `features` works
classifier.fit(sparse_features, labels)

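The exception is: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. Passing features instead of sparse_features works fine.

Is the documentation out of date, or is there something wrong with the code above?

Any help would be greatly appreciated. Thank you.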

2016-03-15 user2916547

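The documentation says "if a sparse matrix is provided to a sparse csc_matrix", so please try using csc_matrix. – Alleo

Also, I ran your code on sklearn == 0.17.1 and it works fine (with both csc and csr matrices). – Alleo

Thank you, I tried csc_matrix and it does work. My apologies: my reading of the documentation was that X would be converted to that format internally if it was a sparse matrix. Please add your input as an answer and I will close this question. Thanks. – user2916547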

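Quoting the documentation:

Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix.

So I expected that passing a csc_matrix should help.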
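As a minimal sketch of that suggestion, the snippet below builds the training matrix directly as a float32 csc_matrix and also shows converting an existing CSR matrix with .tocsc(). The n_estimators and random_state values are arbitrary choices for the illustration and are not part of the original question.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix
from sklearn.ensemble import ExtraTreesClassifier

features = np.array([[1, 0], [0, 1], [3, 4]])
labels = np.array([0, 1, 0])

# Build the matrix in the format and dtype the documentation names.
csc_features = csc_matrix(features, dtype=np.float32)

# n_estimators and random_state are illustrative values only.
classifier = ExtraTreesClassifier(n_estimators=10, random_state=0)
classifier.fit(csc_features, labels)

# An existing CSR matrix can be converted instead of being rebuilt.
converted = csr_matrix(features).tocsc()

predictions = classifier.predict(csc_features)
print(converted.format, predictions.shape)
```

Whether the explicit conversion is needed seems to depend on the installed versions; on setups where CSR input is accepted, it is harmless.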

On my setup both versions work (csc and csr, sklearn 0.17.1), so I think the problem may come from an older version of scipy.
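Since the difference may come down to installed versions, a quick way to compare setups is to print them. This is only a diagnostic sketch; it does not establish which scipy release, if any, causes the error.

```python
import numpy
import scipy
import sklearn

# Collect the versions of the libraries involved in the fit call.
versions = {
    "scikit-learn": sklearn.__version__,
    "scipy": scipy.__version__,
    "numpy": numpy.__version__,
}

for name, version in versions.items():
    print(name, version)
```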

2016-03-16 13:30:15 Alleo
