Stratified KFold on a sparse (CSR) feature matrix

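I have a large sparse matrix of model features (95000 × 12000). I want to run stratified K-fold cross-validation on it using the sklearn.cross_validation module in Python. However, I have not found a way to index a sparse matrix in Python.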

Is there any way I can run StratifiedKFold on my sparse feature matrix?

Source

2015-11-07 Bishwarup Bhattacharjee

Does it give you an error like "integer cannot be indexed"? – CoderBC

Clearly you have not even tried. Scikit-learn CV works just fine with sparse matrices, since csr_matrices are the default data representation in scikit-learn.
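As a minimal sketch of that claim, the following passes a CSR matrix straight to `cross_val_score` with a `StratifiedKFold` splitter, without converting it to a dense array. The synthetic data, the small matrix shape, and the choice of `LogisticRegression` are assumptions made for illustration; none of them come from the original question.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real feature matrix (assumed shape, far smaller than 95000 x 12000)
X_sparse = sparse.random(200, 50, density=0.1, format="csr", random_state=0)
# Arbitrary balanced binary labels, 100 samples per class
y = np.tile([0, 1], 100)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
# The CSR matrix is passed directly; scikit-learn slices its rows for each fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X_sparse, y, cv=skf)
print(scores)
```

One score is returned per fold, so `scores` has five entries here.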

Source

2015-11-08 09:33:09 lejlot

Try this:

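import numpy as np
from sklearn.model_selection import StratifiedKFold

# x is your sparse feature matrix and output holds your class labels.
# First make sure the sparse matrix is in CSR format, which supports row indexing.
X_sparse = x.tocsr()
y = np.asarray(output)  # labels must be an array to be indexed by the fold indices
X_train, X_test = {}, {}
y_train, y_test = {}, {}

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
for i, (train_index, test_index) in enumerate(skf.split(X_sparse, y)):
    print("TRAIN:", train_index, "TEST:", test_index)
    # Store each fold's train/test split under its fold number
    X_train[i], X_test[i] = X_sparse[train_index], X_sparse[test_index]
    y_train[i], y_test[i] = y[train_index], y[test_index]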
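To check the approach on data you can run right away, here is a small self-contained sketch. The toy matrix and the 80/20 class imbalance are hypothetical values chosen for illustration. It shows that indexing a CSR matrix with the integer arrays from `skf.split` keeps the result sparse, and that each test fold keeps the overall class ratio.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 100 rows, 20 columns, 80/20 class imbalance
X_sparse = sparse.random(100, 20, density=0.2, format="csr", random_state=0)
y = np.array([0] * 80 + [1] * 20)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
fold_info = []
for train_index, test_index in skf.split(X_sparse, y):
    # Integer-array row indexing on a CSR matrix returns another sparse matrix
    X_test = X_sparse[test_index]
    n_positive = int(y[test_index].sum())
    fold_info.append((X_test.shape, sparse.issparse(X_test), n_positive))

for info in fold_info:
    print(info)
```

Because both class counts divide evenly by the number of splits, every test fold should contain the same number of minority-class rows.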

Source

2017-04-15 15:00:56 CoderBC
