How to equalize the columns of two sparse matrices

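I have two sparse matrices, one for the training set and one for the test set, and I need to remove every column that does not also appear in the other matrix, so that both end up with the same columns. Currently I'm doing this with a loop, but I'm sure there is a more efficient way: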

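# Remove the columns of testmatrix whose names do not appear in trainmatrix
removecols <- integer(0)   # positions of the columns to drop
i <- 0
for (feature in colnames(testmatrix)) {
  i <- i + 1
  if (!(feature %in% colnames(trainmatrix))) {
    removecols <- c(removecols, i)
  }
}
# Guard: x[, -integer(0)] would drop every column instead of none
if (length(removecols) > 0) {
  testmatrix <- testmatrix[, -removecols, drop = FALSE]
}

# ...and vice versa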
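The loop can be collapsed into one vectorised step, because `%in%` already returns a TRUE/FALSE value for every column name at once and that logical vector can index the columns directly. A minimal sketch, assuming the Matrix package and two small hypothetical matrices standing in for the question's data:

```r
library(Matrix)

# Hypothetical toy matrices standing in for the question's data
trainmatrix <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(1, 2, 3),
                            dims = c(3, 3),
                            dimnames = list(NULL, c("a", "b", "c")))
testmatrix  <- sparseMatrix(i = c(1, 2), j = c(2, 3), x = c(5, 6),
                            dims = c(2, 3),
                            dimnames = list(NULL, c("b", "c", "d")))

# %in% gives one TRUE/FALSE per column name, so it can index the columns
# directly; drop = FALSE keeps a matrix even if only one column survives.
# Each matrix keeps its own original column order.
testmatrix  <- testmatrix[, colnames(testmatrix) %in% colnames(trainmatrix), drop = FALSE]
trainmatrix <- trainmatrix[, colnames(trainmatrix) %in% colnames(testmatrix), drop = FALSE]

print(colnames(testmatrix))   # "b" "c"
print(colnames(trainmatrix))  # "b" "c"
```

This keeps the shared columns but does not reorder them, so if the two matrices list their shared columns in different orders they will still differ column-by-column; the `intersect()` approach in the answer also aligns the order.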


2013-06-23 paulusm

It would be easier to help if we had 'testmatrix' and 'trainmatrix' ... – alexwhan

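It looks to me like all you want to do is keep the columns of testmatrix that also appear in trainmatrix, and vice versa. Since you want this applied to both matrices, a quick way is to use intersect on the colnames vectors of the two matrices to find the column names they share, and then subset both on that: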

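# keep is a vector of the colnames that appear in BOTH the train and test matrices
keep <- intersect(colnames(testmatrix), colnames(trainmatrix))

# Then subset both matrices on this vector (drop = FALSE keeps a matrix
# even if only one column survives)
testmatrix <- testmatrix[ , keep, drop = FALSE]
trainmatrix <- trainmatrix[ , keep, drop = FALSE]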
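A self-contained run of the same idea, using two small hypothetical matrices whose shared columns appear in a different order. Because `intersect()` returns names in the order of its first argument, subsetting both matrices with the same `keep` vector also gives them identical column order, which is usually what a model trained on one and applied to the other needs:

```r
library(Matrix)

# Hypothetical toy data: the shared columns "b" and "c" appear in a
# different order in each matrix
trainmatrix <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(1, 2, 3),
                            dims = c(3, 3),
                            dimnames = list(NULL, c("a", "b", "c")))
testmatrix  <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(5, 6),
                            dims = c(2, 3),
                            dimnames = list(NULL, c("c", "d", "b")))

# Names present in both matrices, in the order they appear in testmatrix
keep <- intersect(colnames(testmatrix), colnames(trainmatrix))

# Indexing by name both filters and reorders the columns
testmatrix  <- testmatrix[, keep, drop = FALSE]
trainmatrix <- trainmatrix[, keep, drop = FALSE]

print(keep)                                              # "c" "b"
print(identical(colnames(testmatrix), colnames(trainmatrix)))  # TRUE
```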


2013-06-23 10:20:44

Hi, these are from the Matrix package – paulusm

@pablomo They are of type dgCMatrix, created with sparse.model.matrix(), so it looks like the method above will work fine, since you can use `colnames` on them –
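One common way the column sets diverge in the first place is a factor whose levels differ between the training and test data: `sparse.model.matrix()` creates an indicator column for each level it sees. A small sketch with a hypothetical `color` variable (the `- 1` drops the intercept so the first factor is coded with one column per level), after which the `intersect()` step from the answer applies unchanged:

```r
library(Matrix)

# Hypothetical data: the test set has a level ("yellow") the training set lacks,
# and the training set has levels ("blue", "green") the test set lacks
train_df <- data.frame(color = factor(c("red", "green", "blue")))
test_df  <- data.frame(color = factor(c("red", "yellow")))

# Sparse design matrices with one indicator column per observed level
trainmatrix <- sparse.model.matrix(~ color - 1, data = train_df)
testmatrix  <- sparse.model.matrix(~ color - 1, data = test_df)

print(colnames(trainmatrix))
print(colnames(testmatrix))

# Align on the shared columns, as in the answer
keep <- intersect(colnames(testmatrix), colnames(trainmatrix))
testmatrix  <- testmatrix[, keep, drop = FALSE]
trainmatrix <- trainmatrix[, keep, drop = FALSE]
print(keep)
```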
