How to merge large sparse matrices

I have a large list of 25 sparse matrices (they are really big: one of them has 100M or more elements), and I need to merge them into one big sparse matrix.

For example, matrix A could look like this (it is a submatrix of one of my real matrices with 100M elements):

> A
5 x 4 sparse Matrix of class "dgCMatrix"
              SKU
CustomerID     404 457 547 558
  100002_24655   1   .   .   .
  100003_46919   .   1   1   .
  100007_46702   .   .   .   .
  100012_47709   .   .   .   .
  100013_46132   1   1   1   1

> dput(A) 
new("dgCMatrix" 
    , i = c(0L, 4L, 1L, 4L, 1L, 4L, 4L) 
    , p = c(0L, 2L, 4L, 6L, 7L) 
    , Dim = c(5L, 4L) 
    , Dimnames = structure(list(CustomerID = c("100002_24655", "100003_46919", 
"100007_46702", "100012_47709", "100013_46132"), SKU = c("404", 
"457", "547", "558")), .Names = c("CustomerID", "SKU" 
)) 
    , x = c(1, 1, 1, 1, 1, 1, 1) 
    , factors = list() 
)

Another matrix, B, could look like this:

> B
7 x 5 sparse Matrix of class "dgCMatrix"
               SKU
CustomerID      191 404 558 715 787
  100002_24655    .   .   .   .   .
  100007_46702    1   1   1   1   1
  100012_47709    .   .   1   .   .
  100013_46132    .   .   .   .   1
  100014_46400    .   .   .   .   .
  100014_605414   1   1   1   .   .
  100014_631294   .   .   1   1   1

> dput(B) 
new("dgCMatrix" 
    , i = c(1L, 5L, 1L, 5L, 1L, 2L, 5L, 6L, 1L, 6L, 1L, 3L, 6L) 
    , p = c(0L, 2L, 4L, 8L, 10L, 13L) 
    , Dim = c(7L, 5L) 
    , Dimnames = structure(list(CustomerID = c("100002_24655", "100007_46702", 
"100012_47709", "100013_46132", "100014_46400", "100014_605414", 
"100014_631294"), SKU = c("191", "404", "558", "715", 
"787")), .Names = c("CustomerID", "SKU")) 
    , x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 
    , factors = list() 
)

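The output should look like this (the first part comes from the first matrix and the second part from the second matrix; I separated them with a blank line to make it easier to read):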

12 x 7 sparse Matrix of class "dgCMatrix"
      404 457 547 558 191 715 787
 [1,]   1   .   .   .   .   .   .
 [2,]   .   1   1   .   .   .   .
 [3,]   .   .   .   .   .   .   .
 [4,]   .   .   .   .   .   .   .
 [5,]   1   1   1   1   .   .   .

 [6,]   .   .   .   .   .   .   .
 [7,]   1   .   .   1   1   1   1
 [8,]   .   .   .   1   .   .   .
 [9,]   .   .   .   .   .   .   1
[10,]   .   .   .   .   .   .   .
[11,]   1   .   .   1   1   .   .
[12,]   .   .   .   1   .   1   1

In other words, I want to merge by column names. So how can I merge all 25 sparse matrices?
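A minimal sketch of one way to do this without growing the result inside a loop, assuming every input is a numeric `dgCMatrix` with row and column names (as `A` and `B` above are). The helper name `rbind_by_colnames` is hypothetical, not part of the Matrix package. It converts each matrix to triplet form, shifts the row indices down by the number of rows already placed, maps each column to its position in the union of all column names, and then builds the result with a single `sparseMatrix()` call:

```r
library(Matrix)

# Hypothetical helper (not from this thread): row-binds sparse matrices whose
# column sets differ, aligning columns by name. Columns a matrix lacks are
# treated as all-zero. Assumes every input is a numeric dgCMatrix with both
# row names and column names.
rbind_by_colnames <- function(M.list) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(M.list, colnames)))

  ii <- vector("list", length(M.list))
  jj <- vector("list", length(M.list))
  xx <- vector("list", length(M.list))
  row_offset <- 0L

  for (k in seq_along(M.list)) {
    # Triplet form: @i and @j hold 0-based row/column indices of the nonzeros
    Tk <- as(M.list[[k]], "TsparseMatrix")
    ii[[k]] <- Tk@i + 1L + row_offset                    # shift rows down
    jj[[k]] <- match(colnames(Tk), all_cols)[Tk@j + 1L]  # remap columns by name
    xx[[k]] <- Tk@x
    row_offset <- row_offset + nrow(Tk)
  }

  # Build the merged matrix once, from the concatenated triplets
  sparseMatrix(i = unlist(ii), j = unlist(jj), x = unlist(xx),
               dims = c(row_offset, length(all_cols)),
               dimnames = list(unlist(lapply(M.list, rownames)), all_cols))
}
```

With the `A` and `B` shown above, `rbind_by_colnames(list(A, B))` should give the 12 x 7 layout of the desired output, with columns ordered `404 457 547 558 191 715 787` (the order in which the names first appear) and the customer IDs kept as row names. Since the index vectors are concatenated once, there is no `cbind` per missing column and no repeated copy of a growing matrix; memory use is roughly proportional to the total number of nonzero entries.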


2017-09-06 Martina Zapletalová

`l <- list(A, B, C, ...); do.call(rbind, l)` – Sagar

@Sagar The matrices must have the same number of columns if you want to use rbind. –

@MartinaZapletalová - I didn't realize they have different numbers of columns... my bad. – Sagar
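Sagar's `do.call(rbind, l)` does work once every matrix has the same columns in the same order. A sketch of that alignment step, assuming numeric `dgCMatrix` inputs with column names; `pad_columns` and `rbind_aligned` are hypothetical helper names, not Matrix functions:

```r
library(Matrix)

# Hypothetical helper: give M exactly the columns `cols`, in that order,
# filling the columns M does not have with zeros.
# Assumes M is a dgCMatrix with column names.
pad_columns <- function(M, cols) {
  missing_cols <- setdiff(cols, colnames(M))
  if (length(missing_cols) > 0) {
    # One empty (all-zero) sparse block covering every missing column at once
    zeros <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                          dims = c(nrow(M), length(missing_cols)),
                          dimnames = list(rownames(M), missing_cols))
    M <- cbind(M, zeros)
  }
  M[, cols, drop = FALSE]  # reorder columns by name
}

# Hypothetical helper: align every matrix to the union of column names,
# then row-bind them all in one do.call()
rbind_aligned <- function(l) {
  all_cols <- unique(unlist(lapply(l, colnames)))
  do.call(rbind, lapply(l, pad_columns, cols = all_cols))
}
```

Each matrix receives all of its missing columns as a single empty sparse block, so `cbind` is called with two arguments rather than one per missing column, and `do.call(rbind, ...)` receives one argument per matrix (25 in the question). With `A` and `B` above, `rbind_aligned(list(A, B))` should give the same 12 x 7 layout as the desired output.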


Based on this answer, we can extend that approach to merge a list of matrices of any length, like this:

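merge.sparse = function(M.list) {
    A = M.list[[1]]

    # loop over the remaining matrices; M.list[-1] is a list of them, whereas
    # M.list[[2:length(M.list)]] would index recursively into the list
    for (B in M.list[-1]) {
        # finding what's missing
        misA = colnames(B)[!colnames(B) %in% colnames(A)]
        misB = colnames(A)[!colnames(A) %in% colnames(B)]

        # named lists of zeros, one element per missing column
        misAl = as.vector(numeric(length(misA)), "list")
        names(misAl) = misA
        misBl = as.vector(numeric(length(misB)), "list")
        names(misBl) = misB

        ## adding missing columns to initial matrices
        ## (list(A) keeps the matrix as one element of the argument list)
        An = do.call(cbind, c(list(A), misAl))
        Bn = do.call(cbind, c(list(B), misBl))[, colnames(An)]

        # final bind
        A = rbind(An, Bn)
    }
    A
}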

x = merge.sparse(list(A,B))


2017-09-06 16:29:12 dww

It looks good, but: **Error: node stack overflow** **Error during wrapup: node stack overflow**. The error occurs at `An = do.call(cbind, c(A, misAl))` and `Bn = do.call(cbind, c(B, misBl))[,colnames(An)]` –

I tried `An = Reduce(cbind, c(A, misAl))` and it works (I'm currently trying it on one sparse matrix), but when I try `Bn = Reduce`... I get **Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : invalid character indexing** –

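I've just added my edited code, which avoids this error. There was also a problem with `B in M.list[[2:length(M.list)]]`, but I don't know why. Maybe my edit is too complicated, so if you have any suggestions, please write them under [this answer](https://stackoverflow.com/a/46092893/8416107) –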

So I edited dww's answer a little to avoid the errors I mentioned in the comments. It is rather slow, but my matrices are very large:

> proc.time() - ptm
   user  system elapsed
572.384 213.179 793.550

Here is the edited code:

merge.sparse = function(M.list) {
    A = M.list[[1]]

    for (i in 2:length(M.list)) {  # i indexes the remaining matrices
        B = M.list[[i]]
        # finding what's missing
        misA = colnames(B)[!colnames(B) %in% colnames(A)]
        misB = colnames(A)[!colnames(A) %in% colnames(B)]

        misAl = as.vector(numeric(length(misA)), "list")
        names(misAl) = misA
        misBl = as.vector(numeric(length(misB)), "list")
        names(misBl) = misB

        ## adding missing columns to initial matrices;
        ## Reduce() cbinds one zero column at a time, so the new
        ## columns are named explicitly afterwards
        An = Reduce(cbind, c(list(A), misAl))
        if (length(misA) > 0) {
            lenA <- ncol(An) - length(misA) + 1
            colnames(An)[lenA:ncol(An)] = misA
        }

        Bn = Reduce(cbind, c(list(B), misBl))
        if (length(misB) > 0) {
            lenB <- ncol(Bn) - length(misB) + 1
            colnames(Bn)[lenB:ncol(Bn)] = misB
        }
        Bn <- Bn[, colnames(An)]  # same column order as An

        # final bind
        A = rbind(An, Bn)
        print(c(length(M.list), i))  # progress: total, current
    }
    A
}


2017-09-07 09:35:01
