2015-12-09

R: sparse submatrix without reducing the original matrix dimensions

From this data frame df:

  group   from     to weight
1     1   Joey   Joey      1
2     1   Joey Deedee      1
3     1 Deedee   Joey      1
4     1 Deedee Deedee      1
5     2 Johnny Johnny      1
6     2 Johnny  Tommy      1
7     2  Tommy Johnny      1
8     2  Tommy  Tommy      1

which can be reproduced like this:

df <- structure(list(group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), from = 
structure(c(2L, 2L, 1L, 1L, 3L, 3L, 4L, 4L), .Label = c("Deedee", 
"Joey", "Johnny", "Tommy"), class = "factor"), to = structure(c(2L, 1L, 
2L, 1L, 3L, 4L, 3L, 4L), .Label = c("Deedee", "Joey", "Johnny", 
"Tommy"), class = "factor"), weight = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L)), .Names = c("group", "from", "to", "weight"), class = "data.frame", 
row.names = c(NA, -8L)) 

a sparse matrix mat can be created with the Matrix package:

library(Matrix)
mat <- sparseMatrix(i = as.numeric(df$from), j = as.numeric(df$to),
                    x = df$weight,
                    dimnames = list(levels(df$from), levels(df$to)))

It looks like this:

4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      1     1
Tommy       .    .      1     1

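How can I use df$group to create a sparse submatrix without reducing the dimensions of the original matrix?

The result should look like this: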

4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      .     .
Tommy       .    .      .     .

First idea

If I subset the data frame and create a submatrix from the subset

df1 <- subset(df, group == 1)
mat1 <- sparseMatrix(i = as.numeric(df1$from), j = as.numeric(df1$to),
                     x = df1$weight)

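the result is a 2 x 2 sparse matrix. This is not an option. Besides "losing two nodes", I would also have to filter the factor levels that are used as dimension names.

The trick might be to avoid losing the factor levels when creating the matrix.

Second idea

If I set df$weight to zero for the group I am not interested in and then create the submatrix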

df2 <- df
df2[df2$group == 2, 4] <- 0
mat2 <- sparseMatrix(i = as.numeric(df2$from), j = as.numeric(df2$to),
                     x = df2$weight,
                     dimnames = list(levels(df$from), levels(df$to)))

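the matrix has the right dimensions and I can easily carry the factor levels along as dimension names, but the matrix now contains explicitly stored zeros: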

4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      0     0
Tommy       .    .      0     0

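This is also not an option, because row normalization creates NaNs, and I run into trouble when I convert the matrix to a graph and run a network analysis.

Here the trick might be to remove the zeros from the sparse matrix. But how?

In any case, the solution must be as efficient as possible, because the matrices become very large.

Answer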


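Basically your first idea, with dims and dimnames taken from the full data frame df rather than from the subset df1: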

mat1 <- sparseMatrix(i = as.numeric(df1$from), j = as.numeric(df1$to), 
        x = df1$weight, 
        dims = c(length(levels(df$from)), length(levels(df$to))), 
        dimnames = list(levels(df$from), levels(df$to))) 

#4 x 4 sparse Matrix of class "dgCMatrix"
#       Deedee Joey Johnny Tommy
#Deedee      1    1      .     .
#Joey        1    1      .     .
#Johnny      .    .      .     .
#Tommy       .    .      .     .
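
The same pattern can be wrapped in a small function so that any group can be selected. The following is only a sketch: the helper name group_submatrix is made up for illustration, and the example data frame is rebuilt with factor() so that the block runs on its own. It relies on the fact that logical row subsetting keeps all factor levels, so the integer codes of the subset still index into the full level set.

```r
library(Matrix)

# Example data, rebuilt here so the block is self-contained.
df <- data.frame(
  group  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = 1
)

# Hypothetical helper (not part of the Matrix package): build the sparse
# matrix for one group while keeping the dimensions of the full matrix.
group_submatrix <- function(df, g) {
  sub <- df[df$group == g, ]           # subsetting rows keeps all factor levels
  sparseMatrix(
    i = as.integer(sub$from),          # factor codes refer to the full level set
    j = as.integer(sub$to),
    x = sub$weight,
    dims = c(nlevels(df$from), nlevels(df$to)),
    dimnames = list(levels(df$from), levels(df$to))
  )
}

m1 <- group_submatrix(df, 1)
m2 <- group_submatrix(df, 2)
```

Because only the rows of the selected group are passed to sparseMatrix(), no explicit zeros are stored for the other groups.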

Thank you very much, that's it. – hyco
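
On the second idea from the question (removing zeros that are already stored in a sparse matrix): the Matrix package provides drop0() for this. The sketch below uses a made-up toy matrix rather than the data from the question, and is meant as an alternative illustration, not as part of the accepted answer.

```r
library(Matrix)

# Toy matrix with two explicitly stored zeros, similar to mat2 in the question.
m <- sparseMatrix(i = c(1, 2, 3, 4), j = c(1, 2, 3, 4),
                  x = c(1, 1, 0, 0), dims = c(4, 4))
stored_before <- length(m@x)        # number of stored entries, zeros included

m_clean <- drop0(m)                 # remove explicitly stored zeros
stored_after <- length(m_clean@x)   # number of stored entries afterwards
```

The dimensions stay the same; only the stored entries change. Building the matrix from the subset with explicit dims avoids creating the zeros in the first place, which is likely cheaper for very large matrices.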