2014-09-05 52 views
1

crossprod of vectors by factor in a data.table

I have a data.table, dists, that looks like this:

Classes ‘data.table’ and 'data.frame': 1800 obs. of 4 variables: 
$ groupname: Factor w/ 8 levels "A","B","C","D",..: 3 3 3 3 3 3 3 3 3 3 ... 
$ start : int 0 60 120 180 240 300 360 420 480 540 ... 
$ V1  : num 1041 955 962 865 944 ... 
$ vN  : num 0.0042 0.00385 0.00388 0.00349 0.00381 ... 
- attr(*, ".internal.selfref")=<externalptr> 

Here is the dput of the whole thing: http://pastebin.com/VW54NfUg

I can compute the crossprod of vN for an individual pair of factor levels. For example:

crossprod(as.matrix(dists[c(groupname=="C")]$vN), 
      as.matrix(dists[c(groupname=="D")]$vN)) 

But I would like to do them all at once and output the results as a matrix that looks like this:

            C           D           E           F           G           H
C 0.000000000
D 0.003515663 0.000000000
E 0.003530643 0.003580947 0.000000000
F 0.003580947 0.003409901 0.003522218 0.000000000
G 0.003522218 0.003515663 0.003409901 0.003580947 0.000000000
H 0.003409901 0.003522218 0.003515663 0.003530643 0.003515663 0.000000000

I have a feeling this is probably really simple, but I'm new to working with data.table and matrices. How can I do this?

Answers

4

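Basically, you are just describing the matrix multiplication X'X, where the columns of X hold the vN values, one column per group. You can use the split-apply-combine paradigm to compute X: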

# Convert the factor to character so unused levels don't produce empty groups in split()
dists$groupname <- as.character(dists$groupname) 

# Define X matrix and compute final table 
X <- do.call(cbind, lapply(split(dists, dists$groupname), function(x) x$vN)) 
(cp <- t(X) %*% X) 
#             C           D           E           F           G           H
# C 0.003495762 0.003515663 0.003530643 0.003580947 0.003522218 0.003409901 
# D 0.003515663 0.003720479 0.003677919 0.003757778 0.003650462 0.003477723 
# E 0.003530643 0.003677919 0.003750939 0.003784916 0.003665951 0.003485093 
# F 0.003580947 0.003757778 0.003784916 0.003994177 0.003775697 0.003526653 
# G 0.003522218 0.003650462 0.003665951 0.003775697 0.003740864 0.003476628 
# H 0.003409901 0.003477723 0.003485093 0.003526653 0.003476628 0.003438210 

If you want 0s down the main diagonal, you can do that with diag(cp) <- 0.
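To make the link with the question concrete, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical, not taken from `dists`. It checks that one entry of X'X equals the pairwise `crossprod` call from the question, then zeroes the diagonal and blanks the upper triangle so the printout resembles the lower-triangular layout that was asked for.

```r
# Toy data (hypothetical values): three groups with four observations each
toy <- data.frame(
  groupname = rep(c("C", "D", "E"), each = 4),
  vN = c(1, 2, 3, 4,  2, 0, 1, 3,  5, 1, 0, 2)
)

# One column of vN per group, then all pairwise cross products at once
X <- do.call(cbind, split(toy$vN, toy$groupname))
cp <- crossprod(X)   # same result as t(X) %*% X

# Entry [C, D] matches the pairwise computation used in the question
pair_cd <- crossprod(toy$vN[toy$groupname == "C"],
                     toy$vN[toy$groupname == "D"])
stopifnot(all.equal(cp["C", "D"], drop(pair_cd)))

# Zero the diagonal and hide the upper triangle for display only
shown <- cp
diag(shown) <- 0
shown[upper.tri(shown)] <- NA
print(shown, na.print = "")
```

Working on a copy (`shown`) keeps the full symmetric matrix `cp` available for further computation.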

3

As @josilber pointed out, this is simple matrix multiplication; you just need to extract the matrix. Here is a simpler and faster way to extract it:

setkey(dists, groupname) # making sure it's ordered by groupname 

X = dists[, matrix(vN, ncol = length(unique(groupname)))] 
colnames(X) = unique(dists$groupname) 

crossprod(X, X) 
#            C           D           E           F           G           H
#C 0.003495762 0.003515663 0.003530643 0.003580947 0.003522218 0.003409901 
#D 0.003515663 0.003720479 0.003677919 0.003757778 0.003650462 0.003477723 
#E 0.003530643 0.003677919 0.003750939 0.003784916 0.003665951 0.003485093 
#F 0.003580947 0.003757778 0.003784916 0.003994177 0.003775697 0.003526653 
#G 0.003522218 0.003650462 0.003665951 0.003775697 0.003740864 0.003476628 
#H 0.003409901 0.003477723 0.003485093 0.003526653 0.003476628 0.003438210
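
Note that reshaping vN with matrix() only lines up correctly if the rows are sorted by group and every group has the same number of rows, which appears to be the case here (1800 rows across six groups). A minimal sketch of the same approach with an explicit check, using a made-up table `toy` (hypothetical data, not the poster's) and assuming data.table is installed:

```r
library(data.table)

# Toy data (hypothetical values), deliberately not sorted by group
toy <- data.table(
  groupname = factor(c("D", "C", "D", "C", "E", "E")),
  vN        = c(10, 1, 20, 2, 100, 200)
)

setkey(toy, groupname)   # sort rows so that each group is contiguous

# The column-wise reshape is only valid when all groups have the same size
sizes <- toy[, .N, by = groupname]
stopifnot(uniqueN(sizes$N) == 1L)

X <- toy[, matrix(vN, ncol = nrow(sizes))]
colnames(X) <- as.character(sizes$groupname)

crossprod(X)             # equivalent to t(X) %*% X
```

If the group sizes differ, the check stops the script instead of silently mixing values from different groups into the same column.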