2016-05-12 17 views
2

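How can I get a correlation matrix from two paired columns in R and SAS, with a zero diagonal?

I have a data frame that looks like the one below. I used R to pivot the two ID columns into a matrix, but R cannot produce the matrix. (My expected matrix is about 700 × 700.) R stops and reports: Reached total allocation of 12213Mb: see help(memory.size)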

I would like to do the same thing in SAS. How can we do this? Or do I need different code to get it done in R?

ID_r ID_c SCORE 
A1 A2 0.2 
A1 A3 0.2 
A1 A4 0.3 
A1 A5 0.2 
A1 A6 0.2 
A2 A3 0.6 
A2 A4 0.2 
A2 A5 0.2 
A2 A6 0.2 
A3 A4 0.2 
A3 A5 0.2 
A3 A6 0.2 
A4 A5 0.2 
A4 A6 0.9 
A5 A6 0.2 

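    ID_r<-c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5') 
    ID_c<-c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6') 
    SCORE<-c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2) 
    # Build the data frame that the code below refers to as df
    df <- data.frame(ID_r, ID_c, SCORE, stringsAsFactors = FALSE) 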

library(dplyr); library(tidyr) 
df$ID_r <- as.character(df$ID_r) 
df$ID_c <- as.character(df$ID_c) 
ID <- unique(c(df$ID_r, df$ID_c)) 
diagDf <- data.frame(ID_r = ID, ID_c = ID, SCORE = "0.0") 
newDf <- rbind(df, diagDf) %>% arrange(ID_r, ID_c) 

resultDf <- spread(newDf, ID_r, SCORE, fill = ".") 
names(resultDf)[1] <- "" 
resultDf 

The sample SAS data is below.

data score_data; 
infile datalines; 
input ID_r $ ID_c $ SCORE; 
return; 
datalines; 

    A1 A2 0.2 
    A1 A3 0.2 
    A1 A4 0.3 
    A1 A5 0.2 
    A1 A6 0.2 
    A2 A3 0.6 
    A2 A4 0.2 
    A2 A5 0.2 
    A2 A6 0.2 
    A3 A4 0.2 
    A3 A5 0.2 
    A3 A6 0.2 
    A4 A5 0.2 
    A4 A6 0.9 
    A5 A6 0.2 
; 
run; 

proc print data=score_data ; 
run; 

I want to turn the two ID columns into a matrix like the following (the diagonal is zero).

    A1  A2  A3  A4  A5  A6
A1  0.0 0.2 0.2 0.3 0.2 0.2
A2  0.2 0.0 0.6 0.2 0.2 0.2
A3  0.2 0.6 0.0 0.2 0.2 0.2
A4  0.3 0.2 0.2 0.0 0.2 0.9
A5  0.2 0.2 0.2 0.2 0.0 0.2
A6  0.2 0.2 0.2 0.9 0.2 0.0

Thanks in advance!

Answers

2

R solution:

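library(plyr)  # provides join()
ID_r = c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5') 
ID_c = c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6') 
SCORE = c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2) 

# Stack the pairs with their mirror images so both triangles are covered
df1 = data.frame(ID_r, ID_c, SCORE, stringsAsFactors = FALSE) 
df2 = data.frame(ID_r = ID_c, ID_c = ID_r, SCORE, stringsAsFactors = FALSE) 
df = rbind(df1, df2) 
ID <- unique(c(ID_r, ID_c)) 

# All ID x ID combinations; pairs missing from df (the diagonal) come back as NA
grid = expand.grid(ID_r = ID, ID_c = ID, stringsAsFactors = FALSE) 
d = join(grid, df, by = c("ID_r","ID_c")) 
d$SCORE[is.na(d$SCORE)] <- 0 

# Empty result matrix with the IDs as row and column names
a = matrix(0, nrow = length(ID), ncol = length(ID)) 
rownames(a) <- ID 
colnames(a) <- ID 

# Index by (row name, column name) pairs. Taking SCORE straight from d keeps
# the matrix numeric; going through as.matrix(d) would turn the scores into strings.
a[cbind(as.character(d$ID_r), as.character(d$ID_c))] <- d$SCORE 
a 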
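
The same matrix can also be filled directly in base R, without a join or a reshape. This is only a minimal sketch on the sample data from the question; the names `ids` and `m` are illustrative and not from the original post. A 700 × 700 numeric matrix holds 490,000 doubles, roughly 3.9 MB, so the matrix itself should fit easily in memory.

```r
# Sample pairs from the question (each unordered pair appears once)
ID_r <- c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
ID_c <- c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
SCORE <- c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)

# IDs in order of first appearance
ids <- unique(c(ID_r, ID_c))

# Start from an all-zero square matrix, so the diagonal is already 0
m <- matrix(0, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# A two-column character matrix indexes cells by (row name, column name);
# fill each pair and then its mirror image
m[cbind(ID_r, ID_c)] <- SCORE
m[cbind(ID_c, ID_r)] <- SCORE

m
```

Because the scores never pass through a character column, `m` stays numeric, and the only large object created is the result matrix itself.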
+0

`join` requires the `plyr` package. – Divi

+0

By default `join` uses the `left` type, which is what you need for this problem. What error are you getting? – Divi

+0

I edited the answer. – Divi

1

PROC TRANSPOSE is your friend here.

proc transpose data=score_data out=score_matrix; 
    by id_r; 
    id id_c; *this makes variable names; 
    var score; 
run; 

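This gives you the upper triangle (the cells above the diagonal). A second proc transpose can give you the lower triangle (by swapping id_r and id_c, I imagine), or you can do that in a data step. You still have to create the six 0.0 rows in a data step, but that shouldn't be particularly difficult.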

An example of this:

data pre_transpose; 
    set score_data end=eof; 
    by id_r id_c; 
    output; 

    *Swap R and C; 
    _idtemp = id_r; 
    id_r=id_c; 
    id_c=_idtemp; 
    output; 

    *If EOF, then need that last 0,0 combo which never gets an R;
    if eof then do;
        id_c = id_r;
        score=0;
        output;
        id_c = _idtemp;
    end;

    *If first line of a new ID, then need the R=C row;
    if first.id_r then do;
        id_r=id_c;
        score=0;
        output;
    end;

run; 

proc sort data=pre_transpose; 
    by id_r id_c; 
run; 
proc transpose data=pre_transpose out=score_matrix; 
    by id_r; 
    id id_c; *this makes variable names; 
    var score; 
run; 
+0

Thank you!!!! It works perfectly! Thank you very much. I learned a lot about SAS coding from your answer. –