2015-06-04

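Creating a relational matrix with R

My data frame lists projects, the individuals who took part in each one, and the year in which each project was carried out.

How can I create, for each year, an n x n relational matrix (n being the number of individuals) that counts the number of collaborations between each pair of individuals?

Consider the following example, which reproduces the desired structure: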

# Example dataframe 
set.seed(1) 
tp=cbind(paste(rep("project",10),1:10,sep=""),sample(2005:2010,10,replace=T)) 
tp=tp[sample(1:10,50,T),] 
id=sample(paste(rep("id",10),1:10,sep=""),50,T) 
df=as.data.frame(cbind(tp,id));rm(tp,id) 
names(df)=c("project","year","id") 
df=df[order(df$project,df$id),] 

df[1:10,] 
# project year id 
# project1 2006 id1 
# project1 2006 id3 
# project1 2006 id5 
# project1 2006 id5 
# project4 2006 id3 
# project4 2006 id4 
# project5 2006 id3 
# project5 2006 id4 
# project6 2008 id2 
# project6 2008 id3 

For example, the relational matrix for 2006 would look like this:

    id1 id2 id3 id4 id5 
id1   0   0   1   0   1 
id2   0   0   0   0   0 
id3   1   0   0   2   1 
id4   0   0   2   0   0 
id5   1   0   1   0   0 

# link between 1 and 3, 1 and 5, 3 and 5 on project 1 
# links between 3 and 4 on project 4 and project 5 
# the matrix is symmetric 
# the diagonal is 0 because an individual cannot collaborate with himself 

I applied your function to each year's dataset, but it does not give the expected n x n relational matrix: `spl=split(df,df$year);net=lapply(spl,function(x){reshape2::acast(x,project~id,value.var="id")});net` – goclem

Answers


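I changed your sampling code a little so that the project dimension differs from the id dimension, because I was playing with the matrix dimensions to make sure I got the correct n x n matrix. Here is the working code: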

set.seed(1) 
tp=cbind(paste(rep("project",5),1:5,sep=""),sample(2008:2010,5,replace=T)) 
tp=tp[sample(1:5,20,T),] 
id=sample(paste(rep("id",10),1:10,sep=""),20,T) 
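# stringsAsFactors=TRUE keeps id a factor (no longer the default since R 4.0), 
# so table() below lists every id in every year and each matrix is n x n 
df=as.data.frame(cbind(tp,id),stringsAsFactors=TRUE);rm(tp,id) 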
names(df)=c("project","year","id") 
df=df[order(df$project,df$id),] 

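spl=split(df,df$year)   # one data frame per year 
net=lapply(spl,function(x){ 
    m = table(x$id, x$project)   # id x project incidence counts 
    res = tcrossprod(m) ## equivalently: res = m %*% t(m) 
    diag(res) <- 0   # an individual does not collaborate with himself 
    res <- ifelse(res > 0, 1, 0)   # binary: 1 if the pair shared at least one project 
    res 
}) 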
net 

The split data:

$`2008` 
    project year id 
5 project1 2008 id4 
7 project1 2008 id6 
19 project1 2008 id6 
2 project5 2008 id1 
13 project5 2008 id2 
1 project5 2008 id4 
16 project5 2008 id9 

$`2009` 
    project year id 
9 project2 2009 id2 
6 project2 2009 id5 
20 project2 2009 id6 
17 project2 2009 id7 
14 project2 2009 id8 
11 project3 2009 id7 

$`2010` 
    project year id 
3 project4 2010 id4 
8 project4 2010 id5 
15 project4 2010 id5 
12 project4 2010 id8 
18 project4 2010 id8 
4 project4 2010 id9 
10 project4 2010 id9 

The resulting adjacency matrix, built from shared projects, for each year:

$`2008` 

     id1 id2 id4 id5 id6 id7 id8 id9 
    id1 0 1 1 0 0 0 0 1 
    id2 1 0 1 0 0 0 0 1 
    id4 1 1 0 0 1 0 0 1 
    id5 0 0 0 0 0 0 0 0 
    id6 0 0 1 0 0 0 0 0 
    id7 0 0 0 0 0 0 0 0 
    id8 0 0 0 0 0 0 0 0 
    id9 1 1 1 0 0 0 0 0 

$`2009` 

     id1 id2 id4 id5 id6 id7 id8 id9 
    id1 0 0 0 0 0 0 0 0 
    id2 0 0 0 1 1 1 1 0 
    id4 0 0 0 0 0 0 0 0 
    id5 0 1 0 0 1 1 1 0 
    id6 0 1 0 1 0 1 1 0 
    id7 0 1 0 1 1 0 1 0 
    id8 0 1 0 1 1 1 0 0 
    id9 0 0 0 0 0 0 0 0 

$`2010` 

     id1 id2 id4 id5 id6 id7 id8 id9 
    id1 0 0 0 0 0 0 0 0 
    id2 0 0 0 0 0 0 0 0 
    id4 0 0 0 1 0 0 1 1 
    id5 0 0 1 0 0 0 1 1 
    id6 0 0 0 0 0 0 0 0 
    id7 0 0 0 0 0 0 0 0 
    id8 0 0 1 1 0 0 0 1 
    id9 0 0 1 1 0 0 1 0 
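
The code above returns a binary adjacency matrix, whereas the question asks for the number of collaborations (for example, 2 between id3 and id4 in 2006). What follows is only a sketch of one way to keep the counts, not the answerer's method. It assumes that repeated (project, id) rows should count once and that the full set of ids is known in advance. The helper name `collab_counts` and the toy data frame `d2006` are hypothetical; the rows are copied from the 2006 excerpt shown in the question.

```r
# Hypothetical toy data: the 2006 rows printed in the question
d2006 <- data.frame(
  project = c("project1", "project1", "project1", "project1",
              "project4", "project4", "project5", "project5"),
  id      = c("id1", "id3", "id5", "id5", "id3", "id4", "id3", "id4"),
  stringsAsFactors = FALSE
)
ids <- paste0("id", 1:5)  # assumed full set of individuals, so the result is n x n

# Hypothetical helper: pairwise counts of shared projects
collab_counts <- function(x, ids) {
  x <- unique(x[, c("project", "id")])   # a repeated (project, id) row counts once
  m <- unclass(table(factor(x$id, levels = ids), x$project))  # 0/1 incidence matrix
  res <- tcrossprod(m)                   # entry [i, j] = projects shared by i and j
  diag(res) <- 0                         # no self-collaboration
  res
}

res2006 <- collab_counts(d2006, ids)
res2006
```

To apply this per year, the same helper could be passed to `lapply(split(df, df$year), collab_counts, ids = ...)` with the ids taken from the full data frame.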

You can also use dplyr with tidyr for this:

library(dplyr) 
library(tidyr) 

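df %>% 
    unique %>%                   # drop repeated (project, year, id) rows 
    mutate(val = 1) %>% 
    spread(id, val) %>%          # one column per id, NA where the id is absent 
    select(-project) %>% 
    group_by(year) %>% 
    do({ 
        mat <- select(., -year) %>% as.matrix 
        mat[is.na(mat)] <- 0 
        cp <- crossprod(mat)     # id x id counts of shared projects 
        diag(cp) <- 0 
        cp %>% as.data.frame %>% 
            tibble::rownames_to_column(var = 'id')   # add_rownames() is deprecated 
    }) %>% 
    ungroup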
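
The two answers differ only in how the incidence matrix is laid out. The first builds an id x project table and uses `tcrossprod`. The second builds a project x id matrix and uses `crossprod`. A small base R sketch with a hypothetical incidence matrix `m` (values chosen for illustration, not taken from the sampled data) shows that both give the same id x id product:

```r
# Hypothetical 0/1 incidence matrix: rows are individuals, columns are projects
m <- matrix(c(1, 0, 0,
              1, 1, 1,
              0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("id1", "id3", "id4"),
                            c("project1", "project4", "project5")))

a <- tcrossprod(m)     # m %*% t(m), from the id x project layout
b <- crossprod(t(m))   # the same product, from the project x id layout

all.equal(a, b)        # the two layouts agree
a["id3", "id4"]        # number of projects shared by id3 and id4
diag(a)                # before zeroing, the diagonal holds each id's project count
```

This is also why both answers set the diagonal to 0 afterwards: the raw product counts each individual's own projects there.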