2013-12-13

Match and count values in sequence by group in R

Here is my data:

group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4) 
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B") 
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A") 
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B") 
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B") 
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B") 
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A") 
mydf <- data.frame (group, X1, X2, X3, X4, X5, X6) 

So the data look like this:

   group X1 X2 X3 X4 X5 X6
1      1  A  A  B  A  A  A
2      1  A  A  A  A  A  A
3      1  A  A  A  A  A  A
4      1  A  A  A  B  A  A
5      2  B  B  B  B  B  B
6      2  A  B  B  B  B  A
7      2  B  B  B  A  B  B
8      3  A  A  B  A  A  A
9      3  A  A  B  A  A  A
10     4  B  B  B  B  A  B
11     4  B  B  B  A  B  B
12     4  B  A  B  B  B  A
13     4  B  A  B  B  B  A

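Now I need to compare the first row of each group with the remaining rows of that group. For group 1, compare row 1 with row 2:

  group X1 X2 X3 X4 X5 X6
1     1  A  A  B  A  A  A
2     1  A  A  A  A  A  A
        TRUE TRUE FALSE TRUE TRUE TRUE

The only mismatch here is in X3: 1 out of 6 = 1/6 = 17%.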
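In code, that percentage is simply the share of value columns (everything except `group`) that differ between the two rows. A minimal base-R sketch; the data are rebuilt with `stringsAsFactors = FALSE` so the snippet runs on its own, and `row_mismatch` is a hypothetical helper name, not something from the question:

```r
# Example data from the question (kept as character columns, not factors)
group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame(group, X1, X2, X3, X4, X5, X6, stringsAsFactors = FALSE)

# Hypothetical helper: fraction of X1..X6 that differ between rows i and j
row_mismatch <- function(df, i, j) {
  a <- unlist(df[i, -1])   # drop the group column, flatten to a character vector
  b <- unlist(df[j, -1])
  mean(a != b)             # position-by-position comparison, share of columns that differ
}

row_mismatch(mydf, 1, 2)   # 0.1666667, i.e. only X3 differs
```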

Similarly, compare row 3 with row 1:

  group X1 X2 X3 X4 X5 X6
1     1  A  A  B  A  A  A
3     1  A  A  A  A  A  A

Mismatch = 1/6 = 17%

And compare row 4 with row 1:

  group X1 X2 X3 X4 X5 X6
1     1  A  A  B  A  A  A
4     1  A  A  A  B  A  A

Mismatch = 2/6 = 33%
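The three group-1 comparisons can be done in one go. A base-R sketch (data rebuilt as character columns; `g1`, `template` and `mism` are illustrative names) that compares every later row of group 1 with its first row. It also confirms that 2/6 is 33.3%, which rounds to 33%:

```r
# Example data from the question (kept as character columns, not factors)
group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame(group, X1, X2, X3, X4, X5, X6, stringsAsFactors = FALSE)

g1 <- mydf[mydf$group == 1, -1]   # group 1, value columns only
template <- unlist(g1[1, ])       # the group's first row
# apply() turns the data frame into a character matrix; each row r is
# compared position by position with the template
mism <- apply(g1[-1, ], 1, function(r) mean(r != template))
round(100 * unname(mism), 1)      # 16.7 16.7 33.3 (rows 2, 3, 4)
```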

Similarly for group 2, whose first row is row 5, compare row 5 with row 6:

  group X1 X2 X3 X4 X5 X6
5     2  B  B  B  B  B  B
6     2  A  B  B  B  B  A

Mismatch = 2/6 = 33%

And row 5 with row 7:

  group X1 X2 X3 X4 X5 X6
5     2  B  B  B  B  B  B
7     2  B  B  B  A  B  B

Mismatch = 1/6 = 17%

My attempt:

match (mydf[1,], mydf[2,]) 
match (mydf[1,], mydf[3,]) 
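The attempt above does not measure mismatches because `match()` looks values up rather than comparing positions: for each element of its first argument it returns the position of the first equal element in the second (or `NA`). A small illustration with rows 1 and 2 typed in by hand as plain character vectors, a simplification to keep it short:

```r
a <- c("A", "A", "B", "A", "A", "A")  # row 1 of mydf, columns X1..X6
b <- c("A", "A", "A", "A", "A", "A")  # row 2 of mydf, columns X1..X6

# match(): where each value of `a` first occurs in `b`, NA if it never does
match(a, b)   # 1 1 NA 1 1 1

# `==`: position-by-position comparison, which is what the question needs
a == b        # TRUE TRUE FALSE TRUE TRUE TRUE

# share of positions that differ
mean(a != b)  # 0.1666667
```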

Could you give your exact expected output, including the data structure? – flodel


Does every row in the same group get the same score? – josliber


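@josilber The first row is compared with the second row to give a mismatch percentage, then the first row is compared with the third row to give another one, and so on. The idea is that the first row of each group serves as the template. – rdorlearn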
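Given that clarification, one possible base-R sketch (not taken from the answers) computes every row's mismatch against its group's template in a single vectorised step. It relies on `match(mydf$group, mydf$group)`, which returns for each row the position of the first row with the same group id, so it assumes "first row" means first in the data's current order; the rows do not need to be sorted by group. The column name `mismatch_pct` is made up for illustration:

```r
# Example data from the question (kept as character columns, not factors)
group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame(group, X1, X2, X3, X4, X5, X6, stringsAsFactors = FALSE)

# Row index of the template (first row) of each row's group
first <- match(mydf$group, mydf$group)

vals <- as.matrix(mydf[, -1])                   # character matrix of X1..X6
same <- vals == vals[first, , drop = FALSE]     # TRUE where a cell equals its template
mydf$mismatch_pct <- round(100 * rowMeans(!same), 1)

mydf
# mismatch_pct: 0, 16.7, 16.7, 33.3, 0, 33.3, 16.7, 0, 0, 0, 33.3, 50, 50
```

Template rows score 0 by construction; drop them with `mydf[first != seq_len(nrow(mydf)), ]` if only the compared rows are wanted.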

Answers


Try this:

library(plyr) 

# For one group: share of X1..X6 where each row equals the group's first row
match_ratio <- function(x) 
    cbind(x, match_ratio = rowMeans(mapply(`==`, x[1, -1], x[, -1]))) 
ddply(mydf, "group", match_ratio) 

# group X1 X2 X3 X4 X5 X6 match_ratio 
# 1  1 A A B A A A 1.0000000 
# 2  1 A A A A A A 0.8333333 
# 3  1 A A A A A A 0.8333333 
# 4  1 A A A B A A 0.6666667 
# 5  2 B B B B B B 1.0000000 
# 6  2 A B B B B A 0.6666667 
# 7  2 B B B A B B 0.8333333 
# 8  3 A A B A A A 1.0000000 
# 9  3 A A B A A A 1.0000000 
# 10  4 B B B B A B 1.0000000 
# 11  4 B B B A B B 0.6666667 
# 12  4 B A B B B A 0.5000000 
# 13  4 B A B B B A 0.5000000 
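Note that `match_ratio` is the share of matching columns, so the mismatch the question asks for is `1 - match_ratio`. Also, for a group with only one row, `mapply()` simplifies its result to a plain vector and `rowMeans()` then stops with an error. A base-R sketch that avoids both `plyr` and that edge case by comparing against a repeated template row; the helper `mismatch_ratio` and the extra single-row group 5 are hypothetical additions for illustration:

```r
# Example data from the question (kept as character columns, not factors)
group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame(group, X1, X2, X3, X4, X5, X6, stringsAsFactors = FALSE)

# Hypothetical helper: mismatch share of each row against the group's first row
mismatch_ratio <- function(x) {
  vals <- as.matrix(x[, -1])                              # value columns only
  template <- vals[rep(1, nrow(vals)), , drop = FALSE]    # first row, repeated
  x$mismatch_ratio <- rowMeans(vals != template)          # still a matrix for 1-row groups
  x
}

# Hypothetical single-row group 5, added only to exercise the edge case
mydf2 <- rbind(mydf, data.frame(group = 5, X1 = "A", X2 = "B", X3 = "A",
                                X4 = "B", X5 = "A", X6 = "B",
                                stringsAsFactors = FALSE))

res <- do.call(rbind, lapply(split(mydf2, mydf2$group), mismatch_ratio))
res
```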

Nice! 'ddply' is powerful. My solution is clunkier by comparison. – hatmatrix

## generate pairs of row numbers 
rows <- sequence(nrow(mydf)) 
grid <- subset(expand.grid(Var1=rows,Var2=rows),Var1 > Var2) 

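## define two ways of comparing row a with row b (group column dropped) 
## comparison1: for each value in row a, its position in row b (NA if absent) 
comparison1 <- function(a,b,x) 
    match(x[a,-1],x[b,-1]) 

## comparison2: column-by-column equality, TRUE where the two rows agree 
comparison2 <- function(a,b,x) 
    x[a,-1]==x[b,-1] 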

## apply (comparison1 or comparison2) 
matches <- t(mapply(comparison1,grid$Var2,grid$Var1,MoreArgs=list(x=mydf))) 
dimnames(matches) <- list(paste(grid$Var2,grid$Var1,sep=","), 
          names(mydf)[-1]) 

If you use comparison1:

> head(matches) 
    X1 X2 X3 X4 X5 X6 
1,2 1 1 NA 1 1 1 
1,3 1 1 NA 1 1 1 
1,4 1 1 4 1 1 1 
1,5 NA NA 1 NA NA NA 
1,6 1 1 2 1 1 1 
1,7 4 4 1 4 4 4 

If you use comparison2:

> head(matches) 
     X1 X2 X3 X4 X5 X6 
1,2 TRUE TRUE FALSE TRUE TRUE TRUE 
1,3 TRUE TRUE FALSE TRUE TRUE TRUE 
1,4 TRUE TRUE FALSE FALSE TRUE TRUE 
1,5 FALSE FALSE TRUE FALSE FALSE FALSE 
1,6 TRUE FALSE TRUE FALSE FALSE TRUE 
1,7 FALSE FALSE TRUE TRUE FALSE FALSE 

The row names give the pair of row numbers being compared.
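This grid covers every pair of rows, including pairs from different groups. One hedged way to narrow it to what the question asks (each group's first row against the other rows of that group, as a mismatch percentage) is to keep only pairs whose smaller row number is the template of the larger row's group, then average the `FALSE`s. The sketch rebuilds the data and grid and reuses the answer's `comparison2`; `first` and `keep` are illustrative names:

```r
# Example data from the question (kept as character columns, not factors)
group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame(group, X1, X2, X3, X4, X5, X6, stringsAsFactors = FALSE)

## all pairs of row numbers, as in the answer
rows <- seq_len(nrow(mydf))
grid <- subset(expand.grid(Var1 = rows, Var2 = rows), Var1 > Var2)

## keep only (template, other row) pairs within the same group
first <- match(mydf$group, mydf$group)   # template row of each row's group
keep <- grid$Var2 == first[grid$Var1]
grid <- grid[keep, ]

comparison2 <- function(a, b, x) x[a, -1] == x[b, -1]
matches <- t(mapply(comparison2, grid$Var2, grid$Var1, MoreArgs = list(x = mydf)))
dimnames(matches) <- list(paste(grid$Var2, grid$Var1, sep = ","), names(mydf)[-1])

## mismatch percentage per pair = share of FALSE entries in each row
round(100 * rowMeans(!matches), 1)
```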