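Reshaping a data matrix in R

2014-03-12 218 views
1

I have some data that I want to reshape in R, but I can't figure out how. Here is the scenario: I have test score data for many students from different schools. Here is some example data: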

#Create example data: 
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3)) 

This gives data in the following format:

score schoolid
    1        1
   10        1
   20        2
   40        2
   20        3

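So there is a school ID that identifies the school, and there is a test score for each student. For analysis in a different program, I would like to have the data in a format like this: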

                Score student 1  Score student 2
School ID == 1                1               10
School ID == 2               20               40
School ID == 3               20               NA

To reshape the data, I tried the reshape function and the cast function from the reshape2 library, but both resulted in errors:

#Reshape function 
reshape(test, v.names = test2$score, idvar = test2$schoolid, direction = "wide") 
reshape(test, idvar = test$schoolid, direction = "wide") 
#Error: in [.data.frame'(data,,idvar): undefined columns selected 

#Cast function 
cast(test,test$schoolid~test$score) 
#Error: Error: could not find function "cast" (although ?cast works fine) 

I guess the fact that the number of test scores differs from school to school complicates the reshaping.

How can I reshape this data, and which function should I use?

You have to define a student ID in the data.frame.

Answers

4

Here are some solutions that use only base R. All three solutions use this new studentno variable:

studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along)) 
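
To make the role of studentno concrete, here is a minimal sketch that rebuilds the example data frame from the question and shows the counter next to the original rows. ave applies seq_along separately within each schoolid group, so every student gets a running number inside their own school.

```r
# Example data from the question
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# Number the students 1, 2, ... within each school
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Show the counter next to the original rows
print(cbind(test, studentno))
```

School 3 has a single student, so its counter stops at 1. That is why the wide format ends up with an empty cell for that school.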

1) tapply

with(test, tapply(score, list(schoolid, studentno), c)) 

giving:

   1  2
1  1 10
2 20 40
3 20 NA
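
The tapply result is a plain matrix whose dimnames are the school IDs and student numbers. If labels like those in the question's desired layout are wanted, one possible sketch is to rename the dimnames afterwards. The label strings are taken from the question; the variable name wide is only illustrative.

```r
# Example data and within-school student counter, as above
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Wide matrix: one row per school, one column per student number
wide <- with(test, tapply(score, list(schoolid, studentno), c))

# Relabel rows and columns to match the layout asked for in the question
rownames(wide) <- paste("School ID ==", rownames(wide))
colnames(wide) <- paste("Score student", colnames(wide))
print(wide)
```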

2) reshape

# rename score to student and append studentno column 
test2 <- transform(test, student = score, score = NULL, studentno = studentno) 
reshape(test2, dir = "wide", idvar = "schoolid", timevar = "studentno") 

giving:

  schoolid student.1 student.2
1        1         1        10
3        2        20        40
5        3        20        NA

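3) xtabs

xtabs will also work, provided no student has a score of 0. Empty cells are filled with 0, so the step below that converts 0 to NA cannot tell a missing student from a genuine score of 0.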

xt <- xtabs(score ~ schoolid + studentno, test) 
xt[xt == 0] <- NA # omit this step if it's OK to use 0 in place of NA
xt 

giving:

        studentno
schoolid  1  2
       1  1 10
       2 20 40
       3 20
2

You have to define the student ID somewhere, for example:

test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3)) 
test$studentid <- c(1,2,1,2,1) 

library(reshape2) 
dcast(test, schoolid ~ studentid, value.var="score",mean) 
  schoolid  1   2
1        1  1  10
2        2 20  40
3        3 20 NaN
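
As a possible variation on the call above, the sketch below derives studentid per school with ave instead of typing it by hand, and drops the mean argument. Since each schoolid/studentid pair occurs only once, dcast needs no aggregation function, and the empty cell then comes out as NA rather than NaN. It assumes the reshape2 package is installed. Note that reshape2 provides dcast and acast, while cast belongs to the older reshape package, which would explain the "could not find function" error in the question.

```r
library(reshape2)  # assumption: the reshape2 package is installed

# Example data from the question
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# Derive the student ID within each school instead of hard-coding it
test$studentid <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# One row per school, one column per student; no aggregation function,
# so cells with no student are filled with NA
wide <- dcast(test, schoolid ~ studentid, value.var = "score")
print(wide)
```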