2015-06-16 57 views

How can I reshape a data frame in R or Excel?

The code below generates a sample data set:

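set.seed(0)
# 20 random integers in a 10 x 2 matrix: one column each for attempt and score
practice <- matrix(sample(1:100, 20), ncol = 2)
data <- as.data.frame(practice)
# Prepend the objective labels (5 rows each) and the student labels (student1..student5, twice)
data <- cbind(lob = sprintf("objective%d", rep(1:2, each = 5)), data)
data <- cbind(student = sprintf("student%d", rep(1:5, 2)), data)
names(data) <- c("student", "learning objective", "attempt", "score")
# Print without row 8 (student3, objective2) to mimic a missing record; data itself is unchanged
data[-8, ]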

The data looks like this:

student learning objective attempt score 
1 student1   objective1  90  6 
2 student2   objective1  27 19 
3 student3   objective1  37 16 
4 student4   objective1  56 60 
5 student5   objective1  88 34 
6 student1   objective2  20 66 
7 student2   objective2  85 42 
9 student4   objective2  61 82 
10 student5   objective2  58 31 

What I want is:

student  objective1   objective2 
       attempt score  attempt score 
1 student1   90  6   20  66 
2 student2   27 19   85  42 
3 student3   ...    0  0 
4 student4   ...     ... 
5 student5   ...     ... 

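There are 70 learning objectives, so copying and pasting the attempts and scores would be tedious. I am wondering whether there is a better way to clean up the data.

R: I tried to use the melt function in R to get the new data, but it does not work properly. Some students have no score for an objective and their names are not listed for it at all (student3 in this example), so I cannot simply cbind the scores.

Excel: There are 70 learning objectives, and because of the missing names I would have to check the corresponding rows for all 70 objectives with VLOOKUP: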

=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,5,0)
=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,6,0)

Is there a better way?

Answers

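We can use the development version of data.table, i.e. v1.9.5, whose dcast can take multiple value.var columns and reshape the data from 'long' to 'wide' format. The installation instructions are here.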

library(data.table)#v1.9.5+ 
names(data)[2] <- 'objective' 
dcast(setDT(data), student~objective, value.var=c('attempt', 'score')) 
# student attempt_objective1 attempt_objective2 score_objective1 
#1: student1     90     20    6 
#2: student2     27     85    19 
#3: student3     37     96    16 
#4: student4     56     61    60 
#5: student5     88     58    34 
# score_objective2 
#1:    66 
#2:    42 
#3:    87 
#4:    82 
#5:    31 
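
The output above uses all ten rows, whereas the question's real data is missing some student/objective combinations. A minimal sketch of how that case might be handled, assuming a data.table version whose dcast accepts several value.var columns: the `fill` argument of dcast supplies a value for the missing cells. The data frame `long` below is a hypothetical, cut-down copy of the question's data with the student3/objective2 record left out.

```r
library(data.table)  # assumes dcast with multiple value.var is available (v1.9.5+ per the answer)

# Hypothetical sample: student3 has no record for objective2
long <- data.table(
  student   = c("student1", "student2", "student3", "student1", "student2"),
  objective = c("objective1", "objective1", "objective1", "objective2", "objective2"),
  attempt   = c(90L, 27L, 37L, 20L, 85L),
  score     = c(6L, 19L, 16L, 66L, 42L)
)

# fill = 0L puts 0 in the cells that have no matching row (they would be NA otherwise)
wide <- dcast(long, student ~ objective,
              value.var = c("attempt", "score"), fill = 0L)
print(wide)
```

Whether 0 is the right placeholder is a judgment call: leaving the cells as NA keeps "never attempted" distinct from a real score of 0.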

Or with reshape from base R:

reshape(data, idvar='student', timevar='objective', direction='wide') 
# student attempt.objective1 score.objective1 attempt.objective2 
# 1 student1     90    6     20 
# 2 student2     27    19     85 
# 3 student3     37    16     96 
# 4 student4     56    60     61 
# 5 student5     88    34     58 
# score.objective2 
# 1    66 
# 2    42 
# 3    87 
# 4    82 
# 5    31 
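
When a student/objective row is missing, as in the question's real data, reshape leaves NA in the corresponding wide cells. A rough base R sketch of turning those into the 0s shown in the desired output; `long` is a hypothetical, cut-down copy of the question's data with the student3/objective2 record left out.

```r
# Hypothetical sample: student3 has no record for objective2
long <- data.frame(
  student   = c("student1", "student2", "student3", "student1", "student2"),
  objective = c("objective1", "objective1", "objective1", "objective2", "objective2"),
  attempt   = c(90L, 27L, 37L, 20L, 85L),
  score     = c(6L, 19L, 16L, 66L, 42L),
  stringsAsFactors = FALSE
)

# One row per student, one attempt/score pair of columns per objective
wide <- reshape(long, idvar = "student", timevar = "objective", direction = "wide")

# Combinations that were absent in the long data come out as NA; replace them with 0
wide[is.na(wide)] <- 0
print(wide)
```

If a missing attempt should stay distinguishable from a genuine score of 0, skip the last assignment and keep the NA values.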

Thanks, but both lines of code seem to give errors. 1: > names(data)[2] <- 'objective' Warning message: ... 2: > dcast(setDT(data), student~objective, value.var = c('attempt','score')) .subset2(x, i, exact = exact): subscript out of bounds – SongTianyang

@SongTianyang Are you using the devel version of 'data.table'? – akrun

@SongTianyang I added a 'base R' version, in case you do not have the devel version of data.table – akrun