Recoding sequentially named variables based on answer values

I'm struggling to use lapply to recode values concisely.

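Suppose I have 10 survey questions with 4 answer options each, where every question has exactly one right answer and the rest are wrong. The questions are labeled q_1 through q_10, and my data frame is called df. I want to create new variables with the same sequential labels that simply code each question as "right" (1) or "wrong" (0).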

If I were to make a list of the right answers, it would be:

right_answers<-c(1,2,3,4,2,3,4,1,2,4)

So I tried to write a function that simply recodes all the variables into new variables, using the same sequential identifiers, like this:

lapply(1:10, function(fx) { 
    df$know_[fx]<-ifelse(df$q_[fx]==right_answers[fx],1,0) 
})

In a hypothetical universe where this code were remotely correct, I'd get results like:

id q_1 know_1 q_2 know_2 
1 1  1  2  1 
2 4  0  3  0 
3 3  0  2  1 
4 4  0  1  0

Thanks so much for your help!

Source

2013-10-18 roody

This should give you a matrix indicating whether each answer is correct:

t(apply(df[, grep("q_", names(df))], 1, function(X) X == right_answers))
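As a possible variation on the same idea, the sketch below compares each question column against its key with base R's sweep, which avoids the transpose, and then converts the logical matrix to 0/1. The two-question df and the shortened right_answers are assumptions made only to keep the example small.

```r
# Toy data (an assumption for illustration): two questions, four respondents
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)  # key shortened to match the two columns above

# Pick the question columns by name
q_cols <- grep("^q_", names(df), value = TRUE)

# Compare column j against right_answers[j]; the result is a logical matrix
is_correct <- sweep(as.matrix(df[q_cols]), 2, right_answers, FUN = "==")

# Adding 0L turns TRUE/FALSE into 1/0 while keeping the matrix shape
know <- is_correct + 0L
colnames(know) <- sub("^q_", "know_", q_cols)
know
```

The length of right_answers has to match the number of selected columns for the comparison to line up.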

Source

2013-10-18 22:55:45

You are most likely having trouble with the df$q_[fx] part of your code. You can use paste to build the column names. For example:

df = read.table(text = " 
id q_1 q_2 
1 1    2  
2 4    3  
3 3    2  
4 4    1", header = TRUE) 

right_answers = c(1,2,3,4,2,3,4,1,2,4) 

dat2 = sapply(1:2, function(fx) { 
      ifelse(df[paste("q",fx,sep = "_")]==right_answers[fx], 
         1,0) 
})

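This doesn't add columns to your data.frame; instead, like @SenorO's answer, it creates a new matrix. You can name the columns of the matrix and then add them to the original data.frame, as shown below.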

colnames(dat2) = paste("know", 1:2, sep = "_") 

data.frame(df, dat2)
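If the goal is to write the new columns straight into df, one option is a plain for loop with [[ ]] indexing. The original lapply attempt could not do this because an assignment inside the anonymous function only changes a local copy of df. This is a minimal sketch on the same two-question example; the names q_col and know_col are illustrative.

```r
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2, 3, 4, 2, 3, 4, 1, 2, 4)

# Only q_1 and q_2 exist in this toy data frame, so loop over 1:2
for (i in 1:2) {
  q_col <- paste("q", i, sep = "_")        # e.g. "q_1"
  know_col <- paste("know", i, sep = "_")  # e.g. "know_1"
  # [[ ]] accepts a column name held in a variable, unlike $
  df[[know_col]] <- ifelse(df[[q_col]] == right_answers[i], 1, 0)
}
df
```

With the full survey, the loop would run over seq_along(right_answers) instead of 1:2.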

Source

2013-10-18 23:02:21 aosmith

For the same matrix output as the other answers, I would suggest:

q_names <- paste0("q_", seq_along(right_answers)) 
answers <- df[q_names] 
correct <- mapply(`==`, answers, right_answers)
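To get from that logical matrix to know_ columns sitting next to the original data, something like the following might work. It is a sketch on a two-question toy data frame, with right_answers shortened to match (both are assumptions), and the names know and result are illustrative.

```r
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)  # key shortened to match the two columns above

q_names <- paste0("q_", seq_along(right_answers))

# mapply pairs column j with right_answers[j] and returns a logical matrix
correct <- mapply(`==`, df[q_names], right_answers)

# Convert TRUE/FALSE to 1/0, rename, and bind onto the original data frame
know <- as.data.frame(correct + 0L)
names(know) <- paste0("know_", seq_along(right_answers))
result <- cbind(df, know)
result
```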

Source

2013-10-18 23:25:52 flodel

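I'd like to suggest a different approach to your problem, using the reshape2 package. In my opinion it has the following advantages: 1) more idiomatic R (for what that's worth), 2) more readable code, 3) less error prone, particularly if you want to add analysis in the future. In this approach everything is done within data frames, which I think is desirable whenever possible: it is easier to keep all the values for a single record (id in this case) together, and easier to use the power of R's tools.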

# Creating a dataframe with the form you describe 
df <- data.frame(id=c('1','2','3','4'), q_1 = c(1,4,3,4), q_2 = c(2,3,2,1), q_3 = rep(1,  4), q_4 = rep(2, 4), q_5 = rep(3, 4), 
      q_6 = rep(4,4), q_7 = c(1,4,3,4), q_8 = c(2,3,2,1), q_9 = rep(1, 4), q_10 =  rep(2, 4)) 

right_answers<-c(1,2,3,4,2,3,4,1,2,4) 

# Associating the right answers explicitly with the corresponding question labels in a data frame 
answer_df <- data.frame(questions=paste('q', 1:10, sep='_'), right_answers) 

library(reshape2) 

# "Melting" the dataframe from "wide" to "long" form -- now questions labels are in variable values rather than in column names 
melt_df <- melt(df, id.vars = 'id') # melt function is from reshape2 package; naming id.vars explicitly avoids relying on melt's guess

# Now merging the correct answers into the data frame containing the observed answers 
merge_df <- merge(melt_df, answer_df, by.x='variable', by.y='questions') 

# At this point comparing the observed to correct answers is trivial (using as.numeric to  convert from logical to 0/1 as you request, though keeping as TRUE/FALSE may be clearer) 
merge_df$correct <- as.numeric(merge_df$value==merge_df$right_answers) 

# If desirable (not sure it is), put back into "wide" dataframe form 
cast_obs_df <- dcast(merge_df, id ~ variable, value.var='value') # dcast function is from reshape2 package 
cast_cor_df <- dcast(merge_df, id ~ variable, value.var='correct') 
names(cast_cor_df) <- gsub('q_', 'know_', names(cast_cor_df)) 
final_df <- merge(cast_obs_df, cast_cor_df)

The newer tidyr package may be a better choice than reshape2 here.
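For reference, a rough tidyr equivalent of the melt / merge / dcast steps could look like the sketch below. It assumes tidyr is installed and uses its pivot_longer and pivot_wider functions; the two-question df, the shortened key, and the column names question, answer, right_answer, and know are all assumptions for illustration.

```r
library(tidyr)

df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)  # key shortened to match the two columns above

# Tie each right answer to its question label
answer_df <- data.frame(question = paste0("q_", seq_along(right_answers)),
                        right_answer = right_answers)

# Wide -> long: one row per (id, question)
long_df <- pivot_longer(df, cols = starts_with("q_"),
                        names_to = "question", values_to = "answer")

# Attach the key and score each response as 1 (right) or 0 (wrong)
long_df <- merge(long_df, answer_df, by = "question")
long_df$know <- as.integer(long_df$answer == long_df$right_answer)

# Long -> wide again, with know_ prefixes, then join back onto the answers
long_df$question <- sub("^q_", "know_", long_df$question)
know_df <- pivot_wider(long_df[c("id", "question", "know")],
                       names_from = question, values_from = know)
final_df <- merge(df, know_df, by = "id")
final_df
```

As with the reshape2 version, staying in long form may be more convenient if further per-question analysis is planned.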

Source

2014-08-04 20:44:18 eamcvey
