2016-11-24

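R - Randomly select variables and manipulate them by row

I'm trying to go through each row of a data frame, randomly select half of the variables, and set those variables to NA for that particular row.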

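For example, with the mydf dataset below, I would like to randomly pick 3 variables for the first row (say QB, QE, QF) and set their values to NA, then do the same for the second row (say QA, QD, QE), and so on: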

library(tibble) 
mydf <- tibble(QA = rnorm(100), 
QB = rnorm(100), 
QC = rnorm(100), 
QD = rnorm(100), 
QE = rnorm(100), 
QF = rnorm(100)) 

My attempt, which doesn't seem to do anything:

vars <- names(mydf) 
for (i in nrow(mydf)){ 
    miss_vars <- sample(vars, 3) 
    for (j in miss_vars) { 
    mydf[i,j] <- NA 
#mydf[i,][[j]] <- NA #Also tried this. 
    } 
} 

Answers

Try this vectorized approach:

m <- as.matrix(mydf) 
n <- 3 # number of randoms to be selected 
inds <- cbind(rep(1:nrow(mydf), each=n), c(replicate(nrow(mydf), sample(ncol(mydf), n)))) 
m[inds] <- NA 
res <- as.data.frame(m) 

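Here is how it works:

  1. First convert the data frame to a matrix, which lets the assignment benefit from vectorization.
  2. Define the number of columns to select at random in each row.
  3. Build the matrix inds, which pairs each row index of the data frame with the column indices randomly drawn for that row.
  4. Set those row and column positions to NA.
  5. Convert the matrix back to a data frame.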
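
Step 3 relies on a base R feature: when a matrix is indexed by a two-column matrix, each row of the index is read as one (row, column) cell. The sketch below shows this on a small toy matrix; the names toy and idx and the seed are illustrative assumptions, not part of the answer.

```r
# Toy 3 x 4 matrix, filled column by column with 1..12
toy <- matrix(1:12, nrow = 3)
n <- 2  # number of columns to blank out per row

set.seed(42)  # only to make the sketch reproducible
idx <- cbind(
  rep(seq_len(nrow(toy)), each = n),             # row index, repeated n times
  c(replicate(nrow(toy), sample(ncol(toy), n)))  # n distinct column indices per row
)

toy[idx] <- NA       # each row of idx addresses one (row, column) cell
rowSums(is.na(toy))  # every row should now contain exactly n NAs
```

Because sample() draws without replacement by default, the n column indices for a given row are distinct, so each row ends up with exactly n missing values.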

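In res you will have a data frame in which 3 randomly chosen columns are set to NA in each row. The output for the data frame provided below is: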

#            QA          QB          QC        QD         QE         QF
#  1 -0.6264538          NA          NA  1.358680 -0.1645236         NA
#  2  0.1836433          NA  0.78213630        NA -0.2533617         NA
#  3         NA          NA  0.07456498        NA  0.6969634  0.3411197
#  4         NA -2.21469989          NA        NA  0.5566632 -1.1293631
#  5         NA  1.12493092  0.61982575        NA         NA  1.4330237
#  6 -0.8204684 -0.04493361          NA        NA         NA  1.9803999
#  7  0.4874291 -0.01619026          NA -0.394290         NA         NA
#  8  0.7383247          NA -1.47075238        NA         NA -1.0441346
#  9         NA  0.82122120          NA  1.100025         NA  0.5697196
# 10         NA  0.59390132  0.41794156        NA         NA -0.1350546
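
One way to sanity-check a result like this is to count the missing values per row. The sketch below wraps the matrix approach in a helper; the name set_random_na is a hypothetical one introduced here for illustration. The exact cells that become NA depend on the random number generator state, so they may differ from the output shown above.

```r
set.seed(1)
mydf <- data.frame(QA = rnorm(10), QB = rnorm(10), QC = rnorm(10),
                   QD = rnorm(10), QE = rnorm(10), QF = rnorm(10))

# Hypothetical helper wrapping the matrix approach from this answer
set_random_na <- function(df, n) {
  m <- as.matrix(df)
  inds <- cbind(rep(seq_len(nrow(df)), each = n),
                c(replicate(nrow(df), sample(ncol(df), n))))
  m[inds] <- NA
  as.data.frame(m)
}

res <- set_random_na(mydf, n = 3)

# Each row should have exactly 3 missing values
na_per_row <- rowSums(is.na(res))
na_per_row
```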

Data

set.seed(1) 
mydf <- data.frame(QA = rnorm(10), 
QB = rnorm(10), 
QC = rnorm(10), 
QD = rnorm(10), 
QE = rnorm(10), 
QF = rnorm(10)) 

It should have been:

for (i in seq_len(nrow(mydf))){ 
    miss_vars <- sample(vars, 3) 
    for (j in miss_vars) { 
    mydf[i,][[j]] <- NA 
    } 
} 
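
The reason the original loop appears to do nothing is that nrow(mydf) is a single number, so for (i in nrow(mydf)) runs its body once, for the last row only, whereas seq_len(nrow(mydf)) yields every row index. A minimal sketch of the difference, using a stand-in row count n_rows (an illustrative name, not from the question):

```r
n_rows <- 5  # stand-in for nrow(mydf)

# Iterating over a single number: the body runs once, with i = 5
visited_wrong <- c()
for (i in n_rows) visited_wrong <- c(visited_wrong, i)

# Iterating over seq_len(5), i.e. 1, 2, 3, 4, 5: the body runs for every row
visited_right <- c()
for (i in seq_len(n_rows)) visited_right <- c(visited_right, i)

visited_wrong  # 5
visited_right  # 1 2 3 4 5
```

seq_len() is also the safer choice over 1:nrow(mydf), because it returns an empty sequence for a data frame with zero rows instead of counting down from 1 to 0.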