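Reordering sections of a vector around NA values in sequence

I have a large data set that I want to reorder in groups of 12, using the sample() function in R to generate randomized data sets with which I can carry out a permutation test. However, the data contain NA values where data could not be collected, and I would like them to stay in their original positions when the data are shuffled.

Following a previous question, I have managed to shuffle data around the NA values for 24 values, with help on the code for a single vector: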

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 0.42) 

example.data[!is.na(example.data)] <- sample(example.data[!is.na(example.data)], replace = FALSE, prob = NULL)

[1] 0.64 0.83 NA 0.33 0.47 0.90 0.25 0.12 NA NA 0.42 1.00

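Extending this, if I have a data set of length 24, how would I reorder the first and second sets of 12 values as individual cases in a loop?

For example, a vector extended from the first example: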

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98, NA, 0.05)

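Here example.data[1:12] and example.data[13:24] should each be shuffled separately, within their own group, around their NA values.

The code I have been trying to get working for this is as follows: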

shuffle.data = function(input.data, nr, ns) {
    simdata <- input.data
    for (i in 1:nr) {
        start.row <- (ns * (i - 1)) + 1
        end.row <- start.row + actual.length[i] - 1
        newdata = sample(input.data[start.row:end.row], size = actual.length[i], replace = F)
        simdata[start.row:end.row] <- newdata
    }
    return(simdata)
}

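Here input.data is the original input data (example.data); nr is the number of groups (2); ns is the size of each sample (12); and actual.length is the length of each group excluding NAs, stored in a vector (for the example above, actual.length <- c(9, 8)).

Would anyone know how to go about achieving this?

Thanks again for your help!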

Source

2017-08-01 Roald

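Put it in a data frame, add another column indicating the grouping (something like `c(rep('a', 12), rep('b', 12))`), and use `dplyr::group_by` or `data.table` to operate on each group. Or use base `split` and `lapply`. Just write a function that works for one group and apply it to all the groups. – Gregor

I agree with Gregor's comment that working with the data in another form is probably the better approach. However, what you need to do can still be accomplished easily even if all the data are in a single vector.

First, make a function that shuffles only the non-NA values of an entire vector: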
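For what it's worth, the base-R route Gregor mentions (`split` plus `lapply`) might look roughly like the sketch below. The helper name `shuffle_non_na` and the grouping vector `grp` are made up for illustration, and the example assumes two groups of 12 values each.

```r
# Hypothetical helper: shuffle the non-NA values of ONE group,
# leaving the NA positions untouched (same idea as the question's one-liner).
shuffle_non_na <- function(x) {
    x[!is.na(x)] <- sample(x[!is.na(x)])
    x
}

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00,
                  0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98,
                  NA, 0.05)

# Assumed grouping: first 12 values are group "a", next 12 are group "b"
grp <- rep(c("a", "b"), each = 12)

# Split by group, shuffle each piece, then put the pieces back in place
shuffled <- unsplit(lapply(split(example.data, grp), shuffle_non_na), grp)
shuffled
```

Because `unsplit` restores each group to its original positions, this should also work when the groups are interleaved or of unequal size, as long as `grp` labels every element.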

shuffle_real <- function(data){ 
    # Sample from only the non-NA values, 
    # and store the result only in indices of non-NA values 
    data[!is.na(data)] <- sample(data[!is.na(data)]) 
    # Then return the shuffled data 
    return(data) 
}
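One caveat worth keeping in mind with shuffle_real: when sample() is handed a single number x >= 1, it samples from 1:x rather than returning x itself (this behaviour is described in ?sample). That can only bite here if a group contains exactly one non-NA value, which is not the case for the example data. A possible guard, sketched under the hypothetical name `shuffle_real_safe`, is to permute the positions instead of the values:

```r
# The gotcha: a single number >= 1 is treated as the range 1:x
length(sample(5))   # 5, not 1

# Hypothetical safer variant (not part of the original answer):
# permute the indices of the non-NA values, then reassign by index.
shuffle_real_safe <- function(data) {
    idx <- which(!is.na(data))
    data[idx] <- data[idx[sample.int(length(idx))]]
    data
}

# With only one non-NA value, the vector comes back unchanged
shuffle_real_safe(c(NA, 5, NA))   # returns c(NA, 5, NA)
```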

Now write a function that takes in a larger vector and applies that function to each group within it:

shuffle_groups <- function(data, groupsize){ 
    # It will be convenient to store the length of the data vector 
    N <- length(data) 
    # Do a sanity check to make sure there's a match between N and groupsize 
    if (N %% groupsize != 0) { 
        stop('The length of the data is not a multiple of the group size.', 
             call.=FALSE) 
    } 
    # Get the index of every first element of a new group 
    starts <- seq(from=1, to=N, by=groupsize) 
    # and for every segment of the data of group 'groupsize', 
    # apply shuffle_real to it; 
    # note the use of c() -- otherwise a matrix would be returned, 
    # where each column is one group of length 'groupsize' 
    # (which I note because that may be more convenient) 
    return(c(sapply(starts, function(x) shuffle_real(data[x:(x+groupsize-1)])))) 
}
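As the comment inside shuffle_groups points out, leaving off the c() gives a matrix with one column per group, which can be handier for per-group summaries. A small sketch of that variant follows; the name `shuffle_groups_matrix` is chosen here purely for illustration.

```r
shuffle_real <- function(data) {
    data[!is.na(data)] <- sample(data[!is.na(data)])
    data
}

# Hypothetical variant: same logic as shuffle_groups, but without c(),
# so sapply() returns a groupsize-by-(number of groups) matrix.
shuffle_groups_matrix <- function(data, groupsize) {
    N <- length(data)
    if (N %% groupsize != 0) {
        stop('The length of the data is not a multiple of the group size.',
             call. = FALSE)
    }
    starts <- seq(from = 1, to = N, by = groupsize)
    sapply(starts, function(x) shuffle_real(data[x:(x + groupsize - 1)]))
}

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00,
                  0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98,
                  NA, 0.05)

m <- shuffle_groups_matrix(example.data, 12)
dim(m)                      # 12 rows, 2 columns (one column per group)
colMeans(m, na.rm = TRUE)   # per-group means, unchanged by the shuffle
```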

For example,

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 
        0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98, 
        NA, 0.05) 

set.seed(1234) 

shuffle_groups(example.data, 12)

This results in

> shuffle_groups(example.data, 12) 
[1] 0.12 0.83 NA 1.00 0.47 0.64 0.25 0.33 NA NA 0.90 0.42 0.47 NA 
[15] 0.05 1.00 0.56 0.62 NA 0.73 NA 0.98 NA 0.12

Or try shuffle_groups(example.data[1:23], 12), which results in Error: The length of the data is not a multiple of the group size.
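Since the goal in the question is a permutation test, one way to generate many shuffled data sets at once is to wrap the call in replicate(). This is only a sketch: the number of permutations `n_perm` and the name `perms` are assumptions, and the test statistic itself is left out because the question does not specify one.

```r
shuffle_real <- function(data) {
    data[!is.na(data)] <- sample(data[!is.na(data)])
    data
}

shuffle_groups <- function(data, groupsize) {
    N <- length(data)
    if (N %% groupsize != 0) {
        stop('The length of the data is not a multiple of the group size.',
             call. = FALSE)
    }
    starts <- seq(from = 1, to = N, by = groupsize)
    c(sapply(starts, function(x) shuffle_real(data[x:(x + groupsize - 1)])))
}

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00,
                  0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98,
                  NA, 0.05)

set.seed(1234)
n_perm <- 1000   # assumed number of permutations

# Each column of perms is one shuffled copy of example.data
perms <- replicate(n_perm, shuffle_groups(example.data, 12))
dim(perms)       # 24 rows (positions) by 1000 columns (shuffled data sets)

# The NA positions are the same in every shuffled copy
all(is.na(perms[is.na(example.data), ]))
```

A statistic could then be computed per column, for example with apply(perms, 2, ...), and compared against the value from the unshuffled data.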

Source

2017-08-01 01:15:38 duckmayr

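Thank you @Gregor and duckmayr for your suggestions, they work perfectly. I used a vector as a trial data set; my large data set is already a data frame. It has a column of group identifiers, so either suggestion would work. I tried the functions duckmayr provided and they did the trick. Everything works just as I hoped, thanks again! – Roald

@Roald Great! Glad to hear it. Since this solved it, please go ahead and take a moment to accept the answer. Thanks! – duckmayr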
