Difficult randomization based on frequency rank

I have a data frame like this:

x <- data.frame(A = c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"),
                B = c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"))

A  B 
D1 A1 
D1 A3 
D1 A4 
D1 A5 
D1 A6 
D2 A5 
D3 A5 
D3 A6 
D4 A6 
D4 A1 
D4 A2 
D5 A5 
D5 A6

Sorted by column B, the entries in column B have different frequencies:

A B freq(B) 
D1 A1 2 
D4 A1 2 
D4 A2 1 
D1 A3 1 
D1 A4 1 
D1 A5 4 
D2 A5 4 
D3 A5 4 
D5 A5 4 
D1 A6 4 
D3 A6 4 
D4 A6 4 
D5 A6 4
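For reference, the frequency column above can be reproduced in base R. This is only a sketch: the column names `freq` and `rank` are labels chosen here for illustration, and `rank` is the position of each frequency among the distinct frequencies (1, 2, 4 become ranks 1, 2, 3).

```r
x <- data.frame(
  A = c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"),
  B = c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"),
  stringsAsFactors = FALSE
)
x <- x[order(x$B), ]

# number of rows sharing each value of B, looked up by name for every row
x$freq <- as.integer(table(x$B)[x$B])

# rank of each frequency among the distinct frequencies (assumed meaning of "rank")
x$rank <- match(x$freq, sort(unique(x$freq)))
x
```

With this ranking, A1 (frequency 2) sits alone at rank 2, between the frequency-1 class at rank 1 and the frequency-4 class at rank 3.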

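I want to generate a randomized version of column B of the data frame x above, but the randomization may only take place between entries whose frequencies are the same or similar (+/- one rank). Let's say: A2, A3 and A4 each have frequency 1, so A2, A3 and A4 can be freely interchanged with one another, but not with A5, A6 or A1. Likewise, since A5 and A6 each have frequency 4, they can be randomized among themselves. A1 is the only entry with frequency 2 (ranked 2nd by freq(B)), so no swap within its own class is possible and A1 gets a special condition: A1 can be randomly replaced by A2, A3 or A4 (one rank lower than A1: frequency 1, ranked 1st by freq(B)) or by A5/A6 (one rank higher than A1: frequency 4, ranked 3rd by freq(B)).

Can this be done easily in R?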

2011-06-13 a83

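What do you mean by randomize? Do you want to sample from each value in `B` and return one row? Return all of them, but in random order? Please provide an example output. – Chase 2011-06-13 11:16:59

@a83 I'll echo @Chase's comment - please try to explain what replacement you want to do. I have posted an answer that assumes you only need this for a single monospecific group, but please take a look and get back to us if it doesn't do what you want. – 2011-06-13 13:24:40

The second half of your question about the randomization is a bit unclear, but here is a start. When you update your question, I'll update the answer accordingly. The code below adds the count information for column B and then samples the rows based on the value of the frequency column we added. I think all that is needed from here is to modify which columns are eligible for sampling, but please confirm what you want.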

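library(plyr)
# add the frequency of each B value as a column named freq
x <- merge(x, count(x, "B"))
# shuffle the rows within each frequency class
# (sample() on a data frame would shuffle its columns, not its rows)
ddply(x, "freq", function(d) d[sample(nrow(d)), ])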

2011-06-13 11:25:48 Chase

That is a workable modification of the question. +1 for the concise code, though. – 2011-06-14 15:08:13

The first part is easy to handle with functions in my permute package (only on R-forge at the moment):

require(permute) ## install from R-forge if not available
x <- data.frame(A = c("D1","D1","D1","D1","D1","D2","D3","D3",
                      "D4","D4","D4","D5","D5"),
                B = c("A1","A3","A4","A5","A6","A5","A5","A6",
                      "A6","A1","A2","A5","A6"))
## sort by B so that rows with the same B are adjacent
x <- x[order(x$B), ]
## add a freq column: the count of each B value, repeated once per row
x <- transform(x, freq = rep((lens <- sapply(with(x, split(B, B)),
                                             length)), lens))
set.seed(529)
## permute the row indices within the strata defined by freq
ind <- permuted.index(NROW(x), control = permControl(strata = factor(x$freq)))

which gives:

R> x[ind, ] 
    A B freq 
10 D4 A1 2 
1 D1 A1 2 
11 D4 A2 1 
2 D1 A3 1 
3 D1 A4 1 
12 D5 A5 4 
4 D1 A5 4 
9 D4 A6 4 
13 D5 A6 4 
5 D1 A6 4 
6 D2 A5 4 
8 D3 A6 4 
7 D3 A5 4 
R> ind 
[1] 2 1 3 4 5 9 6 12 13 10 7 11 8
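If permute is not available, the same kind of within-stratum permutation can be sketched in base R with `ave()`. This is an illustrative alternative, not the permute implementation: the helper name `shuffle_positions` is made up here, and the seed will not reproduce the indices shown above.

```r
x <- data.frame(
  A = c("D1","D1","D1","D1","D1","D2","D3","D3","D4","D4","D4","D5","D5"),
  B = c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6"),
  stringsAsFactors = FALSE
)
x <- x[order(x$B), ]
x$freq <- as.integer(table(x$B)[x$B])

# Shuffle a vector of row positions. sample.int() on the length avoids the
# sample(n) pitfall, where a single number n is treated as 1:n.
shuffle_positions <- function(pos) pos[sample.int(length(pos))]

set.seed(1)
# ave() applies the shuffle separately to the positions of each freq class
# and puts the results back in place, so indices never leave their class
ind <- ave(seq_len(nrow(x)), x$freq, FUN = shuffle_positions)
x[ind, ]
```

Every position keeps its frequency class, which is the same guarantee that `strata` gives in the permute call.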

We can wrap this in a single statement to generate n permutations:

ctrl <- permControl(strata = factor(x$freq)) 
n <- 10 
set.seed(83) 
IND <- replicate(n, permuted.index(NROW(x), control = ctrl))

which gives:

> IND 
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 2 2 1 2 1 2 1 2 1  1 
[2,] 1 1 2 1 2 1 2 1 2  2 
[3,] 3 5 4 3 5 5 4 5 5  5 
[4,] 5 3 5 5 3 4 5 4 4  4 
[5,] 4 4 3 4 4 3 3 3 3  3 
[6,] 9 12 11 12 6 10 13 10 8 13 
[7,] 10 11 6 11 13 7 7 12 7  9 
[8,] 8 9 9 10 8 6 11 13 12 10 
[9,] 12 10 8 6 9 13 9 6 9 11 
[10,] 13 6 12 9 7 9 8 8 13  8 
[11,] 6 7 10 13 12 11 6 11 10  7 
[12,] 11 8 13 7 11 8 10 7 6 12 
[13,] 7 13 7 8 10 12 12 9 11  6

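Now you also need to do some special sampling. If I understand correctly, you want to identify which frequency classes contain only a single level of B, and then possibly, at random, replace that level of B with a B chosen at random from the levels in the adjacent frequency classes. If so, it is a little more complicated to get the right rows to replace, but I think the function below does it: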

randSampleSpecial <- function(x, replace = TRUE) { 
    ## have we got access to permute? 
    stopifnot(require(permute)) 
    ## generate a random permutation within the levels of freq 
    ind <- permuted.index(NROW(x), 
          control = permControl(strata = factor(x$freq))) 
    ## split freq into freq classes 
    ranks <- with(x, split(freq, freq)) 
    ## rank the freq classes 
    Ranked <- rank(as.numeric(names(ranks))) 
    ## split the Bs on basis of freq classes 
    Bs <- with(x, split(B, freq)) 
    ## number of unique Bs in freq class 
    uniq <- sapply(Bs, function(x) length(unique(x))) 
    ## which contain only a single type of B? 
    repl <- which(uniq == 1) 
    ## if there are no freq classes with only one level of B, return 
    if(!(length(repl) > 0)) 
     return(ind) 
    ## if not, continue 
    ## which of the freq classes are adjacent to unique class? 
    other <- which(Ranked %in% (repl + c(1,-1))) 
    ## generate uniform random numbers to decide if we replace 
    Rand <- runif(length(ranks[[repl]])) 
    ## Which are the rows in `x` that we want to change? 
    candidates <- with(x, which(freq == as.numeric(names(uniq[repl])))) 
    ## which are the adjacent values we can replace with 
    replacements <- with(x, which(freq %in% as.numeric(names(uniq[other])))) 
    ## which candidates to replace? Decision is random 
    change <- sample(candidates, sum(Rand > 0.5)) 
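    ## if we are changing a candidate, sample from the replacements and
    ## assign
    ## (note: this indexing relies on the candidate rows being the first
    ## rows of `x`, as they are in this example)
    if(length(change) > 0)
        ind[candidates][change] <- sample(ind[replacements], length(change),
                                          replace = replace)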
    ## return 
    ind 
}

To use this, we do:

R> set.seed(35) 
R> randSampleSpecial(x) 
[1] 2 1 5 3 4 6 9 12 10 11 7 8 13

We can wrap this in a replicate() call to produce many such replacements:

R> IND <- replicate(10, randSampleSpecial(x)) 
R> IND 
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
[1,] 11 3 6 4 2 1 1 2 10  3 
[2,] 1 11 1 12 11 11 2 1 1 13 
[3,] 4 5 4 3 4 3 4 5 5  4 
[4,] 5 4 5 5 5 4 5 3 3  3 
[5,] 3 3 3 4 3 5 3 4 4  5 
[6,] 11 7 11 12 9 6 7 8 9  9 
[7,] 13 12 12 7 11 7 9 10 8 10 
[8,] 10 8 9 8 12 12 8 6 13  8 
[9,] 7 9 13 10 8 10 13 9 12 11 
[10,] 6 11 10 11 10 13 12 13 10 13 
[11,] 12 10 6 6 6 9 11 12 7 12 
[12,] 9 6 7 9 7 8 10 7 6  7 
[13,] 8 13 8 13 13 11 6 11 11  6

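For this data set, we know that rows 1 and 2 of the sorted x are the ones where we might want to substitute values from the other freq classes. If we had not done any replacement, the values in the first two rows of IND would only ever be 1 or 2 (see the earlier IND). In the new IND, wherever a value in the first two rows is not a 1 or 2, we have replaced it with a B from one of the adjacent frequency classes.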
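As a quick check of that reading, the first two rows of the IND matrix printed above can be inspected directly. The values below are copied by hand from that output, and the names `first_two` and `replaced` are just labels for this sketch.

```r
# first two rows of the IND matrix shown above (copied from the printed output)
first_two <- rbind(
  c(11,  3, 6,  4,  2,  1, 1, 2, 10,  3),
  c( 1, 11, 1, 12, 11, 11, 2, 1,  1, 13)
)

# within the A1 class only indices 1 and 2 can occur, so any larger index
# marks a row that was replaced from an adjacent frequency class
replaced <- first_two > 2

colSums(replaced)  # number of replaced A1 rows in each permutation
sum(replaced)      # total number of replacements across the 10 permutations
```

Columns 7 and 8 contain only 1s and 2s, so those two permutations left both A1 rows untouched.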

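My function makes the following assumptions:

You only want to replace elements of the homogeneous frequency class with an element of one of the adjacent classes at random, not always. If you want to always replace, we can change the function to suit.

If we are doing a replacement, it can be any of the available replacements, and if we need more than one, the same replacement can be chosen more than once. Set replace = FALSE in the call to sample without replacement, if that is what you want.

You have only a single monospecific frequency class. It should be easy to modify the function with a loop over two or more monospecific classes, but that does complicate it, and since your description of the problem is not very clear, I have kept things simple.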

2011-06-13 13:19:21

+1 for permute, I didn't know it yet. – 2011-06-14 15:13:21

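@Gavin gave you a very good approach and asked whether anyone could come up with something simpler. The function below does the same using mostly base functions. It uses count (from plyr) to handle the frequencies, and it takes into account that for the minimum and maximum frequency there is only one adjacent rank. Gavin's function gives an error in that case.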

library(plyr)  # provides count(), as.quoted() and eval.quoted()

Permdf <- function(x, v){
    # some code to allow Permdf(df, var)
    mc <- match.call()
    v <- as.quoted(mc$v)
    y <- unlist(eval.quoted(v, x))
    # make bins with values in v per frequency
    freqs <- count(x, v)
    bins <- split(freqs[[1]], freqs[[2]])
    nbins <- length(bins)
    # define the output
    dfid <- 1:nrow(x)

    for (i in 1:nbins){
        # which ids to change
        id <- which(y %in% bins[[i]])

        if(length(bins[[i]]) > 1){
            # in case there's more than one value for that frequency
            dfid[id] <- sample(dfid[id])
        } else {
            bid <- c(i-1, i, i+1)
            # check whether the bin index is in range
            bid <- bid[bid > 0 & bid <= nbins]
            # id values to choose from
            vid <- which(y %in% unlist(bins[bid]))
            # random selection
            dfid[id] <- sample(vid, length(id), replace=TRUE)
        }
    }
    # return
    dfid
}

This can be used as:

Permdf(x,B)
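Permdf relies on plyr for count(), as.quoted() and eval.quoted(). For readers who want no dependencies at all, the same idea can be sketched with table() and split() alone. This is an illustrative variant rather than a drop-in replacement: the function name `perm_by_freq` is hypothetical, and it takes the column as a vector instead of an unquoted name.

```r
perm_by_freq <- function(values) {
  values <- as.character(values)
  freq <- table(values)
  # distinct values grouped by frequency; split() orders the bins by
  # increasing frequency, so neighbouring bins are adjacent ranks
  bins <- split(names(freq), as.integer(freq))
  idx <- seq_along(values)

  for (i in seq_along(bins)) {
    id <- which(values %in% bins[[i]])
    if (length(bins[[i]]) > 1) {
      # several values share this frequency: permute their rows among themselves
      idx[id] <- id[sample.int(length(id))]
    } else {
      # a lone value: draw from its own bin and the adjacent bins that exist
      bid <- intersect(c(i - 1, i, i + 1), seq_along(bins))
      pool <- which(values %in% unlist(bins[bid]))
      idx[id] <- pool[sample.int(length(pool), length(id), replace = TRUE)]
    }
  }
  idx
}

B <- c("A1","A3","A4","A5","A6","A5","A5","A6","A6","A1","A2","A5","A6")
set.seed(42)
idx <- perm_by_freq(B)
data.frame(original = B, randomized = B[idx])
```

Like Permdf, the lone class draws with replacement from a pool that includes itself, so A1 may also stay A1. Indexing with `sample.int()` sidesteps the case where `sample()` is handed a single number and samples from 1:n instead.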

2011-06-14 15:06:04
