The first part is easy to handle with a function in my permute package (only on R-forge at the moment):
require(permute) ## install from R-forge if not available
x <- data.frame(A = c("D1","D1","D1","D1","D1","D2","D3","D3",
                      "D4","D4","D4","D5","D5"),
                B = c("A1","A3","A4","A5","A6","A5","A5","A6",
                      "A6","A1","A2","A5","A6"))
x <- x[order(x$B), ]
x <- transform(x, freq = rep((lens <- sapply(with(x, split(B, B)),
                                             length)), lens))
set.seed(529)
ind <- permuted.index(NROW(x), control = permControl(strata = factor(x$freq)))
which gives:
R> x[ind, ]
    A  B freq
10 D4 A1    2
1  D1 A1    2
11 D4 A2    1
2  D1 A3    1
3  D1 A4    1
12 D5 A5    4
4  D1 A5    4
9  D4 A6    4
13 D5 A6    4
5  D1 A6    4
6  D2 A5    4
8  D3 A6    4
7  D3 A5    4
R> ind
 [1]  2  1  3  4  5  9  6 12 13 10  7 11  8
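If permute is not installed, the same kind of within-stratum shuffle can be sketched in base R. This is only an illustration, not how permute works internally: `shuffle_within()` is a hypothetical helper name, and the seed used here will not reproduce the `ind` shown above. As far as I know, later releases of permute renamed `permControl()` and `permuted.index()` to `how()` and `shuffle()`, so check the documentation of the version you have installed.

```r
## Rebuild the example data, sorted by B, with freq = number of rows per B
x <- data.frame(A = c("D1","D1","D1","D1","D1","D2","D3","D3",
                      "D4","D4","D4","D5","D5"),
                B = c("A1","A3","A4","A5","A6","A5","A5","A6",
                      "A6","A1","A2","A5","A6"),
                stringsAsFactors = FALSE)
x <- x[order(x$B), ]
x$freq <- ave(seq_along(x$B), x$B, FUN = length)

## hypothetical helper: permute the positions 1..n within each stratum only
shuffle_within <- function(strata) {
    ave(seq_along(strata), strata,
        FUN = function(i) i[sample.int(length(i))])
}

set.seed(1)
ind <- shuffle_within(x$freq)
## every position should still point at a row of its own freq class
all(x$freq[ind] == x$freq)
```

Using `sample.int(length(i))` rather than `sample(i)` avoids the surprise that `sample()` treats a single number `k` as `1:k`, which matters for strata containing one row.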
We can wrap this in a single statement to generate n permutations:
ctrl <- permControl(strata = factor(x$freq))
n <- 10
set.seed(83)
IND <- replicate(n, permuted.index(NROW(x), control = ctrl))
which gives:
R> IND
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    2    1    2    1    2    1    2    1     1
 [2,]    1    1    2    1    2    1    2    1    2     2
 [3,]    3    5    4    3    5    5    4    5    5     5
 [4,]    5    3    5    5    3    4    5    4    4     4
 [5,]    4    4    3    4    4    3    3    3    3     3
 [6,]    9   12   11   12    6   10   13   10    8    13
 [7,]   10   11    6   11   13    7    7   12    7     9
 [8,]    8    9    9   10    8    6   11   13   12    10
 [9,]   12   10    8    6    9   13    9    6    9    11
[10,]   13    6   12    9    7    9    8    8   13     8
[11,]    6    7   10   13   12   11    6   11   10     7
[12,]   11    8   13    7   11    8   10    7    6    12
[13,]    7   13    7    8   10   12   12    9   11     6
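A quick way to convince yourself that such a matrix really holds within-stratum permutations is to check each column against the freq classes. The sketch below uses the first three columns of the `IND` printed above; `is_valid()` is a hypothetical helper, and `freq` is typed in by hand from the sorted data (A1 occurs twice, A2 to A4 once each, A5 and A6 four times each).

```r
## freq class of each row of the sorted x
freq <- c(2, 2, 1, 1, 1, rep(4, 8))

## first three columns of IND, copied from the printed output
IND3 <- cbind(c(2, 1, 3, 5, 4,  9, 10, 8, 12, 13,  6, 11,  7),
              c(2, 1, 5, 3, 4, 12, 11, 9, 10,  6,  7,  8, 13),
              c(1, 2, 4, 5, 3, 11,  6, 9,  8, 12, 10, 13,  7))

## hypothetical helper: a column is valid if it is a permutation of 1..n
## and no index has moved to a position belonging to another freq class
is_valid <- function(i, strata) {
    all(sort(i) == seq_along(strata)) && all(strata[i] == strata)
}

valid <- apply(IND3, 2, is_valid, strata = freq)
valid
```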
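Now you also need to do some special sampling. If I understand correctly, you want to identify which freq class contains only a single level of B. Then, possibly at random, you want to replace a B in that freq class with one chosen at random from the B levels in the adjacent freq classes. If so, it is a bit more complicated to get the right rows to replace, but I think the function below does it: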
randSampleSpecial <- function(x, replace = TRUE) {
    ## have we got access to permute?
    stopifnot(require(permute))
    ## generate a random permutation within the levels of freq
    ind <- permuted.index(NROW(x),
                          control = permControl(strata = factor(x$freq)))
    ## split freq into freq classes
    ranks <- with(x, split(freq, freq))
    ## rank the freq classes
    Ranked <- rank(as.numeric(names(ranks)))
    ## split the Bs on basis of freq classes
    Bs <- with(x, split(B, freq))
    ## number of unique Bs in freq class
    uniq <- sapply(Bs, function(x) length(unique(x)))
    ## which contain only a single type of B?
    repl <- which(uniq == 1)
    ## if there are no freq classes with only one level of B, return
    if(!(length(repl) > 0))
        return(ind)
    ## if not, continue
    ## which of the freq classes are adjacent to unique class?
    other <- which(Ranked %in% (repl + c(1,-1)))
    ## generate uniform random numbers to decide if we replace
    Rand <- runif(length(ranks[[repl]]))
    ## Which are the rows in `x` that we want to change?
    candidates <- with(x, which(freq == as.numeric(names(uniq[repl]))))
    ## which are the adjacent values we can replace with
    replacements <- with(x, which(freq %in% as.numeric(names(uniq[other]))))
    ## which candidates to replace? Decision is random; index through
    ## sample.int() so a single candidate is not treated as 1:candidate
    change <- candidates[sample.int(length(candidates), sum(Rand > 0.5))]
    ## if we are changing a candidate, sample from the replacements and
    ## assign; `change` holds positions in `ind`, so index `ind` directly
    if(length(change) > 0)
        ind[change] <- sample(ind[replacements], length(change),
                              replace = replace)
    ## return
    ind
}
To use this, we do:
R> set.seed(35)
R> randSampleSpecial(x)
 [1]  2  1  5  3  4  6  9 12 10 11  7  8 13
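To see which freq class the function singles out for this data set, it can help to tabulate the number of distinct B values per class. This is a small base-R sketch with the sorted `B` and `freq` columns typed in by hand; the names `n_unique` and `mono` are just illustrative.

```r
## B and freq columns of the sorted x
B    <- c("A1", "A1", "A2", "A3", "A4", rep("A5", 4), rep("A6", 4))
freq <- c(2, 2, 1, 1, 1, rep(4, 8))

## number of distinct B values in each freq class
n_unique <- tapply(B, freq, function(b) length(unique(b)))
n_unique

## the freq class(es) holding a single B; here that is freq == 2 (only A1)
mono <- as.numeric(names(n_unique)[n_unique == 1])
mono
```

The classes adjacent to freq 2 in rank order are freq 1 and freq 4, so every row outside the first two is a possible replacement.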
We can wrap this in a replicate() call to produce many such replacements:
R> IND <- replicate(10, randSampleSpecial(x))
R> IND
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   11    3    6    4    2    1    1    2   10     3
 [2,]    1   11    1   12   11   11    2    1    1    13
 [3,]    4    5    4    3    4    3    4    5    5     4
 [4,]    5    4    5    5    5    4    5    3    3     3
 [5,]    3    3    3    4    3    5    3    4    4     5
 [6,]   11    7   11   12    9    6    7    8    9     9
 [7,]   13   12   12    7   11    7    9   10    8    10
 [8,]   10    8    9    8   12   12    8    6   13     8
 [9,]    7    9   13   10    8   10   13    9   12    11
[10,]    6   11   10   11   10   13   12   13   10    13
[11,]   12   10    6    6    6    9   11   12    7    12
[12,]    9    6    7    9    7    8   10    7    6     7
[13,]    8   13    8   13   13   11    6   11   11     6
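For this data set, we know that it is rows 1 and 2 of the sorted x that we might want to replace with values from other freq classes. If we hadn't done any replacement, the first two rows of IND would contain only the values 1 or 2 (see the earlier IND). In the new IND, wherever a value in the first two rows is not a 1 or a 2, we have replaced it with a B from one of the adjacent frequency classes.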
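The replaced entries can be picked out mechanically: in the first two rows, anything greater than 2 must have come from an adjacent freq class. A short sketch, with the first two rows of the second `IND` copied in by hand (`top` and `replaced` are illustrative names):

```r
## first two rows of the IND produced by randSampleSpecial()
top <- rbind(c(11,  3, 6,  4,  2,  1, 1, 2, 10,  3),
             c( 1, 11, 1, 12, 11, 11, 2, 1,  1, 13))

## TRUE where the index no longer points at row 1 or 2, i.e. was replaced
replaced <- top > 2

## number of replacements in each of the 10 permutations
colSums(replaced)

## overall proportion replaced; roughly a half is what the 0.5 cutoff implies
mean(replaced)
```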
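My function assumes that you want to:
- Only randomly replace elements of the homogeneous frequency class with ones from the adjacent classes! If you want to always replace, then we can alter the function to suit.
- If we are doing replacement, the replacement can be any of the candidate replacements, and if we need more than one, the same replacement can be chosen more than once. Set replace = FALSE in the call to sample without replacement, if that is what you want.
- The function assumes that you have only a single monospecific frequency class. It should be easy to modify it with a loop over two or more monospecific classes, but that does complicate the function, and as your description of the problem wasn't too clear, I kept things simple.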
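For what it's worth, here is a rough base-R sketch of how the three assumptions above could be relaxed. It is not the function from this answer: `rand_sample_multi()` and its `prob` argument are hypothetical, it does the within-class shuffle with `ave()` instead of permute, and it decides on each candidate row independently with probability `prob` (so `prob = 1` always replaces) rather than first drawing how many rows to change.

```r
## example data, sorted by B, with freq = number of rows per B
x <- data.frame(A = c("D1","D1","D1","D1","D1","D2","D3","D3",
                      "D4","D4","D4","D5","D5"),
                B = c("A1","A3","A4","A5","A6","A5","A5","A6",
                      "A6","A1","A2","A5","A6"),
                stringsAsFactors = FALSE)
x <- x[order(x$B), ]
x$freq <- ave(seq_along(x$B), x$B, FUN = length)

## hypothetical sketch: loop over every freq class that holds a single B
rand_sample_multi <- function(x, prob = 0.5, replace = TRUE) {
    n <- nrow(x)
    ## shuffle positions within each freq class
    ind <- ave(seq_len(n), x$freq,
               FUN = function(i) i[sample.int(length(i))])
    classes <- sort(unique(x$freq))
    n_b <- sapply(classes, function(f) length(unique(x$B[x$freq == f])))
    for (k in which(n_b == 1)) {
        ## freq classes next to class k in rank order
        nbr <- classes[intersect(c(k - 1, k + 1), seq_along(classes))]
        ## rows we may change, and rows we may draw replacements from
        candidates <- which(x$freq == classes[k])
        pool <- which(x$freq %in% nbr)
        ## each candidate is replaced with probability `prob`
        change <- candidates[runif(length(candidates)) < prob]
        if (length(change) > 0 && length(pool) > 0) {
            ## with replace = FALSE this errors if the pool is too small
            pick <- sample.int(length(pool), length(change),
                               replace = replace)
            ind[change] <- pool[pick]
        }
    }
    ind
}

set.seed(35)
rand_sample_multi(x)
```

Drawing from the row numbers in `pool`, rather than from the already shuffled `ind`, keeps one class's replacements from leaking into the pool used for a neighbouring monospecific class.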
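What do you mean by random? Do you want to sample from each value in "B" and return one row? Return them all, but order them randomly? Please provide an example output. – Chase 2011-06-13 11:16:59
@a83 I'll echo @Chase's comment - please try to explain what replacement you want to do. I have posted an answer in which I assume you have only a single monospecific group, but please take a look and get back to us if it doesn't do what you want. – 2011-06-13 13:24:40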