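How to bootstrap a function with replacement and return the output

I am trying to draw two random subsamples from a data frame, take the mean of a column in each subsample, and calculate the difference between the two means. As far as I can tell, the function below, used with 'replicate' inside 'do.call', should work, but I keep getting an error.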
Example data:
> dput(a)
structure(list(index = 1:30, val = c(14L, 22L, 1L, 25L, 3L, 34L,
35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L, 29L, 28L, 26L, 12L, 41L,
36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L, 22L, 10L, 19L), id = c(1L,
2L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 16L, 17L, 18L, 19L, 20L, 21L, 21L, 22L, 23L, 24L,
25L)), .Names = c("index", "val", "id"), class = "data.frame", row.names = c(NA,
-30L))
Code:
library(plyr)  # provides ddply() and .()
# Function to select only one row for each unique id in the data frame,
# take 2 randomly drawn subsets of size 40 from this unique dataset,
# calculate means of both subsets and determine the difference between the two means
extractDiff <- function(P){
  xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame
  subA <- xA[sample(xA, 10, replace=TRUE), ] # takes a random sample of 40 rows
  subB <- xA[sample(xA, 10, replace=TRUE), ] # takes a second random sample of 40 rows
  meanA <- mean(subA$val)
  meanB <- mean(subB$val)
  diff <- abs(meanA-meanB)
  outdf <- c(mA = meanA, mB= meanB, diffAB = diff)
  return(outdf)
}
# To repeat the random selections and mean comparison X number of times...
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE))
Error message:
Error in xj[i] : invalid subscript type 'list'
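The message most likely comes from `sample(xA, 10, replace=TRUE)`, not from `rbind`. A data frame is a list of columns, so `sample()` applied to it draws columns and returns another data frame; passing that data frame as the row index in `xA[..., ]` then fails inside `[.data.frame` with this error. The sketch below reproduces it on a small hypothetical data frame `toy` and shows that sampling row numbers with `sample(nrow(toy), ...)` works instead:

```r
# Hypothetical toy data frame standing in for xA
toy <- data.frame(index = 1:5, val = c(14, 22, 1, 25, 3), id = 1:5)

# sample() on a data frame samples its COLUMNS (length(toy) is the column count)
idx_wrong <- sample(toy, 10, replace = TRUE)
print(class(idx_wrong))   # "data.frame", i.e. a list

# Using that list as a row index reproduces the error from the question
msg <- tryCatch(toy[idx_wrong, ], error = function(e) conditionMessage(e))
print(msg)

# Sampling row NUMBERS gives an integer index that selects rows as intended
idx_right <- sample(nrow(toy), 10, replace = TRUE)
sub <- toy[idx_right, ]
print(nrow(sub))          # 10 rows, drawn with replacement
```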
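I think the error has something to do with the function output not being returned in a format that can be passed to 'rbind', but nothing I try seems to work. I have tried converting the 'outdf' object to a data frame and to a matrix, and I still get the error message.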
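The output format of `extractDiff` is probably not the problem. A named numeric vector like `outdf` can be stacked directly with `do.call(rbind, replicate(..., simplify = FALSE))`: each replicate becomes one row of a matrix, and the vector names become the column names. A minimal check, using a hypothetical stand-in `fakeDiff()` that replaces the data frame sampling with `runif()` draws:

```r
# Hypothetical stand-in for extractDiff(): returns a named numeric vector
fakeDiff <- function() {
  meanA <- mean(runif(10))
  meanB <- mean(runif(10))
  c(mA = meanA, mB = meanB, diffAB = abs(meanA - meanB))
}

set.seed(1)
# replicate(..., simplify = FALSE) returns a list of 5 vectors;
# do.call(rbind, ...) stacks them row-wise into a 5 x 3 matrix
res <- do.call(rbind, replicate(5, fakeDiff(), simplify = FALSE))
print(dim(res))       # 5 3
print(colnames(res))  # "mA" "mB" "diffAB"
```

So converting `outdf` to a data frame or matrix should be unnecessary; the error occurs inside the function, before anything reaches `rbind`.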
I'm still learning R, so any help would be greatly appreciated. Thanks!
The anonymous function in 'ddply' is missing a return value. – Roland
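@Roland: I'm not sure I understand what you mean. I assign the result of the 'ddply()' call to 'xA' and pass it to the next command; surely that should work? I have tried the 'ddply' call on its own in a loop in this way and it worked fine. Could you please give me an example of how to change the code? Many thanks. – jjulip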
It should be 'xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1), ]})'. Sorry I can't help further, but your code is not reproducible (http://stackoverflow.com/a/5963610/1412059). – Roland
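Putting this together, a minimal sketch of a working version, assuming the intent is one randomly chosen row per `id` followed by two samples of 10 rows drawn with replacement. The essential change is indexing with `sample(nrow(xA), n, replace = TRUE)`. To keep the example free of package dependencies, it swaps the `ddply()` call for base R `split()` and `lapply()`; the original `ddply()` line should do the same job. (An R function returns the value of its last expression, so the anonymous function already returns the sampled row without an explicit `return()`.) The argument `n` is a hypothetical addition for the subsample size.

```r
# Example data from the question (same values as dput(a))
a <- data.frame(
  index = 1:30,
  val = c(14L, 22L, 1L, 25L, 3L, 34L, 35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L,
          29L, 28L, 26L, 12L, 41L, 36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L,
          22L, 10L, 19L),
  id = c(1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
         14L, 15L, 16L, 16L, 17L, 18L, 19L, 20L, 21L, 21L, 22L, 23L, 24L, 25L)
)

# n: hypothetical parameter for the subsample size (the question hard-codes 10)
extractDiff <- function(P, n = 10) {
  # One randomly chosen row per id (base R stand-in for the ddply() call)
  xA <- do.call(rbind, lapply(split(P, P$id), function(x) x[sample(nrow(x), 1), ]))
  # Sample row NUMBERS with replacement, not the data frame itself
  subA <- xA[sample(nrow(xA), n, replace = TRUE), ]
  subB <- xA[sample(nrow(xA), n, replace = TRUE), ]
  meanA <- mean(subA$val)
  meanB <- mean(subB$val)
  c(mA = meanA, mB = meanB, diffAB = abs(meanA - meanB))
}

set.seed(42)
# 10 replicates stacked into a 10 x 3 matrix
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify = FALSE))
print(fin)
```

Each row of `fin` is one replicate, so for example `mean(fin[, "diffAB"])` averages the absolute difference in means across replicates.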