How to bootstrap a function with replacement and return the output

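I am trying to draw two random subsamples from a data frame, take the mean of a column in each subsample, and calculate the difference between the two means. As far as I can tell, the function below, together with replicate inside do.call, should work, but I keep getting an error message.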

Example data:

> dput(a) 
structure(list(index = 1:30, val = c(14L, 22L, 1L, 25L, 3L, 34L, 
35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L, 29L, 28L, 26L, 12L, 41L, 
36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L, 22L, 10L, 19L), id = c(1L, 
2L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
14L, 15L, 16L, 16L, 17L, 18L, 19L, 20L, 21L, 21L, 22L, 23L, 24L, 
25L)), .Names = c("index", "val", "id"), class = "data.frame", row.names = c(NA, 
-30L))

Code:

# Function to select only one row for each unique id in the data frame, 
# take 2 randomly drawn subsets of size 40 from this unique dataset, 
# calculate means of both subsets and determine the difference between the two means 
extractDiff <- function(P){ 
    xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame 
    subA <- xA[sample(xA, 10, replace=TRUE), ] # takes a random sample of 40 rows 
    subB <- xA[sample(xA, 10, replace=TRUE), ] # takes a second random sample of 40 rows 
    meanA <- mean(subA$val) 
    meanB <- mean(subB$val) 
    diff <- abs(meanA-meanB) 
    outdf <- c(mA = meanA, mB= meanB, diffAB = diff) 
    return(outdf) 
} 

# To repeat the random selections and mean comparison X number of times... 
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE))

Error message:

Error in xj[i] : invalid subscript type 'list'

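I think the error has something to do with the function output not being returned in a format that can be passed to rbind, but nothing I have tried seems to work. (I have tried converting the outdf object to a data frame and to a matrix, and I still get the error message.)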

I am still learning R, so I would be grateful for any help. Thanks!

Source

2014-06-05 jjulip

The anonymous function in 'ddply' is missing a return value. – Roland

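@Roland: I'm not sure I understand what you mean. I assign the result of the 'ddply()' call to "xA" and pass it on to the next command. Surely that should work? I tried the ddply function on its own in a loop this way and it worked fine. Could you please give me an example of how to change the code? Many thanks. – jjulip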

It should be 'xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1), ]})'. Sorry, I can't help further, but your code is not reproducible (http://stackoverflow.com/a/5963610/1412059). – Roland

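If you pass sample a list/data.frame as its first argument, it returns a list/data.frame. You cannot subset a data.frame with a data.frame, which is what xA[sample(xA, 10, replace=TRUE), ] attempts. Sample the row numbers with sample(nrow(xA), ...) instead: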
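To make the difference concrete, here is a minimal sketch using a small made-up data frame (df and its values are hypothetical, not the asker's data). It only uses base R and shows what sample() returns in each case and why one of them cannot serve as a row index.

```r
# Toy data frame (hypothetical, for illustration only)
df <- data.frame(index = 1:5, val = c(14L, 22L, 1L, 25L, 3L))

# sample() on a data frame draws from its columns and returns a data frame
cols <- sample(df, 4, replace = TRUE)
is.data.frame(cols)   # TRUE
ncol(cols)            # 4

# Using that data frame as a row index fails; capture the message instead of stopping
res <- tryCatch(df[cols, ], error = function(e) conditionMessage(e))

# sample() on the row count returns an integer vector, which is a valid row index
rows <- sample(nrow(df), 4, replace = TRUE)
sub <- df[rows, ]
```

Here res holds the error text rather than a data frame, while sub is an ordinary 4-row subset in which rows may repeat because replace = TRUE.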

library(plyr) 
extractDiff <- function(P){ 
    xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame 
    subA <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a random sample of 10 rows, with replacement 
    subB <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a second random sample of 10 rows, with replacement 
    meanA <- mean(subA$val) 
    meanB <- mean(subB$val) 
    diff <- abs(meanA-meanB) 
    outdf <- c(mA = meanA, mB= meanB, diffAB = diff) 
    return(outdf) 
} 

set.seed(42) 
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE)) 
#         mA   mB diffAB
#  [1,] 29.4 25.5    3.9
#  [2,] 25.8 23.0    2.8
#  [3,] 25.3 29.5    4.2
#  [4,] 29.0 31.2    2.2
#  [5,] 26.5 25.6    0.9
#  [6,] 26.8 27.2    0.4
#  [7,] 28.7 27.3    1.4
#  [8,] 22.7 28.7    6.0
#  [9,] 30.6 23.2    7.4
# [10,] 25.1 25.2    0.1
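
As a side note on the return format the question worried about: a named numeric vector is already something rbind can stack, so no conversion to a data frame or matrix is needed inside the function. The sketch below uses oneRun(), a hypothetical stand-in for extractDiff() that draws from rnorm() so it runs without plyr or the example data.

```r
# Hypothetical stand-in for extractDiff(): returns a named numeric vector
oneRun <- function() {
    x <- rnorm(10)
    y <- rnorm(10)
    c(mA = mean(x), mB = mean(y), diffAB = abs(mean(x) - mean(y)))
}

set.seed(1)
# rbind() stacks the named vectors into a numeric matrix, one row per replicate
fin <- do.call(rbind, replicate(5, oneRun(), simplify = FALSE))
dim(fin)        # 5 3
colnames(fin)   # "mA" "mB" "diffAB"

# Same shape without do.call: let replicate() simplify, then transpose
fin2 <- t(replicate(5, oneRun()))

# Convert to a data frame if column access with $ is wanted
finDF <- as.data.frame(fin)
```

Either form gives one row per replicate; which one to use is mostly a matter of taste.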

Source

2014-06-05 15:18:22 Roland

@Roland: Thank you! I was just missing 'nrow()'. Much appreciated. – jjulip
