2014-06-05 55 views

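How to bootstrap a function with replacement and return the output

I am trying to draw two random subsamples from a data frame, take the mean of a column in each subsample, and calculate the difference between the two means. As far as I can tell, the function below, together with replicate inside do.call, should work, but I keep getting an error message.

Example data: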

> dput(a) 
structure(list(index = 1:30, val = c(14L, 22L, 1L, 25L, 3L, 34L, 
35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L, 29L, 28L, 26L, 12L, 41L, 
36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L, 22L, 10L, 19L), id = c(1L, 
2L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
14L, 15L, 16L, 16L, 17L, 18L, 19L, 20L, 21L, 21L, 22L, 23L, 24L, 
25L)), .Names = c("index", "val", "id"), class = "data.frame", row.names = c(NA, 
-30L)) 

Code:

# Function to select only one row for each unique id in the data frame, 
# take 2 randomly drawn subsets of size 10 from this unique dataset, 
# calculate means of both subsets and determine the difference between the two means 
extractDiff <- function(P){ 
    xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame 
    subA <- xA[sample(xA, 10, replace=TRUE), ] # takes a random sample of 10 rows 
    subB <- xA[sample(xA, 10, replace=TRUE), ] # takes a second random sample of 10 rows 
    meanA <- mean(subA$val) 
    meanB <- mean(subB$val) 
    diff <- abs(meanA-meanB) 
    outdf <- c(mA = meanA, mB= meanB, diffAB = diff) 
    return(outdf) 
} 

# To repeat the random selections and mean comparison X number of times... 
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE)) 

Error message:

Error in xj[i] : invalid subscript type 'list' 

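I think the error has something to do with the function output not being returned in a format that can be passed to rbind, but nothing I try seems to work. (I have tried converting the outdf object to a data frame and to a matrix, and I still get the error message.)

I am still learning R, so I would be very grateful for any help. Thanks!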

The anonymous function in 'ddply' is missing a return value. – Roland

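@Roland: I'm not sure I understand what you mean. I assign the result of the 'ddply()' call to "xA" and pass it to the next command. Surely this should work? I have tried the ddply function on its own in a loop this way and it works fine. Could you please give me an example of how to change the code? Many thanks. – jjulip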

It should be 'xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1), ]})'. Sorry, I can't help further, because your code is not reproducible (http://stackoverflow.com/a/5963610/1412059). – Roland

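Answer

If you pass a list/data.frame to sample as its first argument, it returns a list/data.frame. You cannot use a data.frame as a row index to subset a data frame, which is what triggers the error. Sample the row numbers instead, using nrow(xA):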

library(plyr) 
extractDiff <- function(P){ 
    xA <- ddply(P, .(id), function(x) {x[sample(nrow(x), 1) ,] }) # selects only one row for each id in the data frame 
    subA <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a random sample of 10 rows (with replacement) 
    subB <- xA[sample(nrow(xA), 10, replace=TRUE), ] # takes a second random sample of 10 rows (with replacement) 
    meanA <- mean(subA$val) 
    meanB <- mean(subB$val) 
    diff <- abs(meanA-meanB) 
    outdf <- c(mA = meanA, mB= meanB, diffAB = diff) 
    return(outdf) 
} 

set.seed(42) 
fin <- do.call(rbind, replicate(10, extractDiff(a), simplify=FALSE)) 
#   mA mB diffAB 
# [1,] 29.4 25.5 3.9 
# [2,] 25.8 23.0 2.8 
# [3,] 25.3 29.5 4.2 
# [4,] 29.0 31.2 2.2 
# [5,] 26.5 25.6 0.9 
# [6,] 26.8 27.2 0.4 
# [7,] 28.7 27.3 1.4 
# [8,] 22.7 28.7 6.0 
# [9,] 30.6 23.2 7.4 
# [10,] 25.1 25.2 0.1 
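
To see why the original indexing failed, here is a minimal sketch using a small made-up data frame (`df`, `cols`, `rows`, `res` and `sub` are illustrative names, not part of the question's code). It assumes only base R and compares what `sample()` returns for a data frame with what it returns for a row count:

```r
# Toy data frame, invented purely for illustration
df <- data.frame(index = 1:5, val = c(14L, 22L, 1L, 25L, 3L))

# sample() on a data frame treats it as a list of columns,
# so the result is a data frame of resampled columns, not row numbers.
cols <- sample(df, 4, replace = TRUE)
is.data.frame(cols)   # TRUE

# Using that data frame as a row index fails; capture the error
# instead of stopping the script.
res <- tryCatch(df[cols, ], error = function(e) e)
inherits(res, "error")   # TRUE
conditionMessage(res)    # the message raised by the failed subsetting

# sample() on the number of rows returns integer row positions,
# which are valid row indices.
rows <- sample(nrow(df), 4, replace = TRUE)
sub <- df[rows, ]
nrow(sub)   # 4
```

Because `replace = TRUE`, the same row position can be drawn more than once, so `sub` may contain repeated rows.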

@Roland: Thanks! I was just missing 'nrow()'. Much appreciated. – jjulip