2014-03-04

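Select a specific number of values within bins

I have a data frame containing a column of bins and a column of values. Bins are repeated within the data frame. I would like to select a predetermined number of values from each bin. That number can be found by looking up the bin number in a reference data frame, which holds the bin numbers in one column and the corresponding num.to.sample values in a second column. The num.to.sample value should then be passed to the sample function to select values from that bin.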

    #Example data 
    data = as.data.frame(cbind(rep(1:3, each=6))) 
    colnames(data) = "bin" 
    data$value = rnorm(18) 

    #Reference file used to determine how many data$values to select based on data$bin 
    ref = as.data.frame(cbind(1:3)) 
    colnames(ref) = "bin" 
    ref$num.to.sample = c(1,2,3) 

    #Sample function 
    #num should be determined by the num.to.sample value that the bin matches to in ref 
    samples = function(x, num){ 
     sample(x, num, replace=FALSE); 
    } 

    #this code below works for selecting a specific number of values by bin 
    #how can this be turned into the num.to.sample value that would result from matching 
    #data$bin to ref$bin and returning ref$num.to.sample? 
    data.sample = data[unlist(tapply(1:nrow(data),data$bin, function(x) samples(x,2))),] 
    data.sample 

Any ideas?

Thanks!

Answer


There may be a better way, but as a first pass you could use

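    #attach num.to.sample to every row by joining on the shared "bin" column 
    data <- merge(data, ref) 

    library(plyr) 
    #for each bin, keep num.to.sample randomly chosen rows 
    ddply(data, "bin", function(x) x[sample(1:nrow(x), unique(x$num.to.sample)), ]) 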
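If you would rather avoid the plyr dependency, the same lookup can probably be done in base R. This is only a sketch under the question's example data: the names `rows.by.bin`, `n.by.bin` and `picked` are made up for illustration, and `set.seed()` is there only to make the run reproducible.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

# Example data and reference table, as in the question
data <- data.frame(bin = rep(1:3, each = 6), value = rnorm(18))
ref  <- data.frame(bin = 1:3, num.to.sample = c(1, 2, 3))

# Row indices grouped by bin; the list names are the bin labels
rows.by.bin <- split(seq_len(nrow(data)), data$bin)

# Look up each bin's sample size by matching the bin label against ref$bin
n.by.bin <- ref$num.to.sample[match(names(rows.by.bin), ref$bin)]

# Draw n row indices from each bin without replacement
picked <- unlist(Map(function(idx, n) idx[sample.int(length(idx), n)],
                     rows.by.bin, n.by.bin))

data.sample <- data[picked, ]
table(data.sample$bin)
```

Indexing with `sample.int(length(idx), n)` instead of `sample(idx, n)` sidesteps the case where a bin has a single row and `sample()` would treat that one number as the upper bound of a range.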

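Thanks, this works great for the main scenario I have with my data. Occasionally my num.to.sample will be larger than the number of rows I have for that bin, so it gives me an error. This won't happen often, so I can review those cases manually. Thanks again! – SC2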
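One possible way to avoid that error is to cap the sample size at the number of rows available in the bin. The sketch below assumes that taking every row of an undersized bin is acceptable; `sample.capped` is a hypothetical helper name, and the reference table is altered so that bin 3 asks for more rows than it has.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

data <- data.frame(bin = rep(1:3, each = 6), value = rnorm(18))
# Bin 3 requests 10 values but only has 6 rows
ref  <- data.frame(bin = 1:3, num.to.sample = c(1, 2, 10))

merged <- merge(data, ref)  # joins on the shared "bin" column

# Hypothetical helper: sample at most as many rows as the bin contains
sample.capped <- function(x) {
  n <- min(nrow(x), unique(x$num.to.sample))
  x[sample.int(nrow(x), n), ]
}

data.sample <- do.call(rbind, lapply(split(merged, merged$bin), sample.capped))
table(data.sample$bin)
```

The same helper should also work as the function argument to the `ddply` call above, i.e. `ddply(merged, "bin", sample.capped)`. If sampling with replacement is preferable for undersized bins, `sample.int(nrow(x), n, replace = TRUE)` without the cap would be the alternative.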