2011-12-22 49 views

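Efficient algorithm for weighted random sampling with replacement

I have a constant number of samples, each with an associated probability. I now want to resample from this data source to obtain the same number of new samples, each with equal probability.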

For example:

                                          random | 0.03 | 0.78 | 0.45 | 0.70
                                          -------+------+------+------+------
sample | 0000 | 0001 | 0002 | 0003   RNG  sample | 0000 | 0003 | 0002 | 0003
-------+------+------+------+------ ====> -------+------+------+------+------
prob.  | 0.10 | 0.20 | 0.30 | 0.40        prob.  | 0.25 | 0.25 | 0.25 | 0.25

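In my case, the probabilities are not given directly but as weights. The probabilities can nevertheless be derived directly from the weights, because the sum of all weights is known (though not constant).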
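The mapping in the example above can be reproduced by walking the running sum of the weights until it exceeds the scaled random number. The following is only a minimal sketch under assumptions: `PickIndex` is a hypothetical helper name, `u` is assumed to be a uniform draw in [0, 1), and the weights are assumed to be non-negative with a positive sum.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not from the original post): returns the index i
// whose slice of the running weight sum contains u * sum(weights).
// Index i is therefore chosen with probability weights[i] / sum(weights).
static size_t PickIndex(const double *weights, size_t count, double u)
{
    assert(count > 0 && u >= 0.0 && u < 1.0);

    // The sum is recomputed here for clarity; a caller that already
    // knows the total could pass it in instead.
    double total = 0;
    for (size_t i = 0; i < count; i++)
        total += weights[i];

    // Scaling u by the total avoids normalising every weight.
    double target = u * total;

    double running = 0;
    for (size_t i = 0; i < count; i++) {
        running += weights[i];
        if (target < running)
            return i;
    }
    return count - 1;  // guard against floating-point rounding
}
```

With the weights 1, 2, 3, 4 (probabilities 0.10, 0.20, 0.30, 0.40), the draws 0.03, 0.78, 0.45, and 0.70 select samples 0000, 0003, 0002, and 0003, matching the table. The linear scan costs O(n) per draw, which is the part a faster method needs to improve.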

In the MATLAB implementation, I use the randsample function from the Statistics Toolbox to perform this resampling step:

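y = randsample(n,k,true,w) or y = randsample(population,k,true,w) returns a weighted sample taken with replacement, using a vector of positive weights w whose length is n. The probability that the integer i is selected for an entry of y is w(i)/sum(w). Usually, w is a vector of probabilities. randsample does not support weighted sampling without replacement.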

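function [samples, probabilities] = resample(samples, probabilities)
    sampleCount = size(samples, 1);
    % draw sampleCount row indices with replacement, weighted by probabilities
    indices = randsample(1 : sampleCount, sampleCount, ...
        true, probabilities);
    samples = samples(indices, :);
    % after resampling, every sample is equally likely
    probabilities = repmat(1 / sampleCount, sampleCount, 1);
end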

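I now want to port this part of the algorithm to an iPad 2, where it is used to update data in real time (~25 fps), with 512 samples being resampled each time. Time efficiency is therefore crucial, since other computations will be running as well. Memory usage does not have to be minimized.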

I looked at the Alias method, but building its lookup structure seems rather tedious, and it may not be the most efficient solution.
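For reference, the structure building mentioned here can be sketched in a few dozen lines. This is a hedged illustration of Vose's variant of the alias method, not code from the post: the `AliasTable` type and the `AliasTableBuild`, `AliasTableDraw`, and `AliasTableFree` names are hypothetical, the weights are assumed to be non-negative with a positive sum, and `u` is assumed to be a uniform draw in [0, 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical table type for this sketch.
typedef struct {
    size_t n;
    double *prob;   // prob[i]: share of column i that keeps outcome i
    size_t *alias;  // alias[i]: outcome that fills the rest of column i
} AliasTable;

// Builds the table in O(n). Returns 0 on success, -1 if allocation fails.
static int AliasTableBuild(AliasTable *t, const double *weights, size_t n)
{
    assert(n > 0);
    double total = 0;
    for (size_t i = 0; i < n; i++)
        total += weights[i];

    t->n = n;
    t->prob = malloc(n * sizeof *t->prob);
    t->alias = malloc(n * sizeof *t->alias);
    size_t *under = malloc(n * sizeof *under);  // columns below the mean
    size_t *over = malloc(n * sizeof *over);    // columns at or above the mean
    if (!t->prob || !t->alias || !under || !over) {
        free(t->prob); free(t->alias); free(under); free(over);
        t->prob = NULL; t->alias = NULL;
        return -1;
    }

    size_t underCount = 0, overCount = 0;
    for (size_t i = 0; i < n; i++) {
        t->prob[i] = weights[i] * n / total;  // scale so the mean is 1
        t->alias[i] = i;
        if (t->prob[i] < 1.0) under[underCount++] = i;
        else over[overCount++] = i;
    }

    // Pair each under-full column with an over-full one that tops it up.
    while (underCount > 0 && overCount > 0) {
        size_t s = under[--underCount];
        size_t l = over[--overCount];
        t->alias[s] = l;
        t->prob[l] -= 1.0 - t->prob[s];
        if (t->prob[l] < 1.0) under[underCount++] = l;
        else over[overCount++] = l;
    }
    // Whatever is left differs from 1 only by rounding error.
    while (overCount > 0) t->prob[over[--overCount]] = 1.0;
    while (underCount > 0) t->prob[under[--underCount]] = 1.0;

    free(under);
    free(over);
    return 0;
}

// O(1) draw: the integer part of u * n picks a column, the fractional
// part decides between the column's own outcome and its alias.
static size_t AliasTableDraw(const AliasTable *t, double u)
{
    double scaled = u * t->n;
    size_t column = (size_t) scaled;
    if (column >= t->n) column = t->n - 1;  // rounding guard
    return (scaled - column) < t->prob[column] ? column : t->alias[column];
}

static void AliasTableFree(AliasTable *t)
{
    free(t->prob);
    free(t->alias);
    t->prob = NULL;
    t->alias = NULL;
}
```

The build is linear in the number of samples and each draw is constant time, so for 512 draws from 512 weights the total work is O(n). Because the weights change every frame, the table would have to be rebuilt every frame as well.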

Are there other efficient methods that would meet the real-time requirement, or is the Alias method the way to go?


How many of these samples do you need to resample for a single frame? What is the size of the set you are actually sampling from? – 2011-12-22 16:02:25


About 512 samples. The input and output sets both have the same size. – Etan 2011-12-22 16:36:59

Answers


Here is an example of how you might implement resample in C.

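#include <limits.h>  // INT_MAX
#include <stdlib.h>  // malloc, free, random, srandomdev, bsearch_b (BSD/Darwin)
#include <string.h>  // memcpy

typedef int SampleType;
typedef double ProbabilityType;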

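static ProbabilityType MyRandomFunction(ProbabilityType total)
{
    static int isRandomReady = 0;
    if (!isRandomReady) {
        srandomdev();  // seed random() once from the system's random device
        isRandomReady = 1;
    }

    // random() returns a value in [0, 2^31 - 1]; this assumes INT_MAX is
    // 2^31 - 1, so the quotient is uniform in [0, 1). Converting to a
    // floating-point type before adding 1 avoids integer overflow.
    return (random() / ((ProbabilityType) INT_MAX + 1)) * total;
}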

static void MyResampleFunction(SampleType *samples, ProbabilityType *probabilities, size_t length)
{
    // create a copy of samples, because samples will be overwritten
    SampleType *sampleCopies = malloc(sizeof(SampleType) * length);
    if (sampleCopies == NULL)
        return;
    memcpy(sampleCopies, samples, sizeof(SampleType) * length);

    ProbabilityType total = 0;

    // replace the probabilities with their running (cumulative) sums
    for (size_t i = 0; i < length; i++)
        probabilities[i] = total += probabilities[i];

    for (size_t i = 0; i < length; i++)
    {
        ProbabilityType probability = MyRandomFunction(total);

        // We could iterate through the probabilities array,
        // but binary search is more efficient

        // This is a block declaration
        int (^comparator)(const void *, const void *);

        // Blocks are the same as function pointers,
        // except that they capture their enclosing scope
        comparator = ^(const void *leftPtr, const void *rightPtr) {

            // leftPtr points to probability,
            // rightPtr to an element in probabilities

            ProbabilityType curr, prev;
            size_t idx = ((const ProbabilityType *) rightPtr) - probabilities;
            curr = probabilities[idx];                    // current cumulative sum
            prev = idx > 0 ? probabilities[idx - 1] : 0;  // previous cumulative sum

            // the match is the element with prev <= probability < curr
            if (curr <= probability)
                return 1;
            if (prev > probability)
                return -1;

            return 0;
        };

        void *found = bsearch_b(&probability,             // the searched value
                                probabilities,            // the searched array
                                length,                   // the length of the array
                                sizeof(ProbabilityType),  // the size of each value
                                comparator);              // the comparator

        // fall back to the last element if rounding left nothing matched
        size_t idx = found ? (size_t) (((const ProbabilityType *) found) - probabilities)
                           : length - 1;
        samples[i] = sampleCopies[idx];
    }

    // now, all probabilities are the same
    for (size_t i = 0; i < length; i++)
        probabilities[i] = 1.0 / length;

    // now we can dispose of the copies
    free(sampleCopies);
}

static void MyTestFunction(void)
{
    SampleType samples[4] = {0, 1, 2, 3};
    ProbabilityType probabilities[4] = {0.1, 0.2, 0.3, 0.4};
    MyResampleFunction(samples, probabilities, 4);
}
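Blocks and `bsearch_b` are extensions available on Apple platforms. If portability matters, the same lookup could be written as a plain binary search in standard C. The sketch below is an assumption-laden illustration rather than part of the answer: `UpperBound` is a hypothetical name, and `cumulative` is assumed to hold the non-decreasing running sums that the resample function computes.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns the smallest index i such that
// cumulative[i] > target, clamped to the last index when no such
// element exists. Runs in O(log n).
static size_t UpperBound(const double *cumulative, size_t length, double target)
{
    assert(length > 0);
    size_t low = 0;
    size_t high = length - 1;
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (cumulative[mid] > target)
            high = mid;      // the answer is mid or to its left
        else
            low = mid + 1;   // the answer is strictly to the right
    }
    return low;
}
```

Inside the resampling loop, `samples[i] = sampleCopies[UpperBound(probabilities, length, probability)]` would then stand in for the `bsearch_b` call, and entries with zero weight are never selected because their cumulative sum never exceeds the target first.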