如何從列表中選擇具有非一致概率的值？

我正在查看k-means++初始化算法。該算法的以下兩個步驟產生非均勻概率：如何從列表中選擇具有非一致概率的值？

對於每一個數據點x，計算d（x）和已經被選擇的最近的中心x之間的距離。

隨機選擇一個新數據點作爲新的中心，使用加權的概率分佈，其中點x的選擇概率爲，與D（x）^ 2成比例。

如何在C++中選擇帶有這種聲明的加權概率分佈？

來源

2011-12-19 zebra

「x」是標量還是矢量？ – 2011-12-19 22:09:25

請嘗試在這裏：http://stats.stackexchange.com/或在這裏：http://www.physicsforums.com/forumdisplay.php？f = 78 – 2011-12-19 22:09:57

x是一個2d點，抱歉。 – zebra 2011-12-19 22:14:34

用一組有限的各個數據點X，這要求一個離散概率分佈。

做到這一點，最簡單的方法是枚舉點X的順序，並計算陣列代表其累積概率分佈函數：（僞代碼如下）

/* 
* xset is an array of points X, 
* cdf is a preallocated array of the same size 
*/ 
function prepare_cdf(X[] xset, float[] cdf) 
{ 
    float S = 0; 
    int N = sizeof(xset); 
    for i = 0:N-1 
    { 
     float weight = /* calculate D(xset[i])^2 here */ 
     // create cumulative sums and write to the element in cdf array 
     S += weight; 
     cdf[i] = S; 
    } 

    // now normalize so the CDF runs from 0 to 1 
    for i = 0:N-1 
    { 
     cdf[i] /= S; 
    } 
} 

function select_point(X[] xset, float[] cdf, Randomizer r) 
{ 
    // generate a random floating point number from a 
    // uniform distribution from 0 to 1 
    float p = r.nextFloatUniformPDF(); 
    int i = binarySearch(cdf, p); 
    // find the lowest index i such that p < cdf[i] 

    return xset[i]; 
}

你調用一次prepare_cdf，然後調用select_point儘可能多地生成隨機點。

來源

2011-12-19 22:28:39

我會採取以下措施：

遍歷數據點，儲存在double distance_squareds[]或std::vector<double> distance_squareds或諸如此類的東西他們d平方的，並存儲在double sum_distance_squareds他們d平方的總和。
使用the drand48 function在[0.0，1.0）中選擇一個隨機數，並乘以sum_distance_squareds;將結果存儲在random_number中。
迭代distance_squareds，將這些值相加（再次），一旦運行總數達到或超過random_number，返回與您剛添加的D平方對應的數據點。
由於四捨五入錯誤，您很可能無法返回而完成循環;如果是這樣，只返回第一個數據點，或最後一個，或其他。（不過不要擔心，這應該是一個非常罕見邊緣情況。）

來源

2011-12-19 22:23:24 ruakh

離散分佈在C++ 11與random標題和使用std::discrete_distribution很容易做。這是例子：

#include <iostream> 
#include <map> 
#include <random> 

int main() 
{ 
    std::random_device rd; 
    std::mt19937 gen(rd()); 
    std::discrete_distribution<> d({20,30,40,10}); 
    std::map<int, int> m; 
    for(int n=0; n<10000; ++n) { 
     ++m[d(gen)]; 
    } 
    for(auto p : m) { 
     std::cout << p.first << " generated " << p.second << " times\n"; 
    } 
}

，這是輸出的一個示例：

0 generated 2003 times 
1 generated 3014 times 
2 generated 4021 times 
3 generated 962 times

來源

2013-11-27 23:12:39

在這裏，您使用（數字..）與給定的概率分佈數組有什麼事情，可以幫助你，（概率..）它會爲你（數字）產生這些概率（這裏它會計算它們）。

#include <iostream> 
#include <cmath> 
#include <time.h> 
#include <stdlib.h> 
#include <map> 
#include <vector> 
using namespace std; 
#define ARRAY_SIZE(array) (sizeof(array)/sizeof(array[0])) 

int checkDistribution(double random, const map<double, vector<int> > &distribution_map) 
{ 
    int index = 0; 
    map<double, vector<int> >::const_iterator it = distribution_map.begin(); 
    for (; it!=distribution_map.end(); ++it) 
    { 
     if (random < (*it).first) 
     { 
       int randomInternal = 0; 
       if ((*it).second.size() > 1) 
        randomInternal = rand() % ((*it).second.size()); 
       index = (*it).second.at(randomInternal); 
       break; 
     } 
    } 
    return index; 
} 

void nextNum(int* results, const map<double, vector<int> > &distribution_map) 
{ 
    double random = (double) rand()/RAND_MAX; 
    int index = checkDistribution(random,distribution_map); 
    results[index]+=1; 
} 

int main() { 

    srand (time(NULL)); 
    int results [] = {0,0,0,0,0}; 
    int numbers [] = {-1,0,1,2,3}; 
    double prob [] = {0.01, 0.3, 0.58, 0.1, 0.01}; 
    int size = ARRAY_SIZE(numbers); 
    // Building Distribution 
    map<double, vector<int> > distribution_map; 
    map<double, vector<int> >::iterator it; 
    for (int i = 0; i < size; i++) 
    { 
     it = distribution_map.find(prob[i]); 
     if (it!=distribution_map.end()) 
      it->second.push_back(i); 
     else 
     { 
      vector<int> vec; 
      vec.push_back(i); 
      distribution_map[prob[i]] = vec; 
     } 
    } 
    // PDF to CDF transform 
    map<double, vector<int> > cumulative_distribution_map; 
    map<double, vector<int> >::iterator iter_cumulative; 
    double cumulative_distribution = 0.0; 
    for (it=distribution_map.begin();it!=distribution_map.end();++it) 
    { 
     cumulative_distribution += ((*it).second.size() * (*it).first); 
     cumulative_distribution_map[cumulative_distribution] = (*it).second; 
    } 

    for (int i = 0; i<100; i++) 
    { 
     nextNum(results, cumulative_distribution_map); 
    } 
    for (int j = 0; j<size; j++) 
     cout<<" "<<results[j]<<" "; 
    return 0; 
}

來源

2015-08-07 14:07:32

如何從列表中選擇具有非一致概率的值？

回答

相關問題