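What am I doing wrong with these random numbers?

I have been told that rand() mod n produces biased results, so I wrote this code to check it. It generates s numbers from 1 to l and sorts them by number of occurrences.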

#include <cstdlib>   // srand, rand
#include <ctime>     // time
#include <iostream>
#include <random>
#include <utility>   // swap

using namespace std; 

struct vec_struct{ 
    int num; 
    int count; 
    double ratio; 
}; 

void num_sort(vec_struct v[], int n){ 
    for (int i = 0; i < n-1; i++){ 
     for (int k = 0; k < n-1-i; k++){ 
      if (v[k].num > v[k+1].num) swap(v[k], v[k+1]); 
     } 
    } 
} 

void count_sort(vec_struct v[], int n){ 
    for (int i = 0; i < n-1; i++){ 
     for (int k = 0; k < n-1-i; k++){ 
      if (v[k].count < v[k+1].count) swap(v[k], v[k+1]); 
     } 
    } 
} 

int main(){ 

    srand(time(0)); 

    random_device rnd; 

    int s, l, b, c = 1; 

    cout << "How many numbers to generate? "; 
    cin >> s; 

    cout << "Generate " << s << " numbers ranging from 1 to? "; 
    cin >> l; 

    cout << "Use rand or mt19937? [1/2] "; 
    cin >> b; 

    vec_struct * vec = new vec_struct[s]; 

    mt19937 engine(rnd()); 
    uniform_int_distribution <int> dist(1, l); 

    if (b == 1){ 
     for (int i = 0; i < s; i++){ 
      vec[i].num = (rand() % l) + 1; 
     } 
    } else if (b == 2){ 
     for (int i = 0; i < s; i++){ 
      vec[i].num = dist(engine); 
     } 
    } 
    num_sort(vec, s); 

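    // Collapse the sorted values into one (num, count, ratio) record per
    // distinct value. The bounds check avoids reading vec[s] on the last pass.
    int distinct = 0;
    for (int i = 0; i < s; i++){
     if (i + 1 < s && vec[i].num == vec[i+1].num){
      c++;
     } else {
      vec[distinct].num = vec[i].num;
      vec[distinct].count = c;
      vec[distinct].ratio = ((double)c/s)*100;
      distinct++;
      c = 1;
     }
    }
    // Only the first `distinct` records are valid; a value that never came up
    // has no record, so the record count can be smaller than l.
    count_sort(vec, distinct);

    if (distinct >= 20){

     cout << endl << "Showing the 10 most common numbers" << endl;
     for (int i = 0; i < 10; i++){
      cout << vec[i].num << "\t" << vec[i].count << "\t" << vec[i].ratio << "%" << endl;
     }

     cout << endl << "Showing the 10 least common numbers" << endl;
     for (int i = distinct-10; i < distinct; i++){
      cout << vec[i].num << "\t" << vec[i].count << "\t" << vec[i].ratio << "%" << endl;
     }
    } else {

     for (int i = 0; i < distinct; i++){
      cout << vec[i].num << "\t" << vec[i].count << "\t" << vec[i].ratio << "%" << endl;
     }
    }

    delete[] vec;
}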

After running this code, I can spot the bias I expected from rand():

$ ./rnd_test 
How many numbers to generate? 10000 
Generate 10000 numbers ranging from 1 to? 50 
Use rand or mt19937? [1/2] 1 

Showing the 10 most common numbers 
17 230 2.3% 
32 227 2.27% 
26 225 2.25% 
25 222 2.22% 
3 221 2.21% 
10 220 2.2% 
35 218 2.18% 
5 217 2.17% 
13 215 2.15% 
12 213 2.13% 

Showing the 10 least common numbers 
40 187 1.87% 
7 186 1.86% 
39 185 1.85% 
42 184 1.84% 
43 184 1.84% 
34 182 1.82% 
21 175 1.75% 
22 175 1.75% 
18 173 1.73% 
44 164 1.64%

However, I get almost the same results with mt19937 and uniform_int_distribution! What is wrong here? Shouldn't the results be uniform, or is the test useless?

Source

2016-11-08 Matteo

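Try taking the high-order bits instead. Those are usually better distributed. I.e. '(rand_num - rand_num % n) >> log2(n)' – StoryTeller

Who told you that? On what platform and with which runtime? In general there are no guarantees about the distribution or quality of rand() –

@OlegBogdanov He is comparing it with 'uniform_int_distribution' and 'mt19937' – Danh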

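No, it should not be perfectly uniform. So the above is not evidence of any error.

The numbers are random, so the counts should be fairly uniform, but not exactly equal.

In particular, you would expect each number to occur about 10000/50 = 200 times, with a standard deviation of roughly sqrt(200), which is about 14. With 50 numbers you would expect a spread of about 2 standard deviations, which is +/- 28.

The bias caused by applying modulo to a value in [0, RAND_MAX] is smaller than that, so you would need many more samples to detect it.

Source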

2016-11-08 08:34:22

-1

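As far as I can tell from http://www.cplusplus.com/reference/random/mersenne_twister_engine/, mt19937 will suffer from the same bias as rand().

The bias arises because rand() produces an integer in the range [0, RAND_MAX]. When you take that value modulo your divisor, the smaller results become slightly more likely (unless the divisor evenly divides the number of possible values, RAND_MAX + 1).

Consider: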

Range [0-74]: 
0 % 50 = 0 
40 % 50 = 40 
50 % 50 = 0 
74 % 50 = 24 
(numbers less than 25 occur twice)

Source

2016-11-08 08:35:23 Stefan

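Using the twister engine directly would run into a similar problem, but using it indirectly through uniform_int_distribution avoids it. (And I did not downvote you.) –

You have to use more samples for this kind of random number test. I tried your code with 50000 and got this result: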

How many numbers to generate? 50000
Generate 50000 numbers ranging from 1 to? 50
Use rand or mt19937? [1/2] 2

Showing the 10 most common numbers
36 1054 2.108%
14 1051 2.102%
11 1048 2.096%
27 1045 2.09%
2 1044 2.088%
33 1035 2.07%
21 1034 2.068%
48 1034 2.068%
34 1030 2.06%
39 1030 2.06%

Showing the 10 least common numbers
47 966 1.932%
16 961 1.922%
38 960 1.92%
28 959 1.918%
8 958 1.916%
10 958 1.916%
30 958 1.916%
32 958 1.916%
18 953 1.906%
23 953 1.906%

Source

2016-11-08 10:29:43 Trantor
