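My goal is to test whether a class sets one of its attributes to a random integer value. I found a chi-square test algorithm online and decided to put it to use. The results surprised me: the larger the sample size, the less likely the test seems to pass. I should say that I am by no means a statistics expert (which may be obvious from the question), so I may well be getting something wrong here.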
Test results, varying only the final variable int SIZE (in UserTest). Each test performs 30 runs:
SIZE     avg (passes out of 30)    results of 5 test executions
11       25.4                      26, 25, 22, 24, 30
20       25                        26, 26, 24, 22, 27
30       24                        24, 22, 24, 26, 24
100      19.4                      17, 23, 20, 18, 19
200      16.2                      15, 18, 18, 15, 15
1000     13.2                      13, 13, 14, 13, 13
10000    10                        14, 7, 8, 10, 11
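Although true randomness is not strictly necessary for my purposes, I am still curious what the problem is. Is it a bug in the algorithm itself, am I using it incorrectly, is it a natural consequence of "making the test harder" (statistics noob, remember), or am I pushing the limits of Java's pseudorandom generator?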
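As a rough sanity check on the table above, the pass rate for genuinely uniform values can be estimated. This is only a sketch: it assumes independent draws and a normal approximation to the chi-square distribution, and it takes r = MAXIT - MINIT = 30 and the acceptance band |chiSquare - r| <= 2 * sqrt(r) from the isRandom code in this post.

$$
\chi^2 \sim \chi^2_{r-1}, \qquad E[\chi^2] = r - 1 = 29, \qquad \sigma = \sqrt{2(r-1)} = \sqrt{58} \approx 7.62
$$

$$
|\chi^2 - 30| \le 2\sqrt{30} \approx 10.95 \iff 19.05 \le \chi^2 \le 40.95
$$

$$
P(\text{pass}) \approx \Phi\!\left(\frac{40.95 - 29}{7.62}\right) - \Phi\!\left(\frac{19.05 - 29}{7.62}\right) \approx \Phi(1.57) - \Phi(-1.31) \approx 0.85
$$

Under these assumptions, about 0.85 × 30 ≈ 25 passes per test would be expected whatever the value of SIZE, which is close to the rows for SIZE = 11 to 30. The estimate does not depend on the sample size, so a larger sample should not by itself make the band harder to hit for truly uniform input.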
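Domain class:
import java.util.Random;

public class User {
    public static final int MINIT = 20;
    public static final int MAXIT = 50;
    private int iterations;

    // Picks a pseudorandom value in [MINIT, MAXIT); a new Random is created on every call.
    public void setIterations() {
        Random random = new Random();
        setIterations(MINIT + random.nextInt(MAXIT - MINIT));
    }

    private void setIterations(int iterations) {
        this.iterations = iterations;
    }

    // Read by UserTest.
    public int getIterations() {
        return iterations;
    }
}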
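A small check of which values the expression MINIT + random.nextInt(MAXIT - MINIT) can produce may help when reading the test, since the number of distinct values is what the chi-square code receives as r. The class and method names here (IterationRange, observedRange) and the seed are made up for this illustration. Because the bound passed to nextInt is exclusive, the values run from 20 to 49 and never reach 50, which gives 30 possible values.

```java
import java.util.Random;

public class IterationRange {

    // Draws n values of min + rng.nextInt(max - min) and returns {smallest, largest} seen.
    static int[] observedRange(Random rng, int min, int max, int n) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (int i = 0; i < n; i++) {
            int v = min + rng.nextInt(max - min);
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        return new int[] {lo, hi};
    }

    public static void main(String[] args) {
        // Same bounds as User.MINIT and User.MAXIT; the seed is an arbitrary choice.
        int[] range = observedRange(new Random(1L), 20, 50, 100_000);
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```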
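Test class:
import java.util.ArrayList;
import org.junit.Assert;
import org.junit.Test;

public class UserTest {
    private User user = new User();

    @Test
    public void testRandomNumbers() {
        int results = 0;
        final int TIMES = 30;
        // Count how many of the TIMES independent runs pass the chi-square check.
        for (int i = 0; i < TIMES; i++) {
            if (randomNumbersRun()) {
                results++;
            }
        }
        System.out.println(results);
        // Require at least 80% of the runs to pass.
        Assert.assertTrue(results >= TIMES * 80 / 100);
    }

    private boolean randomNumbersRun() {
        ArrayList<Integer> list = new ArrayList<Integer>();
        int r = User.MAXIT - User.MINIT;
        final int SIZE = 11;
        // The sample has r * SIZE values, so SIZE must exceed 10 for isRandom to accept it.
        for (int i = 0; i < r * SIZE; i++) {
            user.setIterations();
            list.add(user.getIterations());
        }
        return Statistics.isRandom(list, r);
    }
}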
Chi-square algorithm:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class Statistics {
    /**
     * source: http://en.wikibooks.org/wiki/Algorithm_Implementation/Pseudorandom_Numbers/Chi-Square_Test
     * changed parameter to ArrayList<Number> for generalization
     */
    public static boolean isRandom(ArrayList<? extends Number> randomNums, int r) {
        // According to Sedgewick: "This is valid if N is greater than about 10r"
        if (randomNums.size() <= 10 * r) {
            return false;
        }
        // PART A: Get frequency of randoms
        Map<Number, Integer> ht = getFrequencies(randomNums);
        // PART B: Calculate chi-square - this approach is in Sedgewick
        double n_r = (double) randomNums.size() / r;
        double chiSquare = 0;
        for (int v : ht.values()) {
            double f = v - n_r;
            chiSquare += f * f;
        }
        chiSquare /= n_r;
        // PART C: According to Sedgewick: "The statistic should be within 2(r)^1/2 of r.
        // This is valid if N is greater than about 10r"
        return Math.abs(chiSquare - r) <= 2 * Math.sqrt(r);
    }

    /**
     * @param nums a list of numbers
     * @return a Map, key being the number and value its frequency
     */
    private static Map<Number, Integer> getFrequencies(ArrayList<? extends Number> nums) {
        Map<Number, Integer> freqs = new HashMap<Number, Integer>();
        for (Number x : nums) {
            if (freqs.containsKey(x)) {
                freqs.put(x, freqs.get(x) + 1);
            } else {
                freqs.put(x, 1);
            }
        }
        return freqs;
    }
}
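One way to separate "the test gets harder" from "the generator misbehaves" is to measure how often the same acceptance band passes when every value comes from a single Random instance. The following is a minimal sketch, not the posted code: the names ChiSquarePassRate, chiSquare, passes and passRate, the seed 42 and the 1000 repetitions are all assumptions made for illustration. It also counts into a fixed array of r bins, so a value that never occurs still contributes to the sum, whereas the map-based getFrequencies has no entry for such a value.

```java
import java.util.Random;

public class ChiSquarePassRate {

    // Chi-square statistic over equally likely bins; empty bins are included.
    static double chiSquare(int[] counts, int n) {
        double expected = (double) n / counts.length;
        double sum = 0;
        for (int c : counts) {
            double d = c - expected;
            sum += d * d;
        }
        return sum / expected;
    }

    // Draws n values in [0, r) and applies the band |chiSquare - r| <= 2 * sqrt(r).
    static boolean passes(Random rng, int r, int n) {
        int[] counts = new int[r];
        for (int i = 0; i < n; i++) {
            counts[rng.nextInt(r)]++;
        }
        return Math.abs(chiSquare(counts, n) - r) <= 2 * Math.sqrt(r);
    }

    // Fraction of `runs` samples of size n that pass, all drawn from one seeded generator.
    static double passRate(long seed, int r, int n, int runs) {
        Random rng = new Random(seed);
        int passed = 0;
        for (int i = 0; i < runs; i++) {
            if (passes(rng, r, n)) {
                passed++;
            }
        }
        return (double) passed / runs;
    }

    public static void main(String[] args) {
        int r = 30; // MAXIT - MINIT in the User class
        for (int size : new int[] {11, 100, 1000}) {
            System.out.printf("SIZE=%d passRate=%.2f%n", size, passRate(42L, r, r * size, 1000));
        }
    }
}
```

If the printed rates stay roughly level across SIZE values, the downward trend in the table is more likely tied to how the values are generated than to the sample size itself.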
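I cannot reproduce your results when running the code on ideone. The printed numbers follow `TIMES` ([link](http://ideone.com/afo86g)). Could you check whether I am missing anything? It may well be platform-dependent. – dasblinkenlight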
You should vary the variable SIZE in randomNumbersRun(), not TIMES in testRandomNumbers(). – blagae
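OK, now I get roughly the same numbers regardless of the size ([link](http://ideone.com/afo86g)). Could you run a small experiment on your platform? Look at the code at my link and change your User class by making the Random static (uncomment line 16 and remove line 20 in my code). See whether that changes the result. If it does, I think I know what may be happening. – dasblinkenlight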
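The experiment suggested in the last comment might look roughly like the sketch below. The linked ideone code is not reproduced here, so this is a guess at the intended change rather than a copy of it, and the class name SharedRandomUser and the field name RANDOM are made up for illustration. The only difference from the posted User class is that one static Random is shared by all calls instead of a new Random being created inside setIterations().

```java
import java.util.Random;

public class SharedRandomUser {
    public static final int MINIT = 20;
    public static final int MAXIT = 50;

    // One generator for the whole class, created once and reused by every call.
    private static final Random RANDOM = new Random();

    private int iterations;

    // Same range as the posted User class: MINIT inclusive, MAXIT exclusive.
    public void setIterations() {
        iterations = MINIT + RANDOM.nextInt(MAXIT - MINIT);
    }

    public int getIterations() {
        return iterations;
    }
}
```

Swapping this class into UserTest and repeating the runs for several SIZE values would show whether the pass counts still fall as the sample grows.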