Generating random numbers in a CUDA kernel within a varying range

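I am trying to generate random numbers inside a CUDA kernel. I want the numbers to be integers drawn from a uniform distribution, starting at 1 and going up to 8. The random numbers should be different for each thread, and the range they are drawn from also varies by thread: the maximum of the range might be as low as 2 in one thread and as high as 8 in another, but never higher. Here is an example of the numbers I want to generate: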

In thread#1 --> maximum of the range is 2 and so the random number should be between 1 and 2 
In thread#2 --> maximum of the range is 6 and so the random number should be between 1 and 6 
In thread#3 --> maximum of the range is 5 and so the random number should be between 1 and 5

and so on...

Any help would be greatly appreciated. Thank you.

Source

2013-08-29 duttasankha

Edit: I have edited my answer to address some of the deficiencies pointed out in the other answer (@tudorturcu) and in the comments.

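1. Use CURAND to generate a uniform distribution between 0.0 and 1.0. Note that curand_uniform excludes 0.0 and includes 1.0.
2. Multiply this by the desired range (largest value - smallest value + 0.999999).
3. Add the offset (+ smallest value).
4. Truncate to an integer.

Something like this in your device code: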

int idx = threadIdx.x+blockDim.x*blockIdx.x; 
// assume have already set up curand and generated state for each thread... 
// assume ranges vary by thread index 
float myrandf = curand_uniform(&(my_curandstate[idx])); 
myrandf *= (max_rand_int[idx] - min_rand_int[idx] + 0.999999); 
myrandf += min_rand_int[idx]; 
int myrand = (int)truncf(myrandf);

You should:

#include <math.h>

for truncf.

Here is a fully worked example:

$ cat t527.cu 
#include <stdio.h> 
#include <curand.h> 
#include <curand_kernel.h> 
#include <math.h> 
#include <assert.h> 
#define MIN 2 
#define MAX 7 
#define ITER 10000000 

__global__ void setup_kernel(curandState *state){ 

    int idx = threadIdx.x+blockDim.x*blockIdx.x; 
    curand_init(1234, idx, 0, &state[idx]); 
} 

__global__ void generate_kernel(curandState *my_curandstate, const unsigned int n, const unsigned *max_rand_int, const unsigned *min_rand_int, unsigned int *result){ 

    int idx = threadIdx.x + blockDim.x*blockIdx.x; 

    unsigned int count = 0; 
    while (count < n){ 
        // curand_uniform returns a float in (0.0, 1.0] 
        float myrandf = curand_uniform(my_curandstate+idx); 
        // scale by the number of values in the range, staying just below the next integer 
        myrandf *= (max_rand_int[idx] - min_rand_int[idx]+0.999999); 
        myrandf += min_rand_int[idx]; 
        int myrand = (int)truncf(myrandf); 

        assert(myrand <= max_rand_int[idx]); 
        assert(myrand >= min_rand_int[idx]); 
        // histogram update; not atomic, which is safe here only because a single thread is launched 
        result[myrand-min_rand_int[idx]]++; 
        count++; 
    } 
} 

int main(){ 

    curandState *d_state; 
    cudaMalloc(&d_state, sizeof(curandState)); 
    unsigned *d_result, *h_result; 
    unsigned *d_max_rand_int, *h_max_rand_int, *d_min_rand_int, *h_min_rand_int; 
    cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned)); 
    h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned)); 
    cudaMalloc(&d_max_rand_int, sizeof(unsigned)); 
    h_max_rand_int = (unsigned *)malloc(sizeof(unsigned)); 
    cudaMalloc(&d_min_rand_int, sizeof(unsigned)); 
    h_min_rand_int = (unsigned *)malloc(sizeof(unsigned)); 
    cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned)); 
    setup_kernel<<<1,1>>>(d_state); 

    *h_max_rand_int = MAX; 
    *h_min_rand_int = MIN; 
    cudaMemcpy(d_max_rand_int, h_max_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice); 
    cudaMemcpy(d_min_rand_int, h_min_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice); 
    generate_kernel<<<1,1>>>(d_state, ITER, d_max_rand_int, d_min_rand_int, d_result); 
    cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost); 
    printf("Bin: Count: \n"); 
    for (int i = MIN; i <= MAX; i++) 
        printf("%d %u\n", i, h_result[i-MIN]); 

    return 0; 
} 


$ nvcc -arch=sm_20 -o t527 t527.cu -lcurand 
$ cuda-memcheck ./t527 
========= CUDA-MEMCHECK 
Bin: Count: 
2 1665496 
3 1668130 
4 1667644 
5 1667435 
6 1665026 
7 1666269 
========= ERROR SUMMARY: 0 errors 
$

Source

2013-08-29 02:48:02

I may have done something like this. Could you write it out as code so that I can compare the two? Thanks again. – duttasankha

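@Robert's example, as originally posted, does not generate a perfectly uniform distribution (although every number in the range is generated and every generated number is in the range). The smallest and largest values are each chosen with half the probability of the other numbers in the range.

At step 2, you should multiply by the number of values in the range: (largest value - smallest value + 0.999999). *

At step 3, the offset should be (+ smallest value) instead of (+ smallest value + 0.5).

Steps 1 and 4 remain the same.

*As @Kamil Czerski pointed out, 1.0 is included in the distribution. Adding 1.0 instead of 0.999999 would occasionally produce a number outside the desired range.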

Source

2014-07-30 10:07:34 tudorturcu

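Please note that 1.0 [is included](http://docs.nvidia.com/cuda/curand/device-api-overview.html#distributions) in curand_uniform. There is a small chance that you draw exactly 1.0, and after multiplying by (largest_value - smallest_value + 1), adding (smallest_value) and rounding towards zero, you end up out of bounds. [Here](http://stackoverflow.com/questions/24537112/uniformly-distributed-pseudorandom-integers-inside-cuda-kernel/24537113#24537113) is my version of generating uniform integers, but the basic idea is the same as Robert Crovella's. I use 0.999999 instead of 1 and otherwise do exactly as you proposed. –

Thank you for noticing this mistake. I find the decision to include 1.0 in the distribution and exclude 0.0 strange. I will modify my answer to include your change. Feel free to edit it to include your code sample. – tudorturcu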
