Generating random numbers in a CUDA kernel

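I am writing a CUDA program in which I need to generate a random variable drawn from a normal distribution. I want the values of the random variable to be restricted to the range 0 to 8. I want to generate the random variable inside the kernel function and then use the result for further computation. I plan to use the cuRAND library for this. I have been trying to use the curand_normal device API to generate the values, but without any success. It would be very helpful if someone could provide me with the kernel function code. Thanks for all your assistance.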

The code provided below is the CPU implementation of what I am looking for on the GPU:

    #include <cstdio>
    #include <random>

    using namespace std;

    int main()
    {
        const int nstars = 100; // number of samples to print
        default_random_engine generator;
        normal_distribution<double> distribution(0.0, 3.0); // mean 0, standard deviation 3

        for (int i = 0; i < nstars; i++)
        {
            // Converting the double sample to int truncates toward zero
            int number = static_cast<int>(distribution(generator));
            printf("%d\n\n", number);
        }

        return 0;
    }

One thing I would like to add: I don't know C++. I wrote this program by following other code I saw on other sites. Thanks.

Source

2013-01-12 duttasankha

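Why don't you post what you tried that didn't work? That is usually a good idea. "Write the code for me" type questions are less likely to get good results. Have you looked at the device API example [here](http://docs.nvidia.com/cuda/curand/index.html#topic_1_3_6)? It gives a complete program, and one of the options, 'generate_uniform_kernel', should be pretty close to what you are asking for. –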

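Hi! Thanks for your support. I actually went through the device API example and copied almost everything, since it just uses the XORWOW random generator, but the results were not convincing and only gave fractional values. I don't know how to get a random variable in the range 0 to 8. One more thing: I tried the program mentioned [here](http://stackoverflow.com/questions/11832202/cuda-random-number-generating), changed curand_uniform to curand_normal in the top device function, and I got some results. – duttasankha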

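Do you want a discrete uniform distribution taking the values (0,1,2,3,4,5,6,7,8) (i.e. integers), or do you want a continuous-valued uniform distribution between 0.0 and 8.0 (i.e. floating point)? –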

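Here is an adaptation of this code that produces an approximately "normal" distributed set of random numbers that can take on discrete values between approximately 0 and 8. I don't understand the comment in the request to have a range of 0 to 8 with a mean of 0.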

#include <stdio.h> 
#include <stdlib.h> 
#include <cuda.h> 
#include <curand_kernel.h> 
#include <math.h> 
#define SCALE 2.0 
#define SHIFT 4.5 
#define DISCRETE 
#define BLOCKS 1024 
#define THREADS 512 

#define CUDA_CALL(x) do { if((x) != cudaSuccess) { \ 
    printf("Error at %s:%d\n",__FILE__,__LINE__); \ 
    return EXIT_FAILURE;}} while(0) 

__global__ void setup_kernel(curandState *state) 
{ 
    int id = threadIdx.x + blockIdx.x * blockDim.x; 
    /* Each thread gets different seed, a different sequence 
     number, no offset */ 
    curand_init(7+id, id, 0, &state[id]); 
} 



__global__ void generate_normal_kernel(curandState *state, 
           int *result) 
{ 
    int id = threadIdx.x + blockIdx.x * blockDim.x; 
    float x; 
    /* Copy state to local memory for efficiency */ 
    curandState localState = state[id]; 
    /* Generate pseudo-random normals */
    for(int n = 0; n < 10; n++) { 
     x = (curand_normal(&localState) * SCALE)+SHIFT; 
     /* Discretize */ 
#if defined DISCRETE 
     x = truncf(x); 
#endif 
    } 
    /* Copy state back to global memory */ 
    state[id] = localState; 
    /* Store last generated result per thread */ 
    result[id] = (int) x; 
} 


int main(int argc, char *argv[]) 
{ 
    int i; 
    unsigned int total; 
    curandState *devStates; 
    int *devResults, *hostResults; 
    int device; 
    struct cudaDeviceProp properties; 

    CUDA_CALL(cudaGetDevice(&device)); 
    CUDA_CALL(cudaGetDeviceProperties(&properties,device)); 


    /* Allocate space for results on host */ 
    hostResults = (int *)calloc(THREADS * BLOCKS, sizeof(int)); 

    /* Allocate space for results on device */ 
    CUDA_CALL(cudaMalloc((void **)&devResults, BLOCKS * THREADS * 
       sizeof(int))); 
    /* Set results to 0 */ 
    CUDA_CALL(cudaMemset(devResults, 0, THREADS * BLOCKS * 
       sizeof(int))); 

    /* Allocate space for prng states on device */ 
    CUDA_CALL(cudaMalloc((void **)&devStates, THREADS * BLOCKS * 
        sizeof(curandState))); 

    /* Setup prng states */ 
    setup_kernel<<<BLOCKS, THREADS>>>(devStates); 


    /* Generate and use normal pseudo-random */
    generate_normal_kernel<<<BLOCKS, THREADS>>>(devStates, devResults); 

    /* Copy device memory to host */ 
    CUDA_CALL(cudaMemcpy(hostResults, devResults, BLOCKS * THREADS * 
     sizeof(int), cudaMemcpyDeviceToHost)); 

    /* Show result */ 
    if (THREADS*BLOCKS > 20){ 
     printf("First 20 stored results:\n"); 
     for (i=0; i<20; i++) 
     printf("%d\n", hostResults[i]); 
     } 

    total = 0; 
    for(i = 0; i < BLOCKS * THREADS; i++) { 
     total += hostResults[i]; 
    } 
    printf("Results mean = %f\n", (total/(1.0*BLOCKS*THREADS))); 



    /* Cleanup */ 
    CUDA_CALL(cudaFree(devStates)); 
    CUDA_CALL(cudaFree(devResults)); 
    free(hostResults); 
    return EXIT_SUCCESS; 
}

You can easily modify this code to produce a continuous-valued normal distribution (floats).

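The two parameters of the normal distribution are the mean and the standard deviation. These are represented by the SHIFT and SCALE parameters. SHIFT moves the mean away from zero. SCALE modifies the standard deviation (from 1.0 to whatever SCALE indicates). So you can play with the SHIFT and SCALE parameters to get the distribution you want. Note that truncating the real-valued output of the random number generator affects the statistics. You can compensate for this by adjusting SCALE or SHIFT, or you can switch from truncf() to some flavor of rounding.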
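To make the truncation point concrete, here is a minimal sketch of the shift, scale, and discretize steps as small helper functions. It is not part of the code above: the names `HD`, `shift_scale`, `discretize_trunc`, `discretize_round`, and `clamp_to_range` are hypothetical, and the clamp assumes you want results confined to 0..8, which the kernel above does not enforce. The `HD` macro is only there so the same helpers can be called from a kernel when compiled with nvcc.

```cpp
#include <cassert>
#include <math.h>

// Hypothetical macro: expands to __host__ __device__ under nvcc so the helpers
// can be called from a kernel, and to nothing with an ordinary C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Map a standard-normal sample z (mean 0, std dev 1) to mean `shift`, std dev `scale`.
HD inline float shift_scale(float z, float shift, float scale)
{
    return z * scale + shift;
}

// Truncate toward zero, as truncf() does in the kernel above.
HD inline int discretize_trunc(float x)
{
    return (int)truncf(x);
}

// Round to the nearest integer instead (halfway cases go away from zero).
HD inline int discretize_round(float x)
{
    return (int)roundf(x);
}

// Force a value into [lo, hi]; anything outside lands on the nearest end value.
HD inline int clamp_to_range(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
```

In a kernel, `z` would come from `curand_normal(&localState)`, so something like `clamp_to_range(discretize_round(shift_scale(z, 4.0f, 2.0f)), 0, 8)` would give integers in 0..8 centered on 4. Truncation pulls positive values down by about half a unit on average, which is why rounding needs a different SHIFT than truncation for the same target mean. Clamping piles the tail probability onto the two end values; redrawing out-of-range samples is an alternative if that matters for your use.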

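You can compile this with:

nvcc -arch=sm_20 -o uniform uniform.cu

assuming you have a CC 2.0 or higher GPU.

If not, it's OK to compile with:

nvcc -o uniform uniform.cu

The compiler warning about a double being demoted to float is OK to ignore in this case.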

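THREADS and BLOCKS are arbitrary choices, within machine limits. You can modify these to suit the particular launch configuration of your own code.

Source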
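If the number of values you need is not an exact multiple of the block size, one common pattern is to derive the block count by ceiling division. The sketch below is host-side C++ and the helper name `blocks_for` is hypothetical; it is not something the code above uses.

```cpp
#include <cassert>

// Hypothetical helper: smallest number of blocks such that
// blocks * threads_per_block >= n (ceiling division).
inline unsigned int blocks_for(unsigned int n, unsigned int threads_per_block)
{
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

With this approach the last block may contain threads whose index is past the end of the data, so the kernels would need an extra element-count parameter and a guard such as `if (id < n)` before touching `state[id]` or `result[id]`. The kernels above have no such parameter because BLOCKS * THREADS matches the array sizes exactly.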

2013-01-12 07:40:26
