我想寫一個bloom filter來存儲約80,000個字符串...現在我猜測每個字符串可以是2個字符的長度。要存儲80,000個字符串..我需要80,000 * 2 = 16kBytes?巨大的bitarray /位圖聲明在C
如果我必須存儲16kB = 16 * 1000 * 8 = 128,000bits,那麼至少需要2^17 = 131,072的位圖。這就是我現在所擁有的
INT主要(){
char *str = "hello world";
int c = sizeof(unsigned char);
/*
* declare the bit array
*/
unsigned char bit_arr[128000/c];
/*
* couple of hash functions
*/
unsigned int bkd = bkdrhash(str, strlen(str));
unsigned int rsh = rshash(str, strlen(str));
unsigned int jsh = jshash(str, strlen(str));
/*
* logic to set bit
* Access the bitmap arr element
* And then set the required bits in that element
*/
bit_arr[bkd/c] & (1 << (bkd%c));
bit_arr[rsh/c] & (1 << (rsh%c));
bit_arr[jsh/c] & (1 << (jsh %c));
}
有沒有做這更好的/最佳方式?
感謝
來自各種顯而易見錯別字
只是說你應該說它是c的語言。 –
謝謝..我編輯了標題和標籤 – user999755
'sizeof(unsigned char)== 1'總是 - 你可能需要'0123_'的'CHAR_BIT'而不是 –