2011-03-13 43 views
17

爲什麼Google sparsehash開源庫有兩個實現:密集的哈希表和稀疏的哈希表?稀疏散列表背後的主要實現思路是什麼?

+0

我想我誤解了帖子中的問題。不會稀疏哈希表+密集哈希表==所有哈希表?如果是這樣,那麼爲什麼圖書館稱爲「sparsehash」? – cHao 2011-03-13 12:22:38

+3

BTW:[來自Google代碼的文檔](http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html)。 – cHao 2011-03-13 12:28:24

回答

16

密集的散列表是您的普通教科書散列表實現。

稀疏散列表僅存儲實際設置的元素,並將其劃分爲多個數組。從comments在稀疏表的實施引證:

// The idea is that a table with (logically) t buckets is divided 
// into t/M *groups* of M buckets each. (M is a constant set in 
// GROUP_SIZE for efficiency.) Each group is stored sparsely. 
// Thus, inserting into the table causes some array to grow, which is 
// slow but still constant time. Lookup involves doing a 
// logical-position-to-sparse-position lookup, which is also slow but 
// constant time. The larger M is, the slower these operations are 
// but the less overhead (slightly). 

到知道該陣列的元素被設置,一個稀疏表包括位圖:

// To store the sparse array, we store a bitmap B, where B[i] = 1 iff 
// bucket i is non-empty. Then to look up bucket i we really look up 
// array[# of 1s before i in B]. This is constant time for fixed M. 

,使得每個元件招致的開銷只有1位(在限制內)。

3

sparsehash是將鍵映射到值(每個鍵1-2位)的高效內存方式。布隆過濾器可以爲每個關鍵點提供更少的位數,但它們除了外部/可能位於內部之外不會將值附加到鍵以外,這比稍微少於一點的信息要少。