2013-04-15 28 views
-3

它在java.util.HashMap爲什麼modulo等於&爲基數2的值?

/** 
* Returns index for hash code h. 
*/ 
static int indexFor(int h, int length) { 
    return h & (length-1); 
} 

其中長度是基座2

或者Lucene的布隆過濾器代碼(org.apache.lucene.codecs.bloom.FuzzySet)

// Bloom sizes are always base 2 and so can be ANDed for a fast modulo 
int pos = positiveHash & bloomSize; 

是常用它對我來說毫無意義,因爲例如對於8的i % 8i & 8之間的差異不是零!

scala> (0 to 20).map(i => (i & 8) - (i % 8)) 
res33: scala.collection.immutable.IndexedSeq[Int] = Vector(0, -1, -2, -3, -4, -5, -6, -7, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4) 
+2

對於所有非負「i」,正確比較「i&7」和「i%8」,以及「i&7 == i%8」。 –

+0

是的,他們命名'bloomSize'很差,因爲數組是一個更大的,例如'FixedBitSet bits = new FixedBitSet(longs,bloomSize + 1);' –

+0

謝謝,是的,它可以在HashMaps中追蹤,儘管Lucene布隆過濾器是邏輯的仍然令人困惑。 – yura

回答

1

既不HashMap中也不FuzzySet是&「荷蘭國際集團上的2的冪精確 - 他們正在使用的形式2^n - 1的整數。你從FuzzySet引述的評論是在這方面遺憾的是誤導性的,但如果你做一個小挖,你可以找到這個代碼塊:

//The sizes of BitSet used are all numbers that, when expressed in binary form, 
//are all ones. This is to enable fast downsizing from one bitset to another 
//by simply ANDing each set index in one bitset with the size of the target bitset 
// - this provides a fast modulo of the number. Values previously accumulated in 
// a large bitset and then mapped to a smaller set can be looked up using a single 
// AND operation of the query term's hash rather than needing to perform a 2-step 
// translation of the query term that mirrors the stored content's reprojections. 
static final int usableBitSetSizes[]; 
static 
{ 
    usableBitSetSizes=new int[30]; 
    int mask=1; 
    int size=mask; 
    for (int i = 0; i < usableBitSetSizes.length; i++) { 
     size=(size<<1)|mask; 
     usableBitSetSizes[i]=size; 
    }  
} 

在FuzzySet的bitsize變量總是最終從這個數組分配。這裏的評論也描述了到底發生了什麼。

計算X % 8 (1000)是計算X & 7 (0111)。這適用於所有權數爲2.