我應該如何使用番石榴的散列＃consistentHash？

我正在研究在我寫的一些java代碼中使用consistent hash算法。番石榴哈希圖書館有一個consistentHash(HashCode, int)方法，但the documentation是相當缺乏。我最初的希望是，我只需使用consistentHash()即可獲得簡單的會話親和力，從而有效地分配一組後端服務器的負載。我應該如何使用番石榴的散列＃consistentHash？

有沒有人有如何使用這種方法的現實世界的例子？特別是我關心的是如何管理從目標範圍中移除一個桶。

例如：

@Test 
public void testConsistentHash() { 
    List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5"); 

    int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size()); 
    System.out.println("First time routed to: " + servers.get(bucket)); 

    // one of the back end servers is removed from the (middle of the) pool 
    servers.remove(1); 

    bucket = Hashing.consistentHash(Hashing.md5().hashString("blah"), servers.size()); 
    System.out.println("Second time routed to: " + servers.get(bucket)); 
}

信息到輸出：

 
First time routed to: server4 
Second time routed to: server5

我想是該標識符（「someId」）除去的服務器較早後映射到相同的服務器在列表中。所以在上面的示例中，刪除後我想我想要存儲區0映射到「server1」，存儲區1映射到「server3」，存儲區2映射到「server4」，存儲區3映射到「server5」。

我是否應該維護一個單獨的（比列表更復雜的）數據結構來管理桶的移除和添加？我想我已經設想了一個更復雜的哈希API，它可以在添加和刪除特定桶之後管理重新映射。

注：我知道示例代碼是使用一個小輸入和桶集。我在100個桶中輸入了1000個輸入，並且結果是相同的。當我將buckets更改爲99，並將存儲區99分配到剩餘的99個存儲區時，映射到存儲區0-98的輸入保持不變。

來源

2012-09-07 GamingBuck

您注意的是正確的...但你可以看到，番石榴什麼都不知道關於你的清單，除了它的尺寸，你呢？所以它不能做其他事情。 – maaartinus

我認爲這是您真正想要的文檔鏈接：http://docs.guava-libraries.googlecode.com/git-history/release13/javadoc/com/google/common/hash/Hashing.html#consistentHash%28com。 google.common.hash.HashCode，％20int％29 - 雖然這確實沒有太多，您還有什麼感覺應該說？ –

@Kevin - 文檔可能是O.K.如果在最後增加/刪除的要求再多一個字的話。我發佈了我的問題，因爲我希望我的解釋是錯誤的，並且有一些顯而易見的方式來管理我沒有想到的bucket操作。在開始維基百科條目並閱讀那裏引用的java實現之後，我得到了番石榴方法，所以我想我期望看到更接近這兩篇文章描述的東西（更像克里斯對下面答案中的內容的描述）。 – GamingBuck

恐怕現在的consistentHash沒有任何數據結構可以做到。由於該方法僅接受列表大小，因此只支持從末尾追加和刪除。目前，最好的解決方案通過

server.set(n, servers.get(servers.size() - 1); 
servers.remove(servers.size() - 1);

這樣你那種交換失敗，最後可能服務器替換由

servers.remove(n)

的。這看起來很糟糕，因爲它會使兩臺交換服務器的分配錯誤。這個問題只有其中一個失敗的一半。但是有意義的是，在刪除最後一個列表元素之後，一切都很好，除了分配給發生故障的服務器和之前的最後一個服務器。

因此根據需要改變兩次分配。不是最佳的，但希望有用嗎？

來源

2012-09-07 15:11:48 maaartinus

感謝您的快速回答。這似乎是一個合理的解決方法，直到一個更豐富的API可用。我可能會選擇這種方式，或者根據其他語言實現推出我自己的產品。 – GamingBuck

@GamingBuck：我剛剛創建了[更好的（甚至可能是最佳的）解決方案]（https://dl.dropbox.com/u/4971686/published/maaartin/guava/consistenthash/index.html）。 – maaartinus

我不認爲目前有這樣做的好方法。 consistentHash目前的形式僅適用於簡單情況下 - 基本上，您可以通過旋鈕來增加或減少服務器數量......但總是通過在最後添加和刪除。

有一些工作正在進行中添加類是這樣的：

public final class WeightedConsistentHash<B, I> { 
    /** Initially, all buckets have weight zero. */ 
    public static <B, I> WeightedConsistentHash<B, I> create(
     Funnel<B> bucketFunnel, Funnel<I> inputFunnel); 

    /** 
    * Sets the weight of bucket "bucketId" to "weight". 
    * Requires "weight" >= 0.0. 
    */ 
    public void setBucketWeight(B bucketId, double weight); 

    /** 
    * Returns the bucket id that "input" maps to. 
    * Requires that at least one bucket has a non-zero weight. 
    */ 
    public B hash(I input); 
}

然後你可以這樣寫：

WeightedConsistentHash<String, String> serverChooser = 
    WeightedConsistentHash.create(stringFunnel(), stringFunnel()); 
serverChooser.setBucketWeight("server1", 1); 
serverChooser.setBucketWeight("server2", 1); 
// etc. 

System.out.println("First time routed to: " + serverChooser.hash("someId")); 

// one of the back end servers is removed from the (middle of the) pool 
serverChooser.setBucketWeight("server2", 0); 

System.out.println("Second time routed to: " + serverChooser.hash("someId"));

而每一次你應該得到相同的服務器。該API看起來合適嗎？

來源

2012-09-07 14:34:01

我必須承認，我還沒有仔細查看Funnel API，但乍一看似乎可行。我期待看到它的可用性。 – GamingBuck

關於這方面的任何消息？我可以在[v14]中看到'weightedConsistentHash'參考（http://docs.guava-libraries.googlecode.com/git-history/v14.0.1/javadoc/com/google/common/hash/Hashing.html），但[v16]缺席（http://docs.guava-libraries.googlecode.com/git-history/v16.0.1/javadoc/com/google/common/hash/Hashing.html）。 Colin在提交9acc76ba4中刪除了引用。 – maaartinus

番石榴API沒有任何關於您的服務器列表的知識。它只能保證這一點：

int bucket1 = Hashing.consistentHash(Hashing.md5().hashString("server1"),N);  
int bucket2 = Hashing.consistentHash(Hashing.md5().hashString("server1"),N-1); 

assertThat(bucket1,is(equalTo(bucket2))); iff bucket1==bucket2!=N-1

需要桶manange到你的服務器列表自己

來源

2012-09-09 03:29:26

我應該如何使用番石榴的散列＃consistentHash？

回答

相關問題