Hash map inserts quickly but retrieval is slow

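I am using Google's hash map implementation, google::dense_hash_map.

Mine is a clustering application, so I have to store distances between pairs of clusters. Each cluster has a cluster ID of type long int, so the key has to be (long int id1, long int id2).

So I decided I need a hash map inside a hash map for this to work.

This is my hash map structure for storing distances: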

google::dense_hash_map<long int, google::dense_hash_map<long int, double> > distanceHash;

This is the code that inserts a distance into the hash map and retrieves it:

template<class Point>
void CoverTree<Point>::insertDistance(long int id1, long int id2, long double distance)
{
    // Normalize the pair so that id1 >= id2 always holds.
    if (id1 < id2)
    {
        long temp = id1;
        id1 = id2;
        id2 = temp;
    }

    if (distanceHash.find(id1) == distanceHash.end())
    {
        // First distance for id1: create its inner map.
        google::dense_hash_map<long int, double> insideHash;
        insideHash.set_empty_key(-9999);
        insideHash[id2] = distance;
        distanceHash[id1] = insideHash;
    }
    else
    {
        // Inner map already exists: store the distance in it.
        (distanceHash[id1])[id2] = distance;
    }
}

template<class Point>
double CoverTree<Point>::getStoredDistance(long int id1, long int id2)
{
    // Same normalization as in insertDistance: id1 >= id2.
    if (id1 < id2)
    {
        long temp = id1;
        id1 = id2;
        id2 = temp;
    }

    google::dense_hash_map<long int, double>::iterator it;

    if (distanceHash.find(id1) != distanceHash.end())
    {
        if (distanceHash[id1].find(id2) != distanceHash[id1].end())
            return distanceHash[id1][id2];
    }

    // -1 signals that no distance is stored for this pair.
    return -1;
}
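
For reference, here is a minimal sketch of the same two operations where each map level is hashed once per call, by reusing the iterator that find() returns. It is not the original code: it assumes std::unordered_map as a stand-in for google::dense_hash_map so that it compiles without sparsehash, and it uses free functions instead of CoverTree members.

```cpp
#include <unordered_map>
#include <utility>

// Stand-in types (assumption): std::unordered_map replaces google::dense_hash_map.
using InnerMap = std::unordered_map<long, double>;
using DistanceMap = std::unordered_map<long, InnerMap>;

// Stores the distance for the unordered pair (id1, id2).
void insertDistance(DistanceMap& distanceHash, long id1, long id2, double distance)
{
    if (id1 < id2) std::swap(id1, id2);  // normalize so that id1 >= id2
    // operator[] creates the inner map on first use of id1.
    distanceHash[id1][id2] = distance;
}

// Returns the stored distance for the pair, or -1 if none is stored.
double getStoredDistance(const DistanceMap& distanceHash, long id1, long id2)
{
    if (id1 < id2) std::swap(id1, id2);  // same normalization as on insert

    auto outer = distanceHash.find(id1);           // one lookup in the outer map
    if (outer == distanceHash.end()) return -1.0;

    auto inner = outer->second.find(id2);          // one lookup in the inner map
    if (inner == outer->second.end()) return -1.0;

    return inner->second;                          // reuse the iterator, no operator[]
}
```

With google::dense_hash_map, every map needs set_empty_key() before its first use, so the inner map cannot simply be default-created through operator[] as above; the explicit create-then-assign branch would still be needed there.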

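My code computes millions of distances. The last time I checked, there were about 600 million distance computations, of which 400 million were unique. This means that 1/3 of the distances are repeated, so that time could be saved.

But when I use this hash map structure to store the distances, the program runs slower. This is exactly what I found: if I only store the distances as the distance function computes them, the whole program runs about 50 seconds slower (200 seconds with storing and 150 without). But if I store the distances and then use the hash map to check whether a distance exists before computing it, the program becomes far slower (1/25th of the program takes 300 seconds).

I don't understand this behavior. I would expect that once the distances are stored, retrieving them should be faster. Please let me know what is going wrong here and whether it can be made faster.

P.S.: RAM is not a problem. I am running on a server with about 160 GB of RAM, and peak memory consumption when using the hash map is only 1.8% of total memory (see the figure above). So paging and thrashing should not be an issue.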

Source

2012-09-02 The Flying Dutchman

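Is getStoredDistance(long int id1, long int id2) slow?

Are you calling distanceHash.find(id1) N times? What is its complexity? N * N? Then you add another N and it becomes O(N * N * N).

Yes, getStoredDistance is slow. I see how that happens. I have an idea that could fix it: I will keep a hash table in each point. That table stores the distances from that particular node to all other nodes. This eliminates distanceHash.find(id1), because I already know the node whose distances I need.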
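
The per-point table described in the last comment could look roughly like the sketch below. Everything here is hypothetical: the Node struct, the cachedDistance helper, and the compute callback are illustration names, and std::unordered_map stands in for google::dense_hash_map.

```cpp
#include <unordered_map>

// Hypothetical node type: each node caches its distances to other nodes by id.
struct Node {
    long id;
    std::unordered_map<long, double> cache;  // other node's id -> distance
};

// Returns the distance from a to b, computing and caching it on a miss.
// `compute` is a placeholder for the real distance function.
template <class DistanceFn>
double cachedDistance(Node& a, const Node& b, DistanceFn compute)
{
    auto it = a.cache.find(b.id);            // single lookup, no outer map
    if (it != a.cache.end()) return it->second;

    double d = compute(a, b);
    a.cache.emplace(b.id, d);                // remember it for the next query
    return d;
}
```

In this form the distance is cached only on a's side, so a later query for (b, a) would compute it again; always querying through the node with the larger id, as the original code does with its swap, would avoid that.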

But If I store the distances and then use the hashmap to check whether the distances exist before computing them, the program becomes way way slower(1/25th of the program takes 300 seconds).

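I suspect you are looking up every element to validate the data.

Well, a hash map lookup is O(1) on average but O(n) in the worst case, and you call getStoredDistance N times, which makes the total complexity O(N * N) in the worst case, with

distanceHash.find(id1)

being called twice per query in the worst case.

400M * 400M = 160000000000000000, which is far too many operations.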
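
One way to reduce the number of hash lookups per query is to replace the nested maps with a single map keyed on the pair of ids, so each query costs one find(). The sketch below is an assumption-laden illustration, not the asker's code: PairHash, makeKey, and lookup are made-up names, std::unordered_map stands in for google::dense_hash_map, and the hash-combining formula is just one common choice.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

using Key = std::pair<long, long>;

// Hypothetical hash for a pair of ids: mixes the two per-id hashes.
struct PairHash {
    std::size_t operator()(const Key& k) const {
        std::size_t h1 = std::hash<long>{}(k.first);
        std::size_t h2 = std::hash<long>{}(k.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

using FlatDistanceMap = std::unordered_map<Key, double, PairHash>;

// Normalizes the pair so the larger id comes first, as in the original code.
inline Key makeKey(long id1, long id2) {
    return id1 < id2 ? Key(id2, id1) : Key(id1, id2);
}

// Returns true and writes the distance to `out` if the pair is stored.
bool lookup(const FlatDistanceMap& m, long id1, long id2, double& out) {
    auto it = m.find(makeKey(id1, id2));     // the only hash lookup per query
    if (it == m.end()) return false;
    out = it->second;
    return true;
}
```

Returning a bool with an output parameter also avoids using -1 as a "not found" marker.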

Source

2012-09-02 09:16:15
