字符串搜索 - 並行版本比較慢

我正在試圖平行化字符串搜索的樸素算法。我創建了簡單的版本，然後嘗試使用多個線程來加速它。字符串搜索 - 並行版本比較慢

但下面的代碼使得它更慢：

template<typename T> long unsigned int simple_paralel(const T * str1, unsigned long int str1_length, const T * str2, unsigned long int str2_length) { 
    unsigned int long result = ~0; 
    unsigned int long count = 0; 

    unsigned long int in; 
    unsigned int long top; 
    #pragma omp parallel 
    #pragma for ordered shared(result, count) private(in, top) firstprivate(str1, str2, str1_length, str2_length) 
    for (top = 0; top < str1_length; top++) { 
    in = 0; 

    // & top + in < str1_length 
    while (in < str2_length) { 
     if (top + in >= str1_length) 
     break; 

     if (str1[top+in] != str2[in]) { 
     break; 
     } 
     ++in; 

     if(in == str2_length) { 
     // shared and we want to have the smallest index 
     if(result >= top + 1) { 
      result = top + 1; 
     } 
     count++; 
     } 
    } 
    } 
    return count; 
}

我在做什麼錯？

來源

2016-12-31 tomascapek

隨機猜測：每個線程中沒有足夠的工作來彌補開始/停止/整理結果的開銷？ –

可能 - 許多週期將立即結束，因爲找不到匹配。如何解決這個問題？ – tomascapek

我很難說，因爲我不知道你想要解決的具體問題的具體細節（我瞭解基本知識，但是有很多細節，比如有多大的弦，有多少是，字符串的第一部分在保存之前有多少匹配，它們是否被排序，它們是否可以被排序等等 - 我確信我沒有想到所有重要的東西）。一般來說，每個線程與其他線程分開運行時間越長，並行運行性能越好。 –

所以我後退幾步，嘗試了不同的方法：

template<typename T> long unsigned int simple_paralel(const T * str1, unsigned long int str1_length, const T * str2, unsigned long int str2_length, int threads) { 
    count = 0; 

    unsigned long int block = str1_length/threads; 
    unsigned long int total = 0; 
    unsigned long int result = 0; 

    unsigned long int smallest = ~0; 
    #pragma omp parallel for shared(count) 
    for(int i=0; i<threads; i++) 
    { 
    unsigned long int res; 
    unsigned long int start = i * block; 
    if(i != threads -1) 
    { 
     total += block; 
     res = simple_p(str1 + start, block, str2, str2_length, false); 
    } else { 
     res = simple_p(str1 + start, str1_length - total, str2, str2_length); 
    } 

    if(res < result && res != 0){ 
     result = res; 
    } 
    } 

    return count; 
}

而且它按預期工作。但我仍然不敢相信，我不能用OpenMP做到這一點。

來源

2016-12-31 15:06:35 tomascapek

***但我仍然不敢相信，我不能用OpenMP來做這件事***我對這個說法感到困惑。看起來你在這裏使用openmp。這是否意味着解決問題的答案？ – drescherjm

我的不好！我的意思是：我想，OpenMP可以很容易地將輸入分成塊，就像我的解決方案。 – tomascapek

字符串搜索 - 並行版本比較慢

回答

相關問題