最快的方法來確定是否兩個字符串由一個字符

我寫一個C++算法，需要兩個字符串，如果你可以從字符串通過改變單個字符到另一個變異一個以串B返回true不同。這兩個字符串的大小必須相等，並且只能有一個不同。我還需要訪問已更改的索引以及已更改的strA字符。我發現了一個工作算法，但它遍歷每一對單詞，並且在任何大量輸入中運行速度太慢。最快的方法來確定是否兩個字符串由一個字符

bool canChange(std::string const& strA, std::string const& strB, char& letter) 
{ 
    int dif = 0; 
    int position = 0; 
    int currentSize = (int)strA.size(); 
    if(currentSize != (int)strB.size()) 
    { 
     return false; 
    } 
    for(int i = 0; i < currentSize; ++i) 
    { 
     if(strA[i] != strB[i]) 
     { 
      dif++; 
      position = i; 
      if(dif > 1) 
      { 
       return false; 
      } 
     } 
    } 
    if(dif == 1) 
    { 
     letter = strA[position]; 
     return true; 
    } 
    else return false; 
}

有關優化的任何建議？

來源

2016-01-26 dave vader

你想要什麼樣的優化？時間？記憶？順便說一下，你沒有處理已改變的字符的索引，你只是返回字符本身 – Pooya

我可以看到'canChange（）'（有趣的名字，在那）檢查字符對 - 你能詳細說明'每一個單詞對？ – greybeard

我認爲這是「通過每一對單詞迭代」，這是慢的來源;發佈的代碼看起來非常有效。如果將文本1中的每個單詞與文本2中的每個單詞進行比較，那麼這是一個O（N^2）操作，無論您對canChange（）進行了多少優化，它都將變得很慢。 –

您的輸入有多大？

我認爲，STRA [I]，STRB [I]有函數調用的開銷，除非它的內聯。因此，請確保您打開內聯並進行編譯並進行性能測試。否則，請嘗試使用strA.c_str（）將字節作爲char *。

如果所有的失敗，它仍然不夠快，試着將你的字符串分割成塊，並使用memcmp或STRNCMP的塊。如果沒有區別，移動到下一個塊，直到達到最後或找到差異。如果發現差異，請逐字節比較，直到找到差異爲止。我建議這個路由，因爲memcmp通常比你的微不足道的實現更快，因爲它們可以利用處理器SSE擴展等來做非常快速的比較。

此外，您的代碼存在問題。假設strA比strB長，並且只檢查數組訪問器的A的長度。

來源

2016-01-26 01:00:02 Matt

'...只檢查數組訪問器的A的長度 - 函數終止提前，除非兩個字符串具有相同的長度。 –

這是一個有點難以得到檢查所有的字符字符串，除非你能接受偶爾的不正確的結果了。

我建議使用標準庫的功能，而不是嘗試計算不匹配的數量。例如;

#include <string> 
#include <algorithm> 

bool canChange(std::string const& strA, std::string const& strB, char& letter, std::size_t &index) 
{ 
    bool single_mismatch = false; 
    if (strA.size() == strB.size()) 
    { 
     typedef std::string::const_iterator ci; 
     typedef std::pair<ci, ci> mismatch_result; 

     ci begA(strA.begin()), endA(strA.end()); 

     mismatch_result result = std::mismatch(begA, endA, strB.begin()); 

     if (result.first != endA) // found a mismatch 
     { 
      letter = *(result.first); 
      index = std::distance(begA, result.first); 

      // now look for a second mismatch 

      std::advance(result.first, 1); 
      std::advance(result.second, 1); 

      single_mismatch = (std::mismatch(result.first, endA, result.second).first == endA); 
     } 
    } 
    return single_mismatch; 
}

這適用於所有版本。它可以在C++ 11中簡化一點。

如果上述返回true，然後一個錯配被發現。

如果返回值爲false，那麼這些字符串的大小不一致，或者不匹配的數量不等於1（字符串相等或者有多個不匹配）。

letter和index不變如果字符串不同的長度或完全相等，但以其它方式識別所述第一失配（字符的值在strA，和index）。

來源

2016-01-26 01:51:50 Peter

如果要優化主要是完全相同的字符串，你可以使用的x86 SSE/AVX向量指令。您的基本想法看起來很好：只要您檢測到第二個區別，就會中斷。

要查找和計算字符差異，像PCMPEQB/PMOVMSKB/test-and-branch這樣的序列可能是好的。（使用C/C++內部函數來獲取這些向量指令）。當您的向量循環檢測到當前塊中的非零差異時，可以使用位掩碼來查看是否剛剛找到第一個差異，或者如果在同一個塊中發現了兩個差異。

我扔了一個未經測試的，沒有完全充實的AVX2版本的我所描述的。 此代碼假定字符串長度爲32的多個倍數。儘早停止並用清理結尾處理最後一個塊是留給讀者的一個練習。

#include <immintrin.h> 
#include <string> 

// not tested, and doesn't avoid reading past the end of the string. 
// TODO: epilogue to handle the last up-to-31 left-over bytes separately. 
bool canChange_avx2_bmi(std::string const& strA, std::string const& strB, char& letter) { 
    size_t size = strA.size(); 
    if (size != strB.size()) 
     return false; 

    int diffs = 0; 
    size_t diffpos = 0; 
    size_t pos = 0; 
    do { 
     uint32_t diffmask = 0; 
     while(pos < size) { 
      __m256i vecA = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(& strA[pos])); 
      __m256i vecB = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(& strB[pos])); 
      __m256i vdiff = _mm256_cmpeq_epi8(vecA, vecB); 
      diffmask = _mm256_movemask_epi8(vdiff); 
      pos += 32; 
      if (diffmask) break; // gcc makes worse code if you include && !diffmask in the while condition, instead of this break 
     } 
     if (diffmask) { 
      diffpos = pos + _tzcnt_u32(diffmask); // position of the lowest set bit. Could safely use BSF rather than TZCNT here, since we only run when diffmask is non-zero. 
      diffs += _mm_popcnt_u32(diffmask); 
     } 
    } while(pos < size && diffs <= 1); 

    if (diffs == 1) { 
     letter = strA[diffpos]; 
     return true; 
    } 
    return false; 
}

醜陋break，而不是包括在while條件顯然有助於gcc generate better code。 do{}while()也符合我希望asm出來的方式。我沒有嘗試使用for或while循環來查看gcc會做什麼。

內環真的是緊是這樣的：

.L14: 
     cmp  rcx, r8 
     jnb  .L10  # the while(pos<size) condition 
.L6: # entry point for first iteration, because gcc duplicates the pos<size test ahead of the loop 

     vmovdqu ymm0, YMMWORD PTR [r9+rcx]  # tmp118,* pos 
     vpcmpeqb  ymm0, ymm0, YMMWORD PTR [r10+rcx]  # tmp123, tmp118,* pos 
     add  rcx, 32 # pos, 
     vpmovmskb  eax, ymm0  # tmp121, tmp123 
     test eax, eax  # tmp121 
     je  .L14  #,

從理論上講，這應該每2個時鐘（英特爾Haswell的）一個迭代運行。循環中有7個融合域uop。（將會是6，但是2-reg addressing modes apparently can't micro-fuse on SnB-family CPUs。）由於兩個uops是負載，而不是ALU，所以在SnB/IvB上這個吞吐量也是可以實現的。

這對於在兩條弦相同的區域飛行非常有用。正確處理任意字符串長度的開銷會使得這可能比簡單的標量函數慢，如果字符串很短，和/或早期有多個差異。

來源

2016-01-27 07:15:26

而不是popcnt，您可以使用奇偶校驗來查看popcnt是奇數還是偶數。但是在x86上找到32位整數的奇偶校驗（使用AVX2）的最快方法是使用'popcnt（x）＆1'。不需要SSE4的SSE版本可能需要這樣做。（x86可以通過xor al，ah' /'setp al'（PF設置爲偶校驗，IIRC）進行16位奇偶校驗。 –

最快的方法來確定是否兩個字符串由一個字符

回答

相關問題