最長的常見子序列：爲什麼這是錯誤的？

int lcs(char * A, char * B) 
{ 
    int m = strlen(A); 
    int n = strlen(B); 
    int *X = malloc(m * sizeof(int)); 
    int *Y = malloc(n * sizeof(int)); 
    int i; 
    int j; 
    for (i = m; i >= 0; i--) 
    { 
     for (j = n; j >= 0; j--) 
     { 
      if (A[i] == '\0' || B[j] == '\0') 
       X[j] = 0; 
      else if (A[i] == B[j]) 
       X[j] = 1 + Y[j+1]; 
      else 
       X[j] = max(Y[j], X[j+1]); 
     } 
     Y = X; 
    } 
    return X[0]; 
}

這是有效的，但valgrind大聲抱怨無效讀取。我怎麼搞亂了記憶？對不起，我總是在C內存分配上失敗。最長的常見子序列：爲什麼這是錯誤的？

來源

2013-05-19 user54609

真的不喜歡你如何做'Y = X;' - 這是一個內存泄漏，並有可能發生緩衝區溢出。 – paddy

從valgrind中顯示第一對錯誤是明智的;我們不應該猜測它正在報告什麼。 –

請使用真實的變量名稱。 ''a'len'代表'm'，'b_len'代表n，'longestSoFar'代表'X'（如果這是準確的）等等。 – djechlin

這裏的問題與你的表的大小有關。需要注意的是你作爲

int *X = malloc(m * sizeof(int)); 
int *Y = malloc(n * sizeof(int));

分配空間但是，你正在使用的索引0 ... m和0，...，N，這意味着有m個必要的X + 1個插槽和N + 1個插槽必要Y.

嘗試改變這一閱讀

int *X = malloc((m + 1) * sizeof(int)); 
int *Y = malloc((n + 1) * sizeof(int));

希望這有助於！

來源

2013-05-19 22:10:17 templatetypedef

耶你拯救了我的一天！ – user54609

糟糕。如果琴絃長度不同，則不工作。：/ – user54609

'oops'是你的錯誤算法，而不是你的內存管理。 –

一系列問題。首先，正如templatetypedef所說的，你分配不足。

然後，就像稻田說的那樣，你不能釋放你的malloc'd內存。如果您需要Y=X一行，則需要將原始malloc'd空格地址存儲在另一組變量中，以便您可以在其上調用free。

...mallocs... 
int * original_y = Y; 
int * original_x = X; 
...body of code... 
free(original_y); 
free(original_x); 
return X[0];

但是這並沒有解決你的新問題，這就是爲什麼代碼不工作？我承認我不能遵循你的代碼（沒有更多的研究），但是我可以提出一種算法，它可以工作，而且更容易理解。這可能有點僞代碼並不是特別高效，但要正確的是第一步。我後來列出了一些優化。

int lcs(char * A, char * B) 
{ 
    int length_a = strlen(A); 
    int length_b = strlen(B); 


    // these hold the position in A of the longest common substring 
    int longest_found_length = 0; 

    // go through each substring of one of the strings (doesn't matter which, you could pick the shorter one if you want) 
    char * candidate_substring = malloc(sizeof(char) * length_a + 1); 
    for (int start_position = 0; start_position < length_a; start_position++) { 
    for (int end_position = start_position; end_position < length_a; end_position++) { 

     int substring_length = end_position - start_position + 1; 

     // make a null-terminated copy of the substring to look for in the other string 
     strncpy(candidate_substring, &(A[start_position]), substring_length); 
     if (strstr(B, candidate_substring) != NULL) { 
     longest_found_length = substring_length; 
     } 
    } 

    } 
    free(candidate_substring); 
    return longest_found_length; 
}

一些不同的優化，你可以這樣做：

 // if this can't be longer, then don't bother checking it. You can play games with the for loop to not have this happen, but it's more complicated. 
     if (substring_length <= longest_found_index) { 
     continue; 
     }

和

 // there are more optimizations you could do to this, but don't check 
     // the substring if it's longer than b, since b can't contain it. 
     if (substring_length > length_b) { 
     continue; 
     }

和

if (strstr(B, candidate_substring) != NULL) { 
    longest_found_length = end_position - start_position + 1; 
    } else { 
    // if nothing contains the shorter string, then nothing can contain the longer one, so skip checking longer strings with the same starting character 
    break; // skip out of inner loop to next iteration of start_position 
    }

而是每個候選子複製到一個新的字符串的，你可以與t進行角色互換他end_position + 1和NUL字符。然後，在b中查找該子字符串後，將原始字符替換爲end_position+1。這會更快，但稍微複雜一些。

來源

2013-05-20 00:23:07 xaxxon

除了什麼templatetypedef說，有些事情要考慮：

爲什麼不X和Y大小相同？
你爲什麼在做Y = X？這是指針的分配。你的意思是memcpy(Y, X, (n+1)*sizeof(int))？

來源

2013-05-20 00:26:19 tmyklebu

注意：我通常不會寫出兩個答案，如果您覺得它很俗氣，請隨時評論這個問題並記錄下來。這個答案是一個更優化的解決方案，但我想先給出我能想到的最直接的解決方案，然後把它放在另一個答案中，以免混淆兩者。基本上他們是爲不同的觀衆。

有效解決此問題的關鍵是在尋找更長的子碼時不要丟棄有關較短常見子碼的信息。天真地，你檢查每個字符串與另一個字符串，但如果你知道「AB」匹配「ABC」，並且你的下一個字符是C，不要檢查「ABC」是否在「ABC」中，只要檢查「AB」後面的點是「C」。

對於A中的每個字符，您必須檢查B中的所有字母，但是因爲一旦更長的子字符串不再可用，我們會停止瀏覽B，這會極大地限制檢查次數。每當您獲得更長時間的匹配時，您都會消除後端的檢查，因爲它不再是更長的子字符串。例如，如果A和B都長，但不包含普通字母，則A中的每個字母將與B中的每個字母進行比較，以得到A * B的運行時間。

對於有很多匹配的序列，但匹配長度不是較短字符串長度的很大一部分，則可以使用A * B組合來檢查兩個字符串中較短的字符串（A或B）導致A * B * A或A * B * B，這對於相似長度的字符串基本上是O（n^3）時間。我真的認爲在這個解決方案中的優化會比n^3更好，即使有三重嵌套for循環，但它似乎並不像我所說的那樣最好。

雖然我正在考慮這一點。要麼找到的子字符串不是字符串長度的重要部分，在這種情況下，優化沒有太大的作用，但是A * B的每個組合的比較不會與A或B一起縮放，是常數 - 或者 - 它們是A和B的重要部分，它直接與必須進行比較的A * B組合相分離。

我只是可以問一個問題。

int lcs(char * A, char * B) 
{ 
    int length_a = strlen(A); 
    int length_b = strlen(B); 

    // these hold the position in A of the longest common substring 
    int longest_length_found = 0; 

    // for each character in one string (doesn't matter which), look for incrementally larger strings in the other 
    for (int a_index = 0; a_index < length_a - longest_length_found; a_index++) { 
    for (int b_index = 0; b_index < length_b - longest_length_found; b_index++) { 

     // offset into each string until end of string or non-matching character is found 
     for (int offset = 0; A[a_index+offset] != '\0' && B[b_index+offset] != '\0' && A[a_index+offset] == B[b_index+offset]; offset++) {   
     longest_length_found = longest_length_found > offset ? longest_length_found : offset; 
     } 
    } 
    } 
    return longest_found_length; 
}

來源

2013-05-20 00:54:43 xaxxon

最長的常見子序列：爲什麼這是錯誤的？

回答

相關問題