2014-06-16 27 views
-9

Python regex - match two strings

I want to check whether two strings are similar to each other. For example:

string1 = "Select a valid choice. **aaaa** is not one of the available choices." 
string2 = "Select a valid choice. **bbbb** is not one of the available choices." 

string3 = "Ensure this value has at most 30 characters (it has 40 chars)." 
string4 = "Ensure this value has at most 60 characters (it has 110 chars)." 

Comparing string1 with string2 should return True; comparing string1 with string3 should return False.

+0

Please show us the regular expression you have already tried.

+0

It does not evaluate to True for string1 and string2.

+1

A direct comparison won't work: string1 contains aaaa and string2 contains bbbb, so it returns False instead of True.

Answers

3

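You can use the Levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other: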

def lev(s1, s2):
    """Return the Levenshtein (edit) distance between s1 and s2."""
    if len(s1) < len(s2):
        return lev(s2, s1)

    # From here on, len(s1) >= len(s2).
    if len(s2) == 0:
        return len(s1)

    # Row 0 of the DP table: distance from the empty prefix of s1 to each prefix of s2.
    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            # previous_row and current_row are one element longer than s2,
            # hence j + 1 rather than j.
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row

    return previous_row[-1]

string1 = "Select a valid choice. aaaa is not one of the available choices." 
string2 = "Select a valid choice. bbbb is not one of the available choices." 
string3 = "Ensure this value has at most 30 characters (it has 40 chars)." 
string4 = "Ensure this value has at most 60 characters (it has 110 chars)." 

print(lev(string1, string2))  # => 4
print(lev(string3, string4))  # => 3
print(lev(string1, string3))  # => 49

Code here
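The distance above is a number, while the question asks for True or False. One way to get there is to normalize a similarity score and compare it to a cutoff. The sketch below is an assumption on my part, not part of the answer's code: it uses the standard library's `difflib.SequenceMatcher` (a different metric from Levenshtein, though it behaves similarly on these inputs), and both the helper name `is_similar` and the `0.8` threshold are hypothetical choices you would need to tune for your own messages. Written for Python 3.

```python
from difflib import SequenceMatcher

def is_similar(a, b, threshold=0.8):
    """Return True when the similarity ratio of a and b reaches the threshold.

    The 0.8 default is an arbitrary, illustrative cutoff.
    """
    # ratio() is 2 * M / T, where M is the number of matched characters
    # and T is the combined length of both strings (1.0 means identical).
    return SequenceMatcher(None, a, b).ratio() >= threshold

string1 = "Select a valid choice. aaaa is not one of the available choices."
string2 = "Select a valid choice. bbbb is not one of the available choices."
string3 = "Ensure this value has at most 30 characters (it has 40 chars)."
string4 = "Ensure this value has at most 60 characters (it has 110 chars)."

print(is_similar(string1, string2))  # True
print(is_similar(string3, string4))  # True
print(is_similar(string1, string3))  # False
```

A ratio is easier to threshold than a raw edit distance because it does not grow with the length of the strings; the same idea works with `lev` by dividing the distance by the length of the longer string.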