查找2個字符串之間的最大子序列

給定2個字符串，查找2個字符串之間的最大字符重疊順序。查找2個字符串之間的最大子序列

例如：

「klccconcertcenter」
「klconventioncenter」

在序列的最大重疊是「klconetcenter」

這被例示：

KLC立方厘米上ÇËřtcenter
klcon v ËÑ噸離子中心

順便說一句，這不是家庭作業。我實際上在生產中使用它來測試2個字符串的相似程度。例如，比較「ABC私人有限公司」和「ABC私人有限公司」，我將採用最大順序重疊，除以長度較短的字符串，對這兩個字符串的相關程度進行0-1等級評定。

我實際上有一個解決方案，但它是一個近似值。我幼稚的遞歸方法不適用於長字符串。

好，因爲人們都在問我的解決方案：

def match_count(phrase1, phrase2): 
    """This approximates match_count_recur() because the recursive function does not scale for long strings""" 
    MAX_MATCH_COUNT_WIDTH = 15 
    if len(phrase1) > MAX_MATCH_COUNT_WIDTH: 
     return match_count(phrase1[:len(phrase1)/2], phrase2) + match_count(phrase1[len(phrase1)/2:], phrase2) 
    return match_count_recur(phrase1, phrase2) 


def match_count_recur(phrase1, phrase2): 
    """ 
    Checks the number of characters that intersect (in order) between 2 phrases 
    """ 
    if len(phrase2) < 1: return 0 
    if len(phrase1) < 1: return 0 
    if len(phrase1) == 1: return 1 if phrase1 in phrase2 else 0 
    if len(phrase2) == 1: return 1 if phrase2 in phrase1 else 0 
    if phrase1 in phrase2: return len(phrase1) 
    if phrase2 in phrase1: return len(phrase2) 

    char = phrase1[0] 
    current_count = 1 if char in phrase2 else 0 
    phrase2_idx = phrase2.index(char) + 1 if char in phrase2 else 0 

    no_skip_count = current_count + match_count(phrase1[1:], phrase2[phrase2_idx:]) 
    skip_count = match_count(phrase1[1:], phrase2) 

    return max(no_skip_count, skip_count) 

def get_similarity_score(phrase1, phrase2): 
    """ 
    Gets the similarity score of 2 phrases 
    """ 
    phrase1 = phrase1.lower().replace(" ", "") 
    phrase2 = phrase2.lower().replace(" ", "") 
    shorter_phrase = phrase2 
    longer_phrase = phrase1 
    if len(phrase1) < len(phrase2): 
     shorter_phrase = phrase1 
     longer_phrase = phrase2 
    return float(match_count(shorter_phrase, longer_phrase))/float(len(shorter_phrase))

來源

2017-02-16 nubela

您可能希望https://en.wikipedia.org/wiki/Levenshtein_distance爲其確定存在現有的Python實現。 – ceejayoz

無論是不是家庭作業都沒關係 - 你還沒有顯示任何工作。你應該提供你工作的證據，讓用戶試着弄清楚爲什麼你的方法「不能按比例縮放長字符串」 – asongtoruin

@asongtoruin信仰在哪裏？編輯並添加我的解決方案 – nubela

聲音對我說，你正在試圖解決的最長子問題（wiki page）。 This有一個可能很有趣的Python實現。

Python中還有一個內置的實現類似這樣的東西，在difflib模塊中。特別是difflib.Differ類。

來源

2017-02-16 16:33:29 Bendik

看起來像我現有的解決方案實際上解決了這個問題。這只是緩慢（如果字符串超長），因爲問題的複雜性。 – nubela

這個問題有可能在二次時間內解決（所需時間隨着每個字符串長度的乘積而增長），所以它實際上並不是很複雜。聰明的做法（見鏈接）可能會帶來巨大的收益。 – Bendik

查找2個字符串之間的最大子序列

回答

相關問題