How to find the longest match of a string, including a focus word, in Python

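New to Python/programming, so I'm not quite sure how to phrase this...

What I want to do: take an input sentence, find all matches between the input sentence and a set of stored sentences/strings, and return the longest combination of matched strings.

I think the answer will have something to do with regex, but I haven't started on those yet and would rather not if I don't need to.

My question: is regex the way to go here? Or is there a way to do this without importing anything?

If it helps you understand my question/idea, here's pseudocode for what I'm trying to do: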

input = 'i play soccer and eat pizza on the weekends' 
focus_word = 'and' 

ss = [ 
     'i play soccer and baseball', 
     'i eat pizza and apples', 
     'every day i walk to school and eat pizza for lunch', 
     'i play soccer but eat pizza on the weekend', 
    ] 

match = MatchingFunction(input, focus_word, ss)
# input should match every stored sentence except ss[3], which lacks 'and'

# longest match containing focus_word, per stored sentence:
# ss[0] -> 'i play soccer and'
# ss[1] -> 'and'
# ss[2] -> 'and eat pizza'

# the returned value, match, should be 'i play soccer and eat pizza'

Source

2012-09-29 Arthur64

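This sounds like homework... can you show us what you've tried? StackOverflow can be used to ask for homework help, but you have to make an effort first! –

Not homework... haha, I'm doing this for fun. What I have so far is an input() to type in a sentence, a database that stores a set of all inputs (a set, so there are no duplicates), and a function that iterates over the database and checks whether i == input. I'm not sure where to start with the keyword part – Arthur64

It sounds like you want to find the longest common substring between your input string and each string in your database. Assuming you have a function LCS that finds the longest common substring of two strings, you could do: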

> [LCS(input, s) for s in ss] 
['i play soccer and ', 
' eat pizza ', 
' and eat pizza ', 
' eat pizza on the weekend']
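The answer assumes an LCS function without showing one. A minimal sketch of how it could be written with plain dynamic programming and no imports follows; the name `LCS` comes from the answer, while the tie-breaking rule (the first longest match found wins) is an assumption.

```python
def LCS(a, b):
    """Return the longest common substring of a and b.

    Sketch only: on ties, the first longest match found in `a` is returned.
    """
    best_len = 0   # length of the best match so far
    best_end = 0   # index in `a` just past the end of the best match
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the common run that ended at the previous characters
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]
```

This runs in time proportional to len(a) * len(b), which should be fine for sentence-sized strings.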

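Then, it sounds like you're looking for the most-repeated substring among the strings in your list. (Correct me if I'm wrong, but I'm not quite sure what you're looking for in the general case!) From the array output above, which combination of strings would you use to build the output string?

Based on your comment, I think this should do the trick: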

> parts = [p for p in (LCS(input, s) for s in ss) if focus_word in p]
> parts 
['i play soccer and ', ' and eat pizza ']

Then, to get rid of the duplicated words in this example:

> "".join([parts[0]] + [p.replace(focus_word, "").strip() for p in parts[1:]]) 
'i play soccer and eat pizza'
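Putting the steps above together, one possible end-to-end sketch is shown below. It assumes that importing from the standard library is acceptable and uses `difflib.SequenceMatcher.find_longest_match` in place of a hand-written LCS. The names `longest_common_substring` and `matching_function` are hypothetical, the latter mirroring `MatchingFunction` from the question.

```python
from difflib import SequenceMatcher


def longest_common_substring(a, b):
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent characters in long strings, so every character is compared.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def matching_function(sentence, focus_word, stored):
    # Keep only the common substrings that contain the focus word.
    # Note: this is a plain substring test, so 'and' also matches 'band'.
    parts = [
        p
        for p in (longest_common_substring(sentence, s) for s in stored)
        if focus_word in p
    ]
    if not parts:
        return ""
    # Same join as in the answer: keep the first part whole and drop the
    # focus word from the remaining parts before concatenating.
    rest = [p.replace(focus_word, "").strip() for p in parts[1:]]
    return "".join([parts[0]] + rest)
```

With the sentence and the stored list from the question, `matching_function(sentence, 'and', ss)` gives 'i play soccer and eat pizza'. The join step only works out here because the first part happens to end in a space.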

Source

2012-09-29 17:24:44

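Thanks! Longest common substring looks close to what I'm after (if not exactly it!). The end result I'm trying to get, with 'and' as the keyword, is the longest matches that have 'and' in them. So from the array output you gave, I would only want ['i play soccer and', 'and eat pizza']. – Arthur64

Thanks! That's it! – Arthur64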

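I tested a few different sets of strings, and the only problem is the final join statement. If input = 'see the dog run away' and key_word = 'the' and ss = ['see the dog run', 'see the dog', 'see the dog run fast', 'see the dog run away', 'the dog'], I get this: >>> parts ['see the dog', 'see the dog run', 'the dog', 'the dog'] >>> "".join([parts[0]] + [p.replace(b, "").strip() for p in parts[1:]]) 'see dogsee dog rundogdog' - which is fine, since that's not as important as getting the parts – Arthur64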
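The join problem in the last comment comes from concatenating parts that overlap each other. One way around it, sketched here as an assumption rather than something proposed in the thread, is to use the fact that every part is a substring of the input: locate each part in the input, merge the overlapping spans, and return the longest merged stretch. The helper name `merge_parts` is hypothetical, and `str.find` only locates the first occurrence of each part, which may not be the occurrence that was actually matched.

```python
def merge_parts(sentence, parts):
    """Merge parts whose positions in `sentence` overlap or touch."""
    # (start, end) span of each part's first occurrence in the sentence;
    # parts that do not occur in the sentence are skipped.
    spans = sorted(
        (start, start + len(p))
        for p in parts
        for start in [sentence.find(p)]
        if start != -1
    )
    if not spans:
        return ""
    merged = [list(spans[0])]
    for start, end in spans[1:]:
        if start <= merged[-1][1]:
            # overlaps or touches the previous span: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Return the longest merged stretch, without surrounding whitespace.
    start, end = max(merged, key=lambda span: span[1] - span[0])
    return sentence[start:end].strip()
```

For the original example, `merge_parts(input, parts)` with parts = ['i play soccer and ', ' and eat pizza '] gives 'i play soccer and eat pizza' without having to strip the focus word from anything.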
