Detect whether 2 strings are the same but in a different order

My goal is to detect whether 2 strings contain the same words, just in a different order.

Example 
"hello world my name is foobar" is the same as "my name is foobar world hello"

What I have tried so far is splitting both strings into lists and comparing them inside a loop.

text = "hello world my name is foobar" 
textSplit = text.split() 

pattern = "foobar is my name world hello" 
pattern = pattern.split() 

count = 0
for substring in pattern:
    if substring in textSplit:
        count += 1

if count == len(pattern):
    print("same string detected")

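It returns what I intend, but is this really a correct and efficient way to do it? Maybe there is another approach. Any suggestions for reading material on the topic would be very welcome.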

Edit 1: repeated words matter

text = "fish the fish the fish fish fish" 
pattern = "the fish"

This must return False.
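To make the problem concrete, here is a minimal sketch that wraps the loop from the question in a function. The name `loop_check` is hypothetical and only used for illustration. It shows that the loop only checks whether every word of `pattern` occurs somewhere in `text`, so it accepts the repeated-word case that is supposed to fail.

```python
def loop_check(text, pattern):
    """Hypothetical wrapper around the loop from the question."""
    text_words = text.split()
    pattern_words = pattern.split()
    count = 0
    for word in pattern_words:
        # Membership test only: ignores how often a word occurs
        if word in text_words:
            count += 1
    return count == len(pattern_words)

# Same words, different order: accepted, as intended
print(loop_check("hello world my name is foobar", "foobar is my name world hello"))  # True

# Repeated words in text: also accepted, although it should be rejected
print(loop_check("fish the fish the fish fish fish", "the fish"))  # True
```

Any check that should reject the second case has to compare how many times each word occurs, not just whether it occurs.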

Source

2017-10-12 nfl-x

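What about repeated words? Is "fish" the same as "fish fish fish fish"?

'sorted(text) == sorted(pattern)' maybe? It's not very efficient, but it's quite easy to implement. – ozgur

If dups don't matter, 'len(set(text).difference(pattern)) == 0' – Vinny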

If you want to check whether 2 sentences have the same words (with the same number of occurrences), you can split the sentences into words and sort them:

>>> sorted("hello world my name is foobar".split()) 
['foobar', 'hello', 'is', 'my', 'name', 'world'] 
>>> sorted("my name is foobar world hello".split()) 
['foobar', 'hello', 'is', 'my', 'name', 'world']

You can wrap the check in a function:

def have_same_words(sentence1, sentence2): 
    return sorted(sentence1.split()) == sorted(sentence2.split()) 

print(have_same_words("hello world my name is foobar", "my name is foobar world hello")) 
# True 

print(have_same_words("hello", "hello hello")) 
# False 

print(have_same_words("hello", "holle")) 
# False

If case doesn't matter, you can compare the lowercased sentences:

def have_same_words(sentence1, sentence2): 
    return sorted(sentence1.lower().split()) == sorted(sentence2.lower().split()) 

print(have_same_words("Hello world", "World hello")) 
# True

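Note: you could also use collections.Counter instead of sorted. The complexity would be O(n) instead of O(n log n), which isn't a big difference for sentences of this size anyway. Running import collections might even take longer than sorting the words: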

from collections import Counter 

def have_same_words(sentence1, sentence2): 
    return Counter(sentence1.lower().split()) == Counter(sentence2.lower().split()) 

print(have_same_words("Hello world", "World hello")) 
# True 

print(have_same_words("hello world my name is foobar", "my name is foobar world hello")) 
# True 

print(have_same_words("hello", "hello hello")) 
# False 

print(have_same_words("hello", "holle")) 
# False
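To illustrate where the O(n) figure comes from, here is a rough sketch that does the counting by hand with a plain dict instead of Counter. The function names `count_words` and `have_same_words_manual` are hypothetical. Each sentence is walked once, and each dict update takes constant time on average, so the total work grows linearly with the number of words.

```python
def count_words(sentence):
    """Count occurrences of each lowercased word in a single pass."""
    counts = {}
    for word in sentence.lower().split():
        # dict lookups and updates are O(1) on average
        counts[word] = counts.get(word, 0) + 1
    return counts

def have_same_words_manual(sentence1, sentence2):
    # Two dicts are equal when they hold the same words with the same counts
    return count_words(sentence1) == count_words(sentence2)

print(count_words("fish the fish"))
# {'fish': 2, 'the': 1}

print(have_same_words_manual("Hello world", "World hello"))
# True

print(have_same_words_manual("hello", "hello hello"))
# False
```

Sorting, by contrast, has to compare words against each other repeatedly, which is where the extra log n factor comes from.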

Source

2017-10-12 09:04:50

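Thanks. It works as expected. Could you summarize, or link me to, what the complexity is, what the worst case is, and why/how the complexity is defined as O(n)? That would be very helpful.

Sorting is 'O(n log n)', counting is 'O(n)'. As an aside: given the size of the sentences, we shouldn't care about complexity.

Looks like I need to start figuring out what those notations mean, haha.

You can make a list from each string and compute the intersection between them; if its length is the same as the length of the first list, then the strings are the same.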

text = "hello world my name is foobar"
pattern = "foobar is my name world hello"
text = text.split(" ")
pattern = pattern.split(" ")
result = True
if len(text) != len(pattern):
    result = False
else:
    # Words common to both lists (the sets drop duplicate words)
    l = list(set(text) & set(pattern))
    if len(l) != len(text):
        result = False
if result:
    print("same string detected")
else:
    print("Not the same string")

Source

2017-10-12 07:38:16

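You need to be wary of your length check there... 'if len(l) != len(text)': since 'l' has had its duplicates removed, the check won't work reliably when 'text' contains repeated words...

'set(text)' and 'set(pattern)' remove duplicates

I think that with your implementation, extra words in text are ignored (maybe that is intended?).

That is, if text = "a b" and pattern = "a", you still print "same string detected".

I would do it like this: a comparison where order doesn't matter reminds me of sets. So a solution using sets is: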

same = set(text.split()) == set(pattern.split())

Edit: taking into account the question's edit about repeated words:

from collections import Counter 
split_text = text.split() 
split_pattern = pattern.split() 
same = (Counter(split_text) == Counter(split_pattern))
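As a quick illustration of the difference between the two versions, the sketch below runs both comparisons on the repeated-word example from the question. The variable names `same_as_sets` and `same_as_counters` are made up for this example.

```python
from collections import Counter

text = "fish the fish the fish fish fish"
pattern = "the fish"

# Sets only record which words occur, so repeats are lost
same_as_sets = set(text.split()) == set(pattern.split())

# Counters also record how often each word occurs
same_as_counters = Counter(text.split()) == Counter(pattern.split())

print(same_as_sets)      # True
print(same_as_counters)  # False
```

Here the Counter for text holds 5 occurrences of "fish" and 2 of "the", while the Counter for pattern holds 1 of each, so only the Counter comparison returns the False that the question asks for.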

Source

2017-10-12 07:38:59

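Your solution considers 'hello' and 'hello hello' to be equal. It's not clear whether that is the desired behavior.

@Eric in that case, swap the 'set' for a 'collections.Counter'...

Question updated

You can also join the strings you want to compare into a new word list, str12, and then compare the length of str12 with 2 * (the length of str12 with duplicates removed).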

str1 = "hello world my name is foobar" 
str2 = "my name is foobar world hello" 


str12 = (str1 + " " +str2).split(" ") 

str12_remove_duplicate = list(set(str12)) 

if len(str12) == 2 * len(str12_remove_duplicate): 
    print("String '%s' and '%s' are SAME but different order" % (str1, str2)) 
else: 
    print("String '%s' and '%s' are NOT SAME" % (str1, str2))
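A caveat worth checking before relying on this: the length comparison appears to hold only when neither string repeats a word. The sketch below wraps the same test in a hypothetical function, `doubled_length_check`, and tries it on inputs with repeated words.

```python
def doubled_length_check(str1, str2):
    """Hypothetical wrapper around the length comparison above."""
    str12 = (str1 + " " + str2).split(" ")
    # True when the combined list is exactly twice as long as its unique words
    return len(str12) == 2 * len(set(str12))

# No repeated words: behaves as described
print(doubled_length_check("hello world", "world hello"))  # True

# Different words, each repeated twice: wrongly reported as the same
print(doubled_length_check("a a", "b b"))  # True

# Identical strings with a repeated word: wrongly reported as different
print(doubled_length_check("hello hello", "hello hello"))  # False
```

When repeated words matter, comparing word counts (for example with collections.Counter, as in the other answers) avoids both failure cases.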

Source

2017-10-12 08:18:48 sslloo
