Remove duplicate text from multiple strings

I want to remove the duplicated text from multiple strings:

a = "This is Product A with property B and propery C. Buy it now!" 
b = "This is Product B with property X and propery Y. Buy it now!" 
c = "This is Product C having no properties. Buy it now!"

I am looking for an algorithm that can do this:

> magic(a, b, c) 
=> ['A with property B and propery C', 
    'B with property X and propery Y', 
    'C having no properties']

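I have to find the duplicates across 1000+ texts. Top performance is not a must, but it would be nice.

- Update

I am looking for sequences of words. So if: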

d = 'This is Product D with text engraving: "Buy". Buy it now!'

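The first "Buy" should not count as a duplicate. I guess I have to require a run of at least n consecutive words before something is treated as a duplicate.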
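The threshold idea can be sketched roughly as follows. This is only one possible reading of the requirement: texts are split on whitespace, and a word is dropped when it belongs to a run of `n` consecutive words that occurs in every text. The method name `strip_shared_runs` and the default `n = 2` are made up for illustration.

```ruby
require 'set'

# Hypothetical helper: drop every word that is part of an n-word run
# occurring in all of the given texts. Words are split on whitespace only.
def strip_shared_runs(texts, n = 2)
  tokenized = texts.map(&:split)
  # Runs of n consecutive words that appear in every text.
  shared = tokenized.map { |words| words.each_cons(n).to_set }.reduce(:&)
  tokenized.map do |words|
    covered = Array.new(words.size, false)
    # Mark each word that sits inside a shared run.
    words.each_cons(n).with_index do |gram, i|
      covered.fill(true, i, n) if shared.include?(gram)
    end
    words.reject.with_index { |_, i| covered[i] }.join(' ')
  end
end
```

With `n = 2`, the quoted "Buy" in string `d` survives because no two-word run around it is shared by all texts. Since punctuation stays attached to its word, each result keeps its trailing period (for example `'D with text engraving: "Buy".'`), which differs slightly from the output asked for above.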

2013-08-24 Willian

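The question is unclear. How do you define duplicate text? –

Why isn't "with property" a duplicate when it is repeated? :D – fl00r

1) What if there is a fourth string, "Bumblebee zebra"? Would `magic(a, b, c, d)` be expected to return all four strings unmodified? 2) How is positional information expected to be used? For example, the `magic` example removes "Buy it now!" even though it sits at a different position in each string. Maybe you are looking for a `diff` function? –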

def common_prefix_length(*args) 
    first = args.shift 
    (0..first.size).find_index { |i| args.any? { |a| a[i] != first[i] } } 
end 

def magic(*args) 
    i = common_prefix_length(*args) 
    args = args.map { |a| a[i..-1].reverse } 
    i = common_prefix_length(*args) 
    args.map { |a| a[i..-1].reverse } 
end

a = "This is Product A with property B and propery C. Buy it now!" 
b = "This is Product B with property X and propery Y. Buy it now!" 
c = "This is Product C having no properties. Buy it now!" 

magic(a,b,c) 
# => ["A with property B and propery C", 
#  "B with property X and propery Y", 
#  "C having no properties"]
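One edge case worth keeping in mind: when all strings are identical, `find_index` finds no mismatch and returns `nil`, so what `a[i..-1]` does then depends on how the Ruby version in use treats a `nil` range bound. A minimal sketch of a helper that falls back to the full length instead (the name `common_prefix_length_or_full` is hypothetical and not part of the answer):

```ruby
# Hypothetical variant of the helper above: same scan, but returns the
# length of the first string when no mismatching index is found.
def common_prefix_length_or_full(*args)
  first, *rest = args
  (0..first.size).find_index { |i| rest.any? { |a| a[i] != first[i] } } || first.size
end
```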

2013-08-24 11:05:24 falsetru

I like that your solution looks at sequences rather than individual words! – Willian

Your data

sentences = [ 
    "This is Product A with property B and propery C. Buy it now!", 
    "This is Product B with property X and propery Y. Buy it now!", 
    "This is Product C having no properties. Buy it now!" 
]

Your magic

def magic(data) 
    prefix, postfix = 0, -1 
    data.map{ |d| d[prefix] }.uniq.compact.size == 1 && prefix += 1 or break while true 
    data.map{ |d| d[postfix] }.uniq.compact.size == 1 && prefix > -postfix && postfix -= 1 or break while true 
    data.map{ |d| d[prefix..postfix] } 
end

Your output

magic(sentences) 
#=> [ 
#=> "A with property B and propery C", 
#=> "B with property X and propery Y", 
#=> "C having no properties" 
#=> ]

Or you can use `loop` instead of `while true`

def magic(data) 
    prefix, postfix = 0, -1 
    loop{ data.map{ |d| d[prefix] }.uniq.compact.size == 1 && prefix += 1 or break } 
    loop{ data.map{ |d| d[postfix] }.uniq.compact.size == 1 && prefix > -postfix && postfix -= 1 or break } 
    data.map{ |d| d[prefix..postfix] } 
end

2013-08-24 12:07:57 fl00r

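Your `magic` does not terminate when `data` happens to be an array of identical strings. You have to check that a character exists in `d` at the `prefix` and `postfix` indices. – sawa

Good catch, @sawa! Fixed – fl00r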

-1

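Edit: This code has bugs. I am leaving my answer here for reference only, because I don't like it when people delete their answers after being downvoted. Everybody makes mistakes :-)

I like @falsetru's approach, but found the code unnecessarily complex. Here is my attempt: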

def common_prefix_length(strings) 
    i = 0 
    i += 1 while strings.map{|s| s[i] }.uniq.size == 1 
    i 
end 

def common_suffix_length(strings) 
    common_prefix_length(strings.map(&:reverse)) 
end 

def uncommon_infixes(strings) 
    pl = common_prefix_length(strings) 
    sl = common_suffix_length(strings) 
    strings.map{|s| s[pl...-sl] } 
end
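The code above loops forever when all strings are identical (past the end every `s[i]` is `nil`, so `uniq.size` stays 1), and `s[pl...-sl]` returns an empty string whenever `sl` is 0. A possible repair, keeping the same structure, is sketched below; the `bounded_` names are hypothetical, and the suffix is measured only on what remains after the prefix has been cut so the two cannot overlap.

```ruby
# Hypothetical bounded variant: never scans past the shortest string.
def bounded_common_prefix_length(strings)
  limit = strings.map(&:size).min || 0
  i = 0
  i += 1 while i < limit && strings.map { |s| s[i] }.uniq.size == 1
  i
end

def bounded_uncommon_infixes(strings)
  pl = bounded_common_prefix_length(strings)
  rests = strings.map { |s| s[pl..-1] }
  # Common suffix length, measured on the remainders only.
  sl = bounded_common_prefix_length(rests.map(&:reverse))
  # Slice by length, so sl == 0 keeps the whole remainder.
  rests.map { |r| r[0, r.size - sl] }
end
```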

Since the OP may be concerned about performance, I did a quick benchmark:

require 'fruity' 
require 'securerandom' 

prefix = 'PREFIX ' 
suffix = ' SUFFIX' 
test_data = Array.new(1000) do 
    prefix + SecureRandom.hex + suffix 
end 

def fl00r_meth(data) 
    prefix, postfix = 0, -1 
    data.map{ |d| d[prefix] }.uniq.size == 1 && prefix += 1 or break while true 
    data.map{ |d| d[postfix] }.uniq.size == 1 && postfix -= 1 or break while true 
    data.map{ |d| d[prefix..postfix] } 
end 

def falsetru_common_prefix_length(*args) 
    first = args.shift 
    (0..first.size).find_index { |i| args.any? { |a| a[i] != first[i] } } 
end 

def falsetru_meth(*args) 
    i = falsetru_common_prefix_length(*args) 
    args = args.map { |a| a[i..-1].reverse } 
    i = falsetru_common_prefix_length(*args) 
    args.map { |a| a[i..-1].reverse } 
end 

def padde_common_prefix_length(strings) 
    i = 0 
    i += 1 while strings.map{|s| s[i] }.uniq.size == 1 
    i 
end 

def padde_common_suffix_length(strings) 
    padde_common_prefix_length(strings.map(&:reverse)) 
end 

def padde_meth(strings) 
    pl = padde_common_prefix_length(strings) 
    sl = padde_common_suffix_length(strings) 
    strings.map{|s| s[pl...-sl] } 
end 

compare do 
    fl00r do 
        fl00r_meth(test_data.dup) 
    end 

    falsetru do 
        falsetru_meth(*test_data.dup) 
    end 

    padde do 
        padde_meth(test_data.dup) 
    end 
end

The results are as follows:

Running each test once. Test will take about 1 second. 
fl00r is similar to padde 
padde is faster than falsetru by 30.000000000000004% ± 10.0%

2013-08-24 14:56:14

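Would the downvoter care to explain? –

Your code will not terminate when the data happens to be an array of identical strings. You have to check that a character exists in the strings at index `i`. – sawa

Your code is similar to the first version of my answer. I changed it to the current version because I thought creating and discarding intermediate arrays (`map {..}.uniq.size`) might hurt performance. According to your benchmark, I was wrong. ;) – falsetru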
