在數組中查找重複項 - 如何提高速度？

我寫了一個函數，如果數組包含重複項則返回true，否則返回false。我的運行時間僅在Leet Code提交的第50個百分點。爲什麼？這不是O（n），你怎麼能讓這個更快？在數組中查找重複項 - 如何提高速度？

def contains_duplicate(nums) 
    hsh = Hash.new(0) 

    nums.each do |num| 
    hsh[num] +=1 
    if hsh[num] > 1 
     return true 
    end 
    end 
    return false 
end

Runtime hash submission only in the 50th percentile

*編輯對於好奇，這裏的鏈接上LEET碼的編碼問題：https：//開頭leetcode.com/problems/contains-duplicate/

喲偷看，我跑了建議設置代碼，並得到一個更糟糕的運行時間：

def contains_duplicate(nums) 
    s = Set.new 
    nums.each { |num| return true unless s.add?(num) } 
    false 
end

Runtime set submission in 20th percentile

**最快運行時間

def contains_duplicate(nums) 
    hsh = Hash.new(0) 
    count=0 
    nums.each do |num| 
    count+=1 
    hsh[num]=1 
    if hsh.size < count 
     return true 
    end 
    end 
    return false 
end

HTTP：//i.stack.imgur.com/Xx21p.png

來源

2016-05-08 An Le

你異形它？ –

我覺得'!! nums.dup.uniq！'在這裏把它們都壓碎了，因爲['uniq！']（http://ruby-doc.org/core-2.3.0/Array.html#method-i-uniq -21）瘋狂得快。 – tadman

@tadman哇，當數組已經是唯一的時，'uniq！'返回nil真的讓我感到驚訝。儘管這更簡潔，但我可能更喜歡'def has_dups（array）; array.size> array.uniq.size; end'。 –

我對ruby不熟悉，但我可以看到你的循環需要每個項目3個哈希查找，並且哈希查找是分配新項目後涉及的最昂貴的操作。

嘗試這樣的事情，這需要每個項目只有一個查詢：

def contains_duplicate(nums) 
    hsh = Hash.new(0) 
    count=0 
    nums.each do |num| 
    count+=1 
    hsh[num]=1 
    if hsh.size < count 
     return true 
    end 
    end 
    return false 
end

來源

2016-05-09 02:13:31

好的。這裏我們走[第77百分位]（http://i.stack.imgur.com/Xx21p.png） –

好的，馬特。 –

你可以使用一組。

require 'set' 

s = Set.new 
nums.each { |num| return true unless s.add?(num) } 
false

請參閱Set#add?。

但是，我不希望任何與OP的方法有顯着的不同，因爲集合是用哈希值來實現的。

...但讓我們來看看。

require 'fruity'

要比較的方法如下：

def hash_way(nums) 
    hsh = Hash.new(0) 
    nums.each do |num| 
    return true if hsh.key?(num) 
    hsh[num] = 1 
    end 
    false 
end

以上是的OP碼具有小的變化。

def set_way(nums) 
    s = Set.new 
    nums.each { |num| return true unless s.add?(num) } 
    false 
end

@ gonzolo2000的方法（自刪除），@傑克的方法，稍作修改：

def uniq_way(nums) 
    nums.uniq.size < nums.size 
end 

def hash2_way(nums) 
    hsh = Hash.new(0) 
    count=0 
    nums.each do |num| 
    count+=1 
    hsh[num]=1 
    if hsh.size < count 
     return true 
    end 
    end 
    return false 
end 

def bench(nums, n) 
    nums = arr(n) 
    compare do 
    _hash { hash_way(nums) } 
    _set { set_way(nums) } 
    _uniq { uniq_way(nums) } 
    _hash2 { hash2_way(nums) } 
    end 
end

首先考慮的陣列與一個重複的元素：

def arr(n) 
    ((1..n).to_a << 1).shuffle 
end

例如，

arr(20) 
    #=> [17, 12, 1, 20, 3, 10, 15, 9, 5, 2, 14, 1, 18, 16, 7, 13, 19, 4, 8, 11, 6] 

bench(nums, 100) 
Running each test 128 times. Test will take about 1 second. 
_hash2 is similar to _hash 
_hash is similar to _uniq 
_uniq is similar to _set 

bench(nums, 1_000) 
Running each test 32 times. Test will take about 1 second. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is faster than _uniq by 2x ± 1.0 

bench(nums, 10_000) 
Running each test 2 times. Test will take about 1 second. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is similar to _uniq 

bench(nums, 100_000) 
Running each test once. Test will take about 2 seconds. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is faster than _uniq by 7x ± 1.0 

bench(nums, 1_000_000) 
Running each test once. Test will take about 51 seconds. 
_hash2 is similar to _hash 
_hash is faster than _uniq by 10.000000000000009% ± 10.0% 
_uniq is similar to _set

現在，我將改變測試數據，使得陣列的獨特元素的10％具有一個DUP：

def arr(n) 
    (1..n).to_a.concat((1..n/10).to_a).shuffle 
end

例如，

arr(30) 
    #=> [14, 3, 1, 5, 20, 11, 4, 2, 25, 15, 23, 18, 30, 2, 19, 10, 13, 
    # 26, 24, 8, 6, 21, 16, 27, 7, 17, 12, 1, 29, 3, 28, 9, 22] 

bench(nums, 100) 
Running each test 512 times. Test will take about 1 second. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is faster than _uniq by 3x ± 1.0 

bench(nums, 1_000) 
Running each test 128 times. Test will take about 2 seconds. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is faster than _uniq by 9x ± 1.0 

bench(nums, 10_000) 
Running each test 128 times. Test will take about 8 seconds. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is faster than _uniq by 79x ± 10.0 

bench(nums, 100_000) 
Running each test 16 times. Test will take about 17 seconds. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is faster than _uniq by 180x ± 10.0 

bench(nums, 1_000_000) 
Running each test 4 times. Test will take about 56 seconds. 
_hash2 is similar to _hash 
_hash is similar to _set 
_set is faster than _uniq by 810x ± 100.0

來源

2016-05-08 20:04:00

所以是的，哈希是最快的。對於我在編輯中鏈接的編碼問題，Set的運行時間較慢。 –

以上評論涉及我最初的發現，這是錯誤的。我已經修改了我的答案以進行更正並擴大了比較。 –

我總是象寫明顯代碼的想法。因此，我個人最喜歡的會是這個樣子：

array.uniq == array

而且一些基準與你原來的方法：

a = (1..20).to_a 

Benchmark.realtime { 100000.times{ contains_duplicate(a) }} 
=> 0.937844 

Benchmark.realtime { 100000.times{ a.uniq == a }} 
=> 0.804872

的還與包含重複的數組：

a = a * 3 

Benchmark.realtime { 100000.times{ contains_duplicate(a) }} 
=> 1.068844 

Benchmark.realtime { 100000.times{ a.uniq == a }} 
=> 0.919273

來源

2016-05-08 20:53:37 jack

如果'a'沒有排序，這將不起作用。 – sawa

比較數組的相等性需要比較數組中的每個元素。因爲我們知道獨特的數組是從原始數組派生的，所以我認爲不需要比較對象，我們可以簡單地使用：'def has_dups（array）; array.size> array.uniq.size; end'。 –

是啊，如果你使用array.uniq那已經O（n），然後比較數組與另一個是另一個O（n^2）是不是。 –

在數組中查找重複項 - 如何提高速度？

回答

相關問題