Python collections.Counter效率

我使用下面的代碼來實現一個函數，該函數可以查找字符串s中所有字符串p的字符串。Python collections.Counter效率

class Solution(object): 
    def findAnagrams(self, s, p): 
     """ 
     :type s: str 
     :type p: str 
     :rtype: List[int] 
     """ 
     ans = list() 
     pcnt = collections.Counter(p) 
     for i in range(len(s)): 
      if collections.Counter(s[i:i+len(p)]) == pcnt: 
       ans.append(i) 
     return ans

大長度的輸入字符串s運行時

，它給了我「時間超過限制」在網上的代碼測試系統錯誤。但是，以下代碼將不會發生此類問題：

class Solution(object): 
    def findAnagrams(self, s, p): 
     """ 
     :type s: str 
     :type p: str 
     :rtype: List[int] 
     """ 
     ls, lp = len(s), len(p) 
     cp = collections.Counter(p) 
     cs = collections.Counter() 
     ans = [] 
     for i in range(ls): 
      cs[s[i]] += 1 
      if i >= lp: 
       cs[s[i - lp]] -= 1 
       if cs[s[i - lp]] == 0: 
        del cs[s[i - lp]] 
      if cs == cp: 
       ans.append(i - lp + 1) 
     return ans

我知道爲什麼嗎？看來這兩個解決方案都使用兩個len（p）最大大小的計數器？

來源

2017-04-19 yuhengd

第一個使用遠比兩個計數器多。它會在每個循環迭代中創建一個新的計數器，然後將其丟棄。不確定這是否是速度差異的原因，但它可能是。 – BrenBarn

要了解爲什麼某些代碼的運行速度比其他代碼快，您應該對其進行配置。在Python，上手紋最簡單的方法是運行：

python -m cProfile <script.py>

在我的情況，我寫了一個簡單的腳本，調用要麼慢溶液或快速的解決方案：

# Pasted code from original question. 
# Also renamed the slow version to `SlowSolution` and the fast version to `FastSolution`. 
... 

# solution = FastSolution() 
solution = SlowSolution() 

print(solution.findAnagrams('abcdefg' + 'a' * 10000, 'gfedcba' + 'a' * 10000))

然後我只用SlowSolution和FastSolution來運行腳本。下面是我的探查結果使用SlowSolution輸出：

$ python -m cProfile counter.py 
[0] 
     100204 function calls (100192 primitive calls) in 2.557 seconds 

    Ordered by: standard name 

    ncalls tottime percall cumtime percall filename:lineno(function) 
    10008 0.015 0.000 2.538 0.000 __init__.py:516(__init__) 
    10008 0.009 0.000 2.522 0.000 __init__.py:585(update) 
     7 0.000 0.000 0.000 0.000 _collections_abc.py:392(__subclasshook__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:16(__init__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:20(__enter__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:26(__exit__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:36(__init__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:52(_commit_removals) 
     9 0.000 0.000 0.000 0.000 _weakrefset.py:58(__iter__) 
    20022 0.007 0.000 0.007 0.000 _weakrefset.py:70(__contains__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:81(add) 
    10008 0.010 0.000 0.017 0.000 abc.py:178(__instancecheck__) 
     7/1 0.000 0.000 0.000 0.000 abc.py:194(__subclasscheck__) 
     1 0.000 0.000 2.557 2.557 counter.py:1(<module>) 
     1 0.000 0.000 0.000 0.000 counter.py:17(FastSolution) 
     1 0.000 0.000 0.000 0.000 counter.py:3(SlowSolution) 
     1 0.017 0.017 2.556 2.556 counter.py:4(findAnagrams) 
    10008 2.490 0.000 2.490 0.000 {built-in method _collections._count_elements} 
     2 0.000 0.000 0.000 0.000 {built-in method builtins.__build_class__} 
     1 0.000 0.000 2.557 2.557 {built-in method builtins.exec} 
     7 0.000 0.000 0.000 0.000 {built-in method builtins.getattr} 
    10008 0.005 0.000 0.022 0.000 {built-in method builtins.isinstance} 
     8/2 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass} 
    30024 0.003 0.000 0.003 0.000 {built-in method builtins.len} 
     1 0.000 0.000 0.000 0.000 {built-in method builtins.print} 
     7 0.000 0.000 0.000 0.000 {method '__subclasses__' of 'type' objects} 
     14 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects} 
     1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 
     1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 
     7 0.000 0.000 0.000 0.000 {method 'remove' of 'set' objects}

和FastSolution：

$ python -m cProfile counter.py 
[0] 
     146 function calls (134 primitive calls) in 0.005 seconds 

    Ordered by: standard name 

    ncalls tottime percall cumtime percall filename:lineno(function) 
     2 0.000 0.000 0.001 0.000 __init__.py:516(__init__) 
     7 0.000 0.000 0.000 0.000 __init__.py:536(__missing__) 
     2 0.000 0.000 0.001 0.000 __init__.py:585(update) 
     7 0.000 0.000 0.000 0.000 _collections_abc.py:392(__subclasshook__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:16(__init__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:20(__enter__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:26(__exit__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:36(__init__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:52(_commit_removals) 
     9 0.000 0.000 0.000 0.000 _weakrefset.py:58(__iter__) 
     8 0.000 0.000 0.000 0.000 _weakrefset.py:70(__contains__) 
     7 0.000 0.000 0.000 0.000 _weakrefset.py:81(add) 
     1 0.000 0.000 0.000 0.000 abc.py:178(__instancecheck__) 
     7/1 0.000 0.000 0.000 0.000 abc.py:194(__subclasscheck__) 
     1 0.000 0.000 0.005 0.005 counter.py:1(<module>) 
     1 0.000 0.000 0.000 0.000 counter.py:17(FastSolution) 
     1 0.004 0.004 0.005 0.005 counter.py:18(findAnagrams) 
     1 0.000 0.000 0.000 0.000 counter.py:3(SlowSolution) 
     1 0.001 0.001 0.001 0.001 {built-in method _collections._count_elements} 
     2 0.000 0.000 0.000 0.000 {built-in method builtins.__build_class__} 
     1 0.000 0.000 0.005 0.005 {built-in method builtins.exec} 
     7 0.000 0.000 0.000 0.000 {built-in method builtins.getattr} 
     1 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance} 
     8/2 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass} 
     6 0.000 0.000 0.000 0.000 {built-in method builtins.len} 
     1 0.000 0.000 0.000 0.000 {built-in method builtins.print} 
     7 0.000 0.000 0.000 0.000 {method '__subclasses__' of 'type' objects} 
     14 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects} 
     1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 
     1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 
     7 0.000 0.000 0.000 0.000 {method 'remove' of 'set' objects}

輸出可以是一個有點陌生，在第一次閱讀，但我們在tottime列很感興趣。這告訴我們在一個特定函數中花費了多少時間。

正如您所看到的，該腳本幾乎將所有時間花費在{built-in method _collections._count_elements}之內。這是一個Counter所使用的內部方法，我們可以推斷每次創建計數器時都會調用它（如collections.Counter(p)）。

要使代碼更快，您應該調用collections.Counter(...)更少的次數和/或更短的字符串。在緩慢版本中，你計算len(p)個字符len(s)次。這有一個運行時間爲O(sp)，它是二次的，並解釋爲什麼它在大輸入上如此緩慢。

另一方面，更快的解決方案恰好爲s的每個字符計數一次，因此它的運行時間爲O(s + p)。這要快得多，並且可以通過更大的輸入進行擴展。

對於Python的貌相的詳細信息，請參閱How can you profile a python script?

來源

2017-04-19 03:01:00 supersam654

謝謝，這真的很有幫助，這正是我需要學習的東西！ – yuhengd

有創造，計數和比較比列表Counter對象更多的開銷。這是你所看到的本質。如果你還想要一個更快的方法，你可以通過將p作爲一個元組進行排列，然後根據元組檢查s片來完成字典查找器。

class Solution(object): 
    def findAnagrams(self, s, p): 
     """ 
     :type s: str 
     :type p: str 
     :rtype: List[int] 
     """ 
     ans = list() 
     pcnt = collections.Counter(p) 
     for i in range(len(s)): 
      if collections.Counter(s[i:i+len(p)]) == pcnt: 
       ans.append(i) 
     return ans 

    def findAnagrams2(self, s, p): 
     ls, lp = len(s), len(p) 
     cp = collections.Counter(p) 
     cs = collections.Counter() 
     ans = [] 
     for i in range(ls): 
      cs[s[i]] += 1 
      if i >= lp: 
       cs[s[i - lp]] -= 1 
       if cs[s[i - lp]] == 0: 
        del cs[s[i - lp]] 
      if cs == cp: 
       ans.append(i - lp + 1) 
     return ans 

    def findAnagrams3(self, s, p): 
     p_all = tuple(''.join(x) for x in permutations(p,len(p))) 
     return [i for i in range(len(s)) if s[i:i+len(p)] in p_all]

下面是在IPython的使用timeit 3種方法很短的比較：

In [33]: %%timeit 
    ...: sol.findAnagrams('hello world he said eh', 'he') 
    ...: 
1000 loops, best of 3: 259 µs per loop 

In [34]: %%timeit 
    ...: sol.findAnagrams2('hello world he said eh', 'he') 
    ...: 
10000 loops, best of 3: 102 µs per loop 

In [35]: %%timeit 
    ...: sol.findAnagrams3('hello world he said eh', 'he') 
    ...: 
100000 loops, best of 3: 15.5 µs per loop

來源

2017-04-19 03:10:57 James

謝謝！這很有趣。一個可能的問題是當p很大時，p的排列會佔用大量的內存空間？ – yuhengd

我們在談論多大的p？ – James

Python collections.Counter效率

回答

相關問題