比較列表推導和顯式循環（3個數組生成器的循環速度快於1）

我做了功課，並且意外地發現了算法速度的奇怪不一致。這裏是兩個版本的代碼相同的功能bur有1個不同：在第一版本中我使用3次數組發生器來過濾一些數組，並在第二版本中我使用1 for循環與3 if語句做相同的過濾工作。比較列表推導和顯式循環（3個數組生成器的循環速度快於1）

所以，這裏的第一個版本代碼：

def kth_order_statistic(array, k): 
    pivot = (array[0] + array[len(array) - 1]) // 2 
    l = [x for x in array if x < pivot] 
    m = [x for x in array if x == pivot] 
    r = [x for x in array if x > pivot] 
    if k <= len(l): 
      return kth_order_statistic(l, k) 
    elif k > len(l) + len(m): 
      return kth_order_statistic(r, k - len(l) - len(m)) 
    else: 
      return m[0]

這裏的第二個版本代碼：

def kth_order_statistic2(array, k): 
    pivot = (array[0] + array[len(array) - 1]) // 2 
    l = [] 
    m = [] 
    r = [] 
    for x in array: 
     if x < pivot: 
      l.append(x) 
     elif x > pivot: 
      r.append(x) 
     else: 
      m.append(x) 

    if k <= len(l): 
     return kth_order_statistic2(l, k) 
    elif k > len(l) + len(m): 
     return kth_order_statistic2(r, k - len(l) - len(m)) 
    else: 
     return m[0]

IPython的輸出爲第1版：

In [4]: %%timeit 
    ...: A = range(100000) 
    ...: shuffle(A) 
    ...: k = randint(1, len(A)-1) 
    ...: order_statisctic(A, k) 
    ...: 
10 loops, best of 3: 120 ms per loop

而對於第二版本：

In [5]: %%timeit 
    ...: A = range(100000) 
    ...: shuffle(A) 
    ...: k = randint(1, len(A)-1) 
    ...: kth_order_statistic2(A, k) 
    ...: 
10 loops, best of 3: 169 ms per loop

那麼爲什麼第一個版本比第二個版本快呢？我還使用filter（）函數取代了數組生成器的第三個版本，它比第二個版本（每個循環得到218 ms）要慢。

來源

2016-09-15 KgOfHedgehogs

列表解析通常比等效的循環更快。擴展列表（您的附加功能）也可能比填寫已知大小的列表更加昂貴。 –

顯着增加你的名單大小......我認爲你會發現差距或多或少是恆定的......你正在交易空間的時間，但他們*大致*相等（發電機上有一點點的開銷/迭代器vs列表） –

@SohierDane這是我的想法。列表理解清楚創建列表的操作將會是什麼，因此可以對其執行優化。根據元素構建列表元素，Python無法對未來的計算或內存使用情況做出任何假設。 – sdsmith

讓我們來定義我們需要回答的問題的功能和timeit他們：

In [18]: def iter(): 
    l = [x for x in range(100) if x > 10] 
    ....: 

In [19]: %timeit iter() 
100000 loops, best of 3: 7.92 µs per loop 

In [20]: def loop(): 
    l = [] 
    for x in range(100): 
     if x > 10: 
      l.append(x) 
    ....: 

In [21]: %timeit loop() 
10000 loops, best of 3: 20 µs per loop 

In [22]: def loop_fast(): 
    l = [] 
    for x in range(100): 
     if x > 10: 
      pass 
    ....: 

In [23]: %timeit loop_fast() 
100000 loops, best of 3: 4.69 µs per loop

我們可以看到沒有append命令的for循環與列表理解一樣快。其實，如果我們看看字節碼，我們可以看到，在列表理解蟒蛇的情況下能夠使用名爲LIST_APPEND，而不是一個內置的字節碼指令：

負載名單：40 LOAD_FAST
裝入屬性：43 LOAD_ATTRIBUTE
調用加載的功能：（？）49 CALL_FUNCTION
卸載列表：52 POP_TOP

你可以從以前的字節碼下面的輸出中看到的想念g與列表理解和「loop_fast」函數。比較三個函數的時間很明顯，這三個函數的時間不同。

In [27]: dis.dis(iter) 
    2   0 BUILD_LIST    0 
      3 LOAD_GLOBAL   0 (range) 
      6 LOAD_CONST    1 (1) 
      9 LOAD_CONST    2 (100) 
      12 CALL_FUNCTION   2 
      15 GET_ITER 
     >> 16 FOR_ITER    24 (to 43) 
      19 STORE_FAST    0 (x) 
      22 LOAD_FAST    0 (x) 
      25 LOAD_CONST    2 (100) 
      28 COMPARE_OP    4 (>) 
      31 POP_JUMP_IF_FALSE  16 
      34 LOAD_FAST    0 (x) 
      37 LIST_APPEND   2 
      40 JUMP_ABSOLUTE   16 
     >> 43 STORE_FAST    1 (l) 
      46 LOAD_CONST    0 (None) 
      49 RETURN_VALUE 

In [28]: dis.dis(loop) 
    2   0 BUILD_LIST    0 
      3 STORE_FAST    0 (1) 

    3   6 SETUP_LOOP   51 (to 60) 
      9 LOAD_GLOBAL   0 (range) 
      12 LOAD_CONST    1 (1) 
      15 LOAD_CONST    2 (100) 
      18 CALL_FUNCTION   2 
      21 GET_ITER 
     >> 22 FOR_ITER    34 (to 59) 
      25 STORE_FAST    1 (x) 

    4   28 LOAD_FAST    1 (x) 
      31 LOAD_CONST    3 (10) 
      34 COMPARE_OP    4 (>) 
      37 POP_JUMP_IF_FALSE  22 

    5   40 LOAD_FAST    0 (l) 
      43 LOAD_ATTR    1 (append) 
      46 LOAD_FAST    1 (x) 
      49 CALL_FUNCTION   1 
      52 POP_TOP 
      53 JUMP_ABSOLUTE   22 
      56 JUMP_ABSOLUTE   22 
     >> 59 POP_BLOCK 
     >> 60 LOAD_CONST    0 (None) 
      63 RETURN_VALUE 

In [29]: dis.dis(loop_fast) 
    2   0 BUILD_LIST    0 
      3 STORE_FAST    0 (1) 

    3   6 SETUP_LOOP   38 (to 47) 
      9 LOAD_GLOBAL   0 (range) 
      12 LOAD_CONST    1 (1) 
      15 LOAD_CONST    2 (100) 
      18 CALL_FUNCTION   2 
      21 GET_ITER 
     >> 22 FOR_ITER    21 (to 46) 
      25 STORE_FAST    1 (x) 

    4   28 LOAD_FAST    1 (x) 
      31 LOAD_CONST    3 (10) 
      34 COMPARE_OP    4 (>) 
      37 POP_JUMP_IF_FALSE  22 

    5   40 JUMP_ABSOLUTE   22 
      43 JUMP_ABSOLUTE   22 
     >> 46 POP_BLOCK 
     >> 47 LOAD_CONST    0 (None) 
      50 RETURN_VALUE

來源

2016-09-15 20:17:32

使用簡單for比list comprehesion要快。它幾乎快了2倍。檢查下面的結果：

使用list comprehension：58微秒

[email protected]:~$ python -m timeit "[i for i in range(1000)]" 
10000 loops, best of 3: 58 usec per loop

使用for循環：37.1微秒

[email protected]:~$ python -m timeit "for i in range(1000): i" 
10000 loops, best of 3: 37.1 usec per loop

但在你的情況，for時間比列表的更多時間理解不是因爲你的循環很慢。但是由於.append()您正在代碼中使用。

隨着for loop` append()：114微秒

[email protected]:~$ python -m timeit "my_list = []" "for i in range(1000): my_list.append(i)" 
10000 loops, best of 3: 114 usec per loop

其中明確表明，它是.append()其服用兩次由for環所花費的時間。

然而，storing the "list.append" in different variable：69.3微秒

[email protected]:~$ python -m timeit "my_list = []; append = my_list.append" "for i in range(1000): append(i)" 
10000 loops, best of 3: 69.3 usec per loop

相比於上述比較過去的情況，並且結果是相當媲美的list comprehension有一個在性能上大爲提高。這意味着，不是每次調用my_list.append()，都可以通過將函數的參考存儲在另一個變量中，例如append_func = my_list.append，並使用該變量append_func(i)進行呼叫來改善性能。

這也證明了，與使用類的對象直接進行函數調用相比，調用存儲在變量中的類的函數更快。

謝謝Stefan爲通知最後一個案件。

來源

2016-09-15 19:32:59

一個是在大小爲n的列表上的三個單獨循環，另一個是單個循環。通過顯示「append」的價格是多少，可以做出更好的論點。 – sdsmith

@sdsmith：是的。我應該提供見解。更新了答案 –

你對'python -m timeit「my_list = []; append = my_list.append」「得到了什麼？在範圍內（1000）：append（i）」'？ –

算法結構不同，條件結構將被控告。追加到r和m中的測試可以在之前的測試中丟棄。關於for循環與append和列表理解更嚴格的比較是對非最佳以下

for x in array: 
     if x < pivot: 
      l.append(x) 
for x in array: 
     if x== pivot: 
      m.append(x) 
for x in array: 
     if x > pivot: 
      r.append(x)

來源

2016-09-15 19:46:24 Flint

讓我們消散疑問：的第二個版本稍微快一些：列表理解更快，但兩個數組循環和儘可能多的條件語句在一次迭代中被丟棄。

def kth_order_statistic1(array,k): 
    pivot = (array[0] + array[len(array) - 1]) // 2 
    l = [x for x in array if x < pivot] 
    m = [x for x in array if x == pivot] 
    r = [x for x in array if x > pivot] 

    if k <= len(l): 
     return kth_order_statistic1(l, k) 
    elif k > len(l) + len(m): 
     return kth_order_statistic1(r, k - len(l) - len(m)) 
    else: 
     return m[0] 


def kth_order_statistic2(array,k): 
    pivot = (array[0] + array[len(array) - 1]) // 2 
    l = [] 
    m = [] 
    r = [] 
    for x in array: 
     if x < pivot: 
      l.append(x) 
     elif x > pivot: 
      r.append(x) 
     else: 
      m.append(x) 

    if k <= len(l): 
     return kth_order_statistic2(l, k) 
    elif k > len(l) + len(m): 
     return kth_order_statistic2(r, k - len(l) - len(m)) 
    else: 
     return m[0] 

def kth_order_statistic3(array,k): 
    pivot = (array[0] + array[len(array) - 1]) // 2 
    l = [] 
    m = [] 
    r = [] 

    for x in array: 
     if x < pivot: l.append(x) 
    for x in array: 
     if x== pivot: m.append(x) 
    for x in array: 
     if x > pivot: r.append(x) 

    if k <= len(l): 
     return kth_order_statistic3(l, k) 
    elif k > len(l) + len(m): 
     return kth_order_statistic3(r, k - len(l) - len(m)) 
    else: 
     return m[0] 

import time 
import random 
if __name__ == '__main__': 

    A = range(100000) 
    random.shuffle(A) 
    k = random.randint(1, len(A)-1) 

    start_time = time.time() 
    for x in range(1000) : 
     kth_order_statistic1(A,k) 
    print("--- %s seconds ---" % (time.time() - start_time)) 

    start_time = time.time() 
    for x in range(1000) : 
     kth_order_statistic2(A,k) 
    print("--- %s seconds ---" % (time.time() - start_time)) 

    start_time = time.time() 
    for x in range(1000) : 
     kth_order_statistic3(A,k) 
    print("--- %s seconds ---" % (time.time() - start_time))

python : 
--- 25.8894710541 seconds --- 
--- 24.073086977 seconds --- 
--- 32.9823839664 seconds --- 

ipython 
--- 25.7450709343 seconds --- 
--- 22.7140650749 seconds --- 
--- 35.2958850861 seconds ---

的定時可以根據隨機拉伸而變化，但三者之間的差別是非常相同的。

來源

2016-09-15 20:59:53 Flint

你不認爲函數修改了數組，在這種情況下，對同一個數組的以下函數調用會減少工作函數。因此，我用隨機函數測量了時間。但是這不會影響你的結果。謝謝你的回答 – KgOfHedgehogs

什麼？不，數組不會寫回，它只用作輸入。另外，如果你給函數賦予不同的數組，那麼你會得到有偏差的結果，因爲工作量也取決於隨機分佈。 – Flint

比較列表推導和顯式循環（3個數組生成器的循環速度快於1）

回答

相關問題