Optimizing a loop. A faster ResultList.append([c, d, c[1]/d[1]])? Array? Map?


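The following works well, but I want to make it faster. The actual application processes a Tuple1 and a Tuple2 that each contain 30,000 elements, each with 17 nested sequences. I have seen many questions about faster loops, and I tried an array and map, with no improvement.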

for i in Tuple1:
    print i

((1, 2.2, 3), (2, 3.3, 4)) 
((5, 6.6, 7), (6, 7.7, 8)) 

for i in Tuple2:
    print i

((10, 11, 12), (11, 12, 13), (12, 13, 14)) 
((20, 21, 22), (21, 22, 23), (22, 23, 24)) 

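ResultList = []

for a in Tuple1:
    for b in Tuple2:
        for c in a:
            for d in b:
                # Pair every element of a with every element of b.
                ResultList.append([c, d, c[1]/d[1]])
        SomeFunction()  # Processes ResultList and is not a concern.
        ResultList = []  # Start a fresh list for the next (a, b) pair.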

An example of a ResultList processed by SomeFunction:

[(1, 2.2, 3), (10, 11, 12), 0.2] 
[(1, 2.2, 3), (11, 12, 13), 0.18333333333333335] 
[(1, 2.2, 3), (12, 13, 14), 0.16923076923076924] 
[(2, 3.3, 4), (10, 11, 12), 0.3] 
[(2, 3.3, 4), (11, 12, 13), 0.27499999999999997] 
[(2, 3.3, 4), (12, 13, 14), 0.25384615384615383]


2014-02-06 user3180110

'from multiprocessing import Pool', then use 'pool.map' to split the outer loop across a pool of processes. – Duncan

I get "ImportError: cannot import name Pool". Assuming my code was at fault, I tried a script containing only the lines 'import multiprocessing' and 'print multiprocessing.__file__', and got the same error. – user3180110

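Good news/bad news. The good news is that I got the multiprocessing pool working, and it is faster under Python. The bad news is that I run the actual application under PyPy, and PyPy with the multiprocessing pool is slower than PyPy without it. That makes no sense to me, so I will have to look into it further. – user3180110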
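The pool-based approach discussed in these comments might look roughly like the sketch below. It is an illustration only, written for Python 3: `build_result_list`, `process_pair`, and `run_parallel` are hypothetical names, and `process_pair` merely stands in for the undisclosed SomeFunction by returning the list length. Each (a, b) pair becomes one task, so the pool splits the two outer loops.

```python
from itertools import product
from multiprocessing import Pool

Tuple1 = (
    ((1, 2.2, 3), (2, 3.3, 4)),
    ((5, 6.6, 7), (6, 7.7, 8)),
)
Tuple2 = (
    ((10, 11, 12), (11, 12, 13), (12, 13, 14)),
    ((20, 21, 22), (21, 22, 23), (22, 23, 24)),
)


def build_result_list(pair):
    """Build the ResultList for one (a, b) pair, as the two inner loops do."""
    a, b = pair
    return [(c, d, c[1] / d[1]) for c in a for d in b]


def process_pair(pair):
    """Hypothetical stand-in for SomeFunction; runs inside a worker process."""
    result_list = build_result_list(pair)
    return len(result_list)  # placeholder for the real processing


def run_parallel(t1, t2, processes=2):
    """Hand every (a, b) pair to a pool of worker processes."""
    pairs = list(product(t1, t2))
    with Pool(processes) as pool:
        return pool.map(process_pair, pairs)


if __name__ == "__main__":
    print(run_parallel(Tuple1, Tuple2))
```

Whether this pays off depends on how expensive the per-pair work is compared with the cost of pickling the tuples and results between processes, which may be one reason the gain reported under Python did not carry over to PyPy.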

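Simply switching from explicit for loops to a list comprehension, and using a tuple instead of a list for each element, can speed up your operation significantly: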

from itertools import product 

ResultList = [(c, d, c[1]/d[1]) for a, b in product(t1, t2) for c, d in product(a, b)] 
#             ^ Use a tuple here instead of a list

This can be shown with the following timeit test:

>>> from timeit import Timer 
>>> original_test = Timer('original_version(Tuple1, Tuple2)', '''\ 
... Tuple1 = (
...  ((1, 2.2, 3), (2, 3.3, 4)), 
...  ((5, 6.6, 7), (6, 7.7, 8)) 
...) 
... Tuple2 = (
...  ((10, 11, 12), (11, 12, 13), (12, 13, 14)), 
...  ((20, 21, 22), (21, 22, 23), (22, 23, 24)) 
...) 
... def original_version(t1, t2): 
...  ResultList = [] 
...  for a in t1: 
...   for b in t2: 
...    for c in a: 
...     for d in b: 
...      ResultList.append([c, d, c[1]/d[1]])''') 
>>> improved_test = Timer('improved_version(Tuple1, Tuple2)', '''\ 
... from itertools import product 
... Tuple1 = (
...  ((1, 2.2, 3), (2, 3.3, 4)), 
...  ((5, 6.6, 7), (6, 7.7, 8)) 
...) 
... Tuple2 = (
...  ((10, 11, 12), (11, 12, 13), (12, 13, 14)), 
...  ((20, 21, 22), (21, 22, 23), (22, 23, 24)) 
...) 
... def improved_version(t1, t2): 
...  return [(c, d, c[1]/d[1]) for a, b in product(t1, t2) for c, d in product(a, b)]''') 
>>> original_time = original_test.timeit() 
>>> improved_time = improved_test.timeit() 
>>> print 'Improved version is %{} faster'.format(
...  (original_time - improved_time)/original_time * 100 
...) 
Improved version is %30.0181954314 faster 
>>>


2014-02-06 15:40:12

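Thanks, but your approach produces a different result. The script should send four different ResultLists (each containing six elements) to SomeFunction. Your solution sends four identical ResultLists, each containing 24 elements. I modified your solution to "ResultList = [(c, d, c[1]/d[1]) for c, d in product(a, b)]" to match. I timed each version with "time python script.py", with these results: Question - 0m0.050s; Answer - 0m0.051s; Modified answer - 0m0.050s. I also plugged your answer into my real-world application and saw no performance benefit. But maybe I do not understand your answer. – user3180110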
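The modification described in this comment can be sketched as follows: one list per (a, b) pair rather than a single combined list. `iter_result_lists` is a hypothetical name, and the generator form is an assumption made for illustration; the commenter's version simply rebuilt ResultList inside the two outer loops. Written for Python 3.

```python
from itertools import product

Tuple1 = (
    ((1, 2.2, 3), (2, 3.3, 4)),
    ((5, 6.6, 7), (6, 7.7, 8)),
)
Tuple2 = (
    ((10, 11, 12), (11, 12, 13), (12, 13, 14)),
    ((20, 21, 22), (21, 22, 23), (22, 23, 24)),
)


def iter_result_lists(t1, t2):
    """Yield one ResultList per (a, b) pair instead of one combined list."""
    for a, b in product(t1, t2):
        # Only the inner two loops are folded into the comprehension.
        yield [(c, d, c[1] / d[1]) for c, d in product(a, b)]


result_lists = list(iter_result_lists(Tuple1, Tuple2))
print(len(result_lists))     # 4 lists, one per (a, b) pair
print(len(result_lists[0]))  # 6 elements in each list
```

As the timings in the comment suggest, this restructuring changes the shape of the output but should not be expected to change the amount of work done per element.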

@user3180110: Sorry! I overlooked the last part, where 'ResultList' ends up as 4 separate lists. Fixed. –

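No problem. But does "Fixed" mean you have a solution? I am looking into array vectorization as a possible solution, but I have not made much progress. Maybe I will have to write this in C. – user3180110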
