Python: using the multiprocessing module as a possible solution to speed up a function

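I wrote a function in Python 2.7 (on Windows, 64-bit) to calculate the mean area of intersection between a reference polygon (Ref) and one or more segmented (Seg) polygons in ESRI shapefile format. The code is very slow because I have more than 2000 reference polygons, and for each Ref polygon the function runs over all of the Seg polygons (more than 7000) every time. I apologize, but the function is a prototype.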

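I would like to know whether multiprocessing can help me speed up my loop, or whether there are better-performing solutions. If multiprocessing is a viable solution, I would like to know the best way to optimize my function below: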

import os

import numpy as np
from osgeo import ogr
from shapely.geometry import Polygon


def AreaInter(reference, segmented, outFile):
    # open both shapefiles
    ref = ogr.Open(reference)
    if ref is None:
        raise SystemExit('Unable to open %s' % reference)
    seg = ogr.Open(segmented)
    if seg is None:
        raise SystemExit('Unable to open %s' % segmented)
    ref_layer = ref.GetLayer()
    seg_layer = seg.GetLayer()
    # create outfile; if outFile has no directory part,
    # write it next to the reference shapefile
    if not os.path.split(outFile)[0]:
        file_path, file_name_ext = os.path.split(os.path.abspath(reference))
        outFile_filename = os.path.splitext(os.path.basename(outFile))[0]
        file_out = open(os.path.join(file_path, outFile_filename + ".txt"), "w")
    else:
        file_path_name, file_ext = os.path.splitext(outFile)
        file_out = open(os.path.abspath("{0}.txt".format(file_path_name)), "w")
    # For each reference object i
    for index in xrange(ref_layer.GetFeatureCount()):
        ref_feature = ref_layer.GetFeature(index)
        # get FID (=Feature ID)
        FID = str(ref_feature.GetFID())
        ref_geometry = ref_feature.GetGeometryRef()
        # exterior ring of the reference polygon
        pts = ref_geometry.GetGeometryRef(0)
        points = []
        for p in xrange(pts.GetPointCount()):
            points.append((pts.GetX(p), pts.GetY(p)))
        # convert to a shapely polygon
        ref_polygon = Polygon(points)
        # get the area
        ref_Area = ref_polygon.area
        # create two empty lists
        seg_Area, intersect_Area = [], []
        # For each segmented object j
        for segment in xrange(seg_layer.GetFeatureCount()):
            seg_feature = seg_layer.GetFeature(segment)
            seg_geometry = seg_feature.GetGeometryRef()
            pts = seg_geometry.GetGeometryRef(0)
            points = []
            for p in xrange(pts.GetPointCount()):
                points.append((pts.GetX(p), pts.GetY(p)))
            seg_polygon = Polygon(points)
            seg_Area.append(seg_polygon.area)
            # intersection (overlap) of reference object with the segmented object
            intersect_polygon = ref_polygon.intersection(seg_polygon)
            # area of intersection (= 0, no intersection)
            intersect_Area.append(intersect_polygon.area)
        # Average over all segmented objects (because 1 or more segmented polygons can intersect with the reference polygon)
        seg_Area_average = np.average(seg_Area)
        intersect_Area_average = np.average(intersect_Area)
        file_out.write(" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")
    file_out.close()

Source

2013-01-07 Gianni Spear

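My multiprocessing answer is below, but really, you should look for a better algorithm, because multiprocessing will only give you a linear speedup (5-10x, depending on the power of your computer).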

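Personally, I find 'concurrent.futures' easier to use than 'multiprocessing' ('as_completed' is usually simpler than 'imap_unordered' and friends). Although it was not added to the stdlib until 3.2, 'futures' (http://pypi.python.org/pypi/futures) is a complete backport to 2.x. I think 'multiprocessing' is simple enough for your use case, but it is worth knowing about futures. – abarnert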
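For comparison, here is a minimal sketch of the 'concurrent.futures' style this comment refers to. The worker mean_overlap and its axis-aligned boxes are hypothetical stand-ins for the per-reference-polygon work in the question, not the asker's code, and the executor class is passed in as a parameter (an assumption made here so the same function can run on processes or threads).

```python
import concurrent.futures as cf


def mean_overlap(ref_id, ref_box, seg_boxes):
    """Hypothetical worker: mean overlap area of one box with all seg boxes.

    Boxes are (xmin, ymin, xmax, ymax) tuples standing in for polygons.
    """
    total = 0.0
    for sx0, sy0, sx1, sy1 in seg_boxes:
        w = min(ref_box[2], sx1) - max(ref_box[0], sx0)
        h = min(ref_box[3], sy1) - max(ref_box[1], sy0)
        if w > 0 and h > 0:
            total += w * h
    return ref_id, total / len(seg_boxes)


def run_all(ref_boxes, seg_boxes, executor_cls=cf.ProcessPoolExecutor):
    """Submit one task per reference box and collect results as they finish."""
    results = {}
    with executor_cls() as executor:
        futures = [executor.submit(mean_overlap, i, box, seg_boxes)
                   for i, box in enumerate(ref_boxes)]
        # as_completed yields each future as soon as its result is ready,
        # so results arrive in completion order, not submission order.
        for future in cf.as_completed(futures):
            ref_id, value = future.result()
            results[ref_id] = value
    return results
```

With the default ProcessPoolExecutor, run_all should be called from under an `if __name__ == "__main__":` guard, which matters on Windows because worker processes re-import the main module.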

I have a blog post that covers a similar case, with an example of an embarrassingly parallel algorithm in Python: http://timothyawiseman.wordpress.com/2012/12/21/a-really-simple-multiprocessing-python-example/ – TimothyAWiseman

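The best way is to use the multiprocessing package, in particular the Pool class. First create a function that does everything you want to do inside the loop and that takes only the index as its argument: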

def process_reference_object(index):
    ref_feature = ref_layer.GetFeature(index)
    # all your code goes here
    return (" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")

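Note that this does not write to the file itself - that would be messy, because you would have multiple processes writing to the same file at the same time. Instead, it returns the string that needs to be written. Also note that this function uses objects such as ref_layer or ref_geometry that need to reach it somehow - how you do that is up to you (you could make process_reference_object a method of a class that is initialized with them, or it could be as ugly as just defining them globally).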
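One common way to get shared objects into the worker is the initializer argument of Pool, which runs once in every worker process. The sketch below is an illustration under stated assumptions, not the answerer's code: init_worker, make_pool and the box tuples are hypothetical names, and plain (xmin, ymin, xmax, ymax) boxes stand in for the polygons. With GDAL, where dataset and layer handles generally cannot be pickled, init_worker would more likely receive the shapefile path and open the layer itself.

```python
from multiprocessing import Pool

# Per-process state filled in by init_worker. Module-level globals are the
# "ugly but simple" option mentioned in the answer.
_seg_boxes = None


def init_worker(seg_boxes):
    """Run once in each worker process: store the shared read-only data."""
    global _seg_boxes
    _seg_boxes = seg_boxes


def process_reference_object(task):
    """Return the output line for one (fid, box) reference task."""
    fid, (x0, y0, x1, y1) = task
    ref_area = (x1 - x0) * (y1 - y0)
    overlaps = []
    for sx0, sy0, sx1, sy1 in _seg_boxes:
        w = min(x1, sx1) - max(x0, sx0)
        h = min(y1, sy1) - max(y0, sy0)
        # Non-overlapping boxes contribute an area of 0.
        overlaps.append(w * h if w > 0 and h > 0 else 0.0)
    mean_overlap = sum(overlaps) / len(overlaps)
    return "%s %s %s\n" % (fid, ref_area, mean_overlap)


def make_pool(seg_boxes, processes=None):
    """Create a Pool whose workers each call init_worker(seg_boxes) first."""
    return Pool(processes, initializer=init_worker, initargs=(seg_boxes,))
```

The worker still returns a string rather than writing to the file, so the parent process remains the only writer.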

Then create a pool of processes and run all the indices using Pool.imap_unordered (which itself allocates each index to a different process):

from multiprocessing import Pool 
p = Pool() # run multiple processes 
for l in p.imap_unordered(process_reference_object, range(ref_layer.GetFeatureCount())): 
    file_out.write(l)

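This will process the reference objects independently, in parallel across multiple processes, and write the results to the file (in arbitrary order, note).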
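A slightly fuller sketch of the same driver loop, assuming the output order may or may not matter. The names square_line, write_all and main, and the file name areas.txt, are hypothetical; square_line is only a placeholder for process_reference_object. Pool.imap yields results in input order, while Pool.imap_unordered yields them as they finish.

```python
from multiprocessing import Pool


def square_line(index):
    """Hypothetical stand-in for process_reference_object: one output line."""
    return "%d %d\n" % (index, index * index)


def write_all(pool, worker, count, file_out, ordered=False):
    """Write one worker result per index; only the calling process writes."""
    mapper = pool.imap if ordered else pool.imap_unordered
    written = 0
    for line in mapper(worker, range(count)):
        file_out.write(line)
        written += 1
    return written


def main():
    """Run ten tasks on a process pool and write the lines to areas.txt."""
    pool = Pool()  # one worker process per CPU core by default
    try:
        with open("areas.txt", "w") as file_out:
            write_all(pool, square_line, 10, file_out)
    finally:
        pool.close()
        pool.join()
```

On Windows, as in the question, main() should be called from under an `if __name__ == "__main__":` guard, because each worker process re-imports the main module and would otherwise try to create its own pool.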

Source

2013-01-07 19:10:42

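Thanks David, I really appreciate it. To be honest, I did not get the "# all your code goes here" part. Do I need to rewrite the function as a non-loop version (e.g. for only one reference polygon)? Thanks again for your help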

@Gianni: That is, all the code inside your for loop, from "# get FID (=Feature ID)" to 'intersect_Area_average' (I did not want to copy and paste all of it).

Dear Robert, I am still lost on how to nest, after "def process_reference_object(index)", the "# For each reference objects-i" loop "for index in xrange(ref_layer.GetFeatureCount()):" ... sorry :P

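Threading can help to some extent, but first you should make sure you cannot simplify the algorithm. If you are checking each of 2000 reference polygons against 7000 segmented polygons (maybe I misunderstood), then that is where you should start. Anything that runs in O(n^2) gets slow, so perhaps you can prune the pairs that definitely will not intersect, or find some other way to speed things up. Otherwise, running multiple processes or threads will only improve things linearly while the cost grows geometrically with the data.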
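One way to prune pairs that cannot intersect is a bounding-box pre-filter backed by a simple spatial index. The sketch below is a hypothetical illustration, not part of the original answer: GridIndex and bbox_overlap are made-up names, boxes are (xmin, ymin, xmax, ymax) tuples, and the cell size is a tuning parameter you would have to choose for your data. Only the candidates it returns need the exact (and expensive) polygon intersection.

```python
from collections import defaultdict


def bbox_overlap(a, b):
    """True if boxes a and b, given as (xmin, ymin, xmax, ymax), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex(object):
    """Uniform grid over bounding boxes (a simple, hypothetical spatial index)."""

    def __init__(self, boxes, cell_size):
        self.boxes = boxes
        self.cell_size = float(cell_size)
        self.cells = defaultdict(list)
        # Register each box in every grid cell its bounding box covers.
        for i, box in enumerate(boxes):
            for cell in self._cells_for(box):
                self.cells[cell].append(i)

    def _cells_for(self, box):
        """Yield the (column, row) grid cells covered by a bounding box."""
        c = self.cell_size
        x0, y0 = int(box[0] // c), int(box[1] // c)
        x1, y1 = int(box[2] // c), int(box[3] // c)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def query(self, box):
        """Sorted indices of stored boxes whose bounding box overlaps `box`."""
        found = set()
        for cell in self._cells_for(box):
            for i in self.cells.get(cell, ()):
                if bbox_overlap(box, self.boxes[i]):
                    found.add(i)
        return sorted(found)
```

Segments that the filter rejects have an intersection area of 0, so if the average is taken over all segments, as in the question's code, the sum over the candidates should still be divided by the full segment count. OGR layers also provide SetSpatialFilter, which may serve the same purpose without a hand-written index.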

Source

2013-01-07 19:12:05
