Python: using the multiprocessing module as a possible solution to improve the speed of a function

To compute the average area of the intersections between a reference polygon (Ref) and one or more segmented (Seg) polygons in ESRI shapefile format, I wrote a function in Python 2.7 (on 64-bit Windows). The code is very slow because I have more than 2000 reference polygons, and for each Ref polygon the function runs over all of the Seg polygons (more than 7000) every time. I apologize, but the function is a prototype.
I would like to know whether multiprocessing can help me improve the speed of my loop, or whether there are better-performing solutions. If multiprocessing is a viable option, I would also like to know how to optimize my function below:
import os
import numpy as np
from osgeo import ogr
from shapely.geometry import Polygon

def AreaInter(reference, segmented, outFile):
    # open the shapefiles
    ref = ogr.Open(reference)
    if ref is None:
        raise SystemExit('Unable to open %s' % reference)
    seg = ogr.Open(segmented)
    if seg is None:
        raise SystemExit('Unable to open %s' % segmented)
    ref_layer = ref.GetLayer()
    seg_layer = seg.GetLayer()
    # create the output file
    if not os.path.split(outFile)[0]:
        file_path, file_name_ext = os.path.split(os.path.abspath(reference))
        outFile_filename = os.path.splitext(os.path.basename(outFile))[0]
        file_out = open(os.path.abspath("{0}\\{1}.txt".format(file_path, outFile_filename)), "w")
    else:
        file_path_name, file_ext = os.path.splitext(outFile)
        file_out = open(os.path.abspath("{0}.txt".format(file_path_name)), "w")
    # for each reference object i
    for index in xrange(ref_layer.GetFeatureCount()):
        ref_feature = ref_layer.GetFeature(index)
        # get the FID (= Feature ID)
        FID = str(ref_feature.GetFID())
        ref_geometry = ref_feature.GetGeometryRef()
        pts = ref_geometry.GetGeometryRef(0)
        points = []
        for p in xrange(pts.GetPointCount()):
            points.append((pts.GetX(p), pts.GetY(p)))
        # convert to a Shapely polygon
        ref_polygon = Polygon(points)
        # get the area
        ref_Area = ref_polygon.area
        # create two empty lists
        seg_Area, intersect_Area = ([] for _ in range(2))
        # for each segmented object j
        for segment in xrange(seg_layer.GetFeatureCount()):
            seg_feature = seg_layer.GetFeature(segment)
            seg_geometry = seg_feature.GetGeometryRef()
            pts = seg_geometry.GetGeometryRef(0)
            points = []
            for p in xrange(pts.GetPointCount()):
                points.append((pts.GetX(p), pts.GetY(p)))
            seg_polygon = Polygon(points)
            seg_Area.append(seg_polygon.area)
            # intersection (overlap) of the reference object with the segmented object
            intersect_polygon = ref_polygon.intersection(seg_polygon)
            # area of the intersection (= 0: no intersection)
            intersect_Area.append(intersect_polygon.area)
        # average over all segmented objects (because one or more segmented
        # polygons can intersect a given reference polygon)
        seg_Area_average = np.average(seg_Area)
        intersect_Area_average = np.average(intersect_Area)
        file_out.write(" ".join("%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]) + "\n")
    file_out.close()
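To show how the outer loop over reference polygons could be handed to worker processes, here is a minimal sketch of the `multiprocessing.Pool` pattern (in Python 3 syntax). The "polygons" are stand-in lists of (x, y) vertices, and `process_reference` is a hypothetical placeholder for the real per-reference work: here it only computes the reference area via the shoelace formula, whereas the real version would also intersect against the segmented polygons.

```python
from multiprocessing import Pool

def shoelace_area(points):
    # polygon area via the shoelace formula, as a stand-in for
    # Shapely's Polygon(points).area
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def process_reference(ref_points):
    # hypothetical placeholder: the real function would also loop over
    # the segmented polygons, intersect, and average
    return shoelace_area(ref_points)

if __name__ == '__main__':
    references = [
        [(0, 0), (2, 0), (2, 2), (0, 2)],  # square, area 4
        [(0, 0), (1, 0), (1, 1)],          # triangle, area 0.5
    ]
    # map the per-reference work over a pool of worker processes;
    # results come back in submission order
    with Pool(2) as pool:
        results = pool.map(process_reference, references)
    print(results)  # -> [4.0, 0.5]
```

Note that each worker must be able to open the shapefiles itself (OGR handles do not pickle), so in practice you would pass file paths and feature indices to the workers rather than geometry objects.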
My multiprocessing answer is below, but you really should look for a better algorithm, because multiprocessing will only give you a linear speedup (5-10x, depending on how powerful your machine is). –
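The "better algorithm" the comment hints at is typically a spatial index: rather than intersecting every Ref polygon with every Seg polygon, first discard pairs whose bounding boxes cannot overlap, since the bounding-box test is far cheaper than a polygon intersection. A pure-Python sketch of that prefilter (the function names here are my own, not from any library; in practice Shapely's `STRtree` or the `rtree` package would do this more efficiently):

```python
def bbox(points):
    # axis-aligned bounding box (xmin, ymin, xmax, ymax) of a vertex list
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_overlap(a, b):
    # True if the two boxes share any area (or touch)
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def candidate_segments(ref_points, segments):
    # keep only the segment polygons whose bounding box can overlap
    # the reference polygon's; only these need a real intersection test
    ref_box = bbox(ref_points)
    return [s for s in segments if bboxes_overlap(ref_box, bbox(s))]
```

With 2000 x 7000 pairs, even this crude prefilter can cut the number of expensive `ref_polygon.intersection(seg_polygon)` calls dramatically when most polygons are spatially disjoint.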
I personally find `concurrent.futures` easier to use than `multiprocessing` (`as_completed` is usually simpler than `imap_unordered` and friends). Although it wasn't added to the stdlib until 3.2, [`futures`](http://pypi.python.org/pypi/futures) is a complete backport to 2.x. I think `multiprocessing` is simple enough for your use case, but futures is worth knowing about. – abarnert
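For comparison, the `concurrent.futures` style that comment describes looks roughly like this (Python 3 syntax; `work` is a hypothetical placeholder for the per-reference-polygon computation):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def work(n):
    # hypothetical placeholder for the per-reference-polygon computation
    return n * n

def run(values):
    # submit one task per value, then collect results with as_completed,
    # i.e. in whatever order the workers finish
    results = {}
    with ProcessPoolExecutor(max_workers=2) as executor:
        futures = {executor.submit(work, v): v for v in values}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results

if __name__ == '__main__':
    print(run([1, 2, 3]))
```

Mapping each future back to its input (the `futures` dict) is the standard idiom when results arrive out of order.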
I have a blog post about a similar case, with an example of an embarrassingly parallel algorithm in Python: http://timothyawiseman.wordpress.com/2012/12/21/a-really-simple-multiprocessing-python-example/ – TimothyAWiseman