2015-09-29 90 views
3

我正在處理2D地理數據。我有一長串輪廓路徑。現在我想爲我的域中的每個點確定它所在的輪廓(即我想計算由輪廓表示的特徵的空間頻率分佈)。計算每個場點在輪廓內的頻率

爲了說明什麼,我想做的事,這裏的第一個很天真的實現:

import numpy as np 
from shapely.geometry import Polygon, Point 

def comp_frequency(paths,lonlat): 
    """ 
    - paths: list of contour paths, made up of (lon,lat) tuples 
    - lonlat: array containing the lon/lat coordinates; shape (nx,ny,2) 
    """ 
    frequency = np.zeros(lonlat.shape[:2]) 
    contours = [Polygon(path) for path in paths] 

    # Very naive and accordingly slow implementation 
    for (i,j),v in np.ndenumerate(frequency): 
     pt = Point(lonlat[i,j,:]) 
     for contour in contours: 
      if contour.contains(pt): 
       frequency[i,j] += 1 

    return frequency 

lon = np.array([ 
    [-1.10e+1,-7.82+0,-4.52+0,-1.18+0, 2.19e+0,5.59e+0,9.01+0,1.24+1,1.58+1,1.92+1,2.26+1], 
    [-1.20e+1,-8.65+0,-5.21+0,-1.71+0, 1.81e+0,5.38e+0,8.97+0,1.25+1,1.61+1,1.96+1,2.32+1], 
    [-1.30e+1,-9.53+0,-5.94+0,-2.29+0, 1.41e+0,5.15e+0,8.91+0,1.26+1,1.64+1,2.01+1,2.38+1], 
    [-1.41e+1,-1.04+1,-6.74+0,-2.91+0, 9.76e-1,4.90e+0,8.86+0,1.28+1,1.67+1,2.06+1,2.45+1], 
    [-1.53e+1,-1.15+1,-7.60+0,-3.58+0, 4.98e-1,4.63e+0,8.80+0,1.29+1,1.71+1,2.12+1,2.53+1], 
    [-1.66e+1,-1.26+1,-8.55+0,-4.33+0,-3.00e-2,4.33e+0,8.73+0,1.31+1,1.75+1,2.18+1,2.61+1], 
    [-1.81e+1,-1.39+1,-9.60+0,-5.16+0,-6.20e-1,3.99e+0,8.66+0,1.33+1,1.79+1,2.25+1,2.70+1], 
    [-1.97e+1,-1.53+1,-1.07+1,-6.10+0,-1.28e+0,3.61e+0,8.57+0,1.35+1,1.84+1,2.33+1,2.81+1], 
    [-2.14e+1,-1.69+1,-1.21+1,-7.16+0,-2.05e+0,3.17e+0,8.47+0,1.37+1,1.90+1,2.42+1,2.93+1], 
    [-2.35e+1,-1.87+1,-1.36+1,-8.40+0,-2.94e+0,2.66e+0,8.36+0,1.40+1,1.97+1,2.52+1,3.06+1], 
    [-2.58e+1,-2.08+1,-1.54+1,-9.86+0,-3.99e+0,2.05e+0,8.22+0,1.44+1,2.05+1,2.65+1,3.22+1]]) 

lat = np.array([ 
    [ 29.6, 30.3, 30.9, 31.4, 31.7, 32.0, 32.1, 32.1, 31.9, 31.6, 31.2], 
    [ 32.4, 33.2, 33.8, 34.4, 34.7, 35.0, 35.1, 35.1, 34.9, 34.6, 34.2], 
    [ 35.3, 36.1, 36.8, 37.3, 37.7, 38.0, 38.1, 38.1, 37.9, 37.6, 37.1], 
    [ 38.2, 39.0, 39.7, 40.3, 40.7, 41.0, 41.1, 41.1, 40.9, 40.5, 40.1], 
    [ 41.0, 41.9, 42.6, 43.2, 43.7, 44.0, 44.1, 44.0, 43.9, 43.5, 43.0], 
    [ 43.9, 44.8, 45.6, 46.2, 46.7, 47.0, 47.1, 47.0, 46.8, 46.5, 45.9], 
    [ 46.7, 47.7, 48.5, 49.1, 49.6, 49.9, 50.1, 50.0, 49.8, 49.4, 48.9], 
    [ 49.5, 50.5, 51.4, 52.1, 52.6, 52.9, 53.1, 53.0, 52.8, 52.4, 51.8], 
    [ 52.3, 53.4, 54.3, 55.0, 55.6, 55.9, 56.1, 56.0, 55.8, 55.3, 54.7], 
    [ 55.0, 56.2, 57.1, 57.9, 58.5, 58.9, 59.1, 59.0, 58.8, 58.3, 57.6], 
    [ 57.7, 59.0, 60.0, 60.8, 61.5, 61.9, 62.1, 62.0, 61.7, 61.2, 60.5]]) 

lonlat = np.dstack((lon,lat)) 

paths = [ 
    [(-1.71,34.4),(1.81,34.7),(5.15,38.0),(4.9,41.0),(4.63,44.0),(-0.03,46.7),(-4.33,46.2),(-9.6,48.5),(-8.55,45.6),(-3.58,43.2),(-2.91,40.3),(-2.29,37.3),(-1.71,34.4)], 
    [(0.976,40.7),(-4.33,46.2),(-0.62,49.6),(3.99,49.9),(4.33,47.0),(4.63,44.0),(0.976,40.7)], 
    [(2.9,55.8),(2.37,56.0),(8.47,56.1),(3.17,55.9),(-2.05,55.6),(-1.28,52.6),(-0.62,49.6),(4.33,47.0),(8.8,44.1),(2.29,44.0),(2.71,43.9),(3.18,46.5),(3.25,49.4),(3.33,52.4),(2.9,55.8)], 
    [(2.25,35.1),(2.26,38.1),(8.86,41.1),(5.15,38.0),(5.38,35.0),(9.01,32.1),(2.25,35.1)]] 

frequency = comp_frequency(paths,lonlat) 

當然,這大約是低效寫越好,所有的顯式循環,並相應地需要永遠。

我該如何有效地做到這一點?

編輯:根據要求添加了一些示例數據。請注意,由於我通過對原始數組進行切片來創建樣本座標,所以我的實際域大小爲150 ** 2(分辨率方面):lon[::150]

+0

你可以添加一些數據到你的文章?我的意思是路徑和長條形碼 – LetzerWille

+0

@LetzerWille:好的,我添加了一些數據(縮減的座標陣列和一些構成的輪廓)。 – flotzilla

+0

您是否擁有輪廓生成的原始網格?在這種情況下,對網格進行採樣並推斷其內部的輪廓比計算多邊形交點要高效得多。 –

回答

1

所以同時我已經找到了一個很好的解決方案感謝誰的執行在一些點(基於this blog post)類似的一個同事。

老,身材勻稱的使用

首先非常緩慢的辦法,這裏是用我的身材勻稱的參考實現,這僅僅是在開封后我的第一個「天真」的做法有點改良版本。這很容易理解和運作,但速度很慢。

import numpy as np 
from shapely.geometry import Polygon, Point 

def spatial_contour_frequency_shapely(paths,lon,lat): 

    frequency = np.zeros(lon.shape) 
    contours = [Polygon(path) for path in paths] 

    for (i,j),v in np.ndenumerate(frequency): 
     pt = Point([lon[i,j],lat[i,j]]) 
     for contour in contours: 
      if contour.contains(pt): 
       frequency[i,j] += 1 

    return frequency 

新的,使用PIL

我(almost-)最終解決方案不使用身材勻稱了,而是使用圖像處理技術,從PIL(Python圖像庫)非常快速的解決方案。這種解決方案要快得多,儘管這種形式只適用於常規的矩形網格(見下面的註釋)。

import numpy as np 
from PIL import Image, ImageDraw 

def _spatial_contour_frequency_pil(paths,lon,lat,regular_grid=False, 
     method_ind=None): 

    def get_indices(points,lon,lat,tree=None,regular=False): 

     def get_indices_regular(points,lon,lat): 
      lon,lat = lon.T,lat.T 
      def _get_ij(lon,lat,x,y): 
       lon0 = lon[0,0] 
       lat0 = lat[0,0] 
       lon1 = lon[-1,-1] 
       lat1 = lat[-1,-1] 
       nx,ny = lon.shape 
       dx = (lon1-lon0)/nx 
       dy = (lat1-lat0)/ny 
       i = int((x-lon0)/dx) 
       j = int((y-lat0)/dy) 
       return i,j 
      return [_get_ij(lon,lat,x,y) for x,y in points] 

     def get_indices_irregular(points,tree,shape): 

      dist,idx = tree.query(points,k=1) 
      ind = np.column_stack(np.unravel_index(idx,lon.shape)) 
      return [(i,j) for i,j in ind] 

     if regular: 
      return get_indices_regular(points,lon,lat) 
     return get_indices_irregular(points,tree,lon.T.shape) 

    tree = None 
    if not regular_grid: 
     lonlat = np.column_stack((lon.T.ravel(),lat.T.ravel())) 
     tree = sp.spatial.cKDTree(lonlat) 

    frequency = np.zeros(lon.shape) 
    for path in paths: 
     path_ij = get_indices(path,lon,lat,tree=tree,regular=regular_grid) 
     raster_poly = Image.new("L",lon.shape,0) 
     rasterize = ImageDraw.Draw(raster_poly) 
     rasterize.polygon(path_ij,1) 
     mask = sp.fromstring(raster_poly.tobytes(),'b') 
     mask.shape = raster_poly.im.size[1],raster_poly.im.size[0] 
     frequency += mask 

    return frequency 

應當指出的是,這兩種方法的結果是不exaclty相同。使用PIL方法確定的特徵略大於用勻稱方法確定的特徵,但實際上一個並不比另一個更好。

計時

下面是具有減小的數據集創建一些定時(不從所述開口後的半人工數據。例如,雖然):

spatial_contour_frequency/shapely    : 191.8843 
spatial_contour_frequency/pil     :  0.3287 
spatial_contour_frequency/pil-tree-inside  :  2.3629 
spatial_contour_frequency/pil-regular_grid :  0.3276 

最耗時的步驟是在輪廓點的不規則長/緯網格上查找索引。其中最耗時的部分是cKDTree的構建,這就是爲什麼我將它從get_indices中移出的原因。爲了對此進行透視,pil-tree-inside是在get_indices內創建樹的版本。 pil-regular-gridregular_grid=True一致,對於我的數據集,這會產生錯誤的結果,但會給出在規則網格上運行速度有多快的問題。

總體而言,現在我已經設法消除非規則網格(pilpil-regular-grid)的影響,這是我一開始就希望的! :)

0
def comp_frequency_lc(paths,lonlat): 

    import operator 
    frequency = np.zeros(lonlat.shape[:2]) 
    contours = [Polygon(path) for path in paths] 

    for (i,j),v in np.ndenumerate(frequency): 
     pt = Point(lonlat[i,j,:]) 
     [ 
      operator.setitem(frequency,(i,j), 
        operator.getitem(frequency,(i,j))+1) 
      for contour in contours if contour.contains(pt) 
     ] 

    return frequency 

    print(comp_frequency(paths,lonlat)) 

**result in**: 

[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 
[ 0. 0. 0. 0. 1. 0. 0. 1. 2. 2. 2.] 
[ 0. 1. 0. 0. 1. 0. 0. 1. 1. 1. 1.] 
[ 0. 2. 0. 0. 2. 0. 0. 2. 2. 2. 1.] 
[ 0. 2. 0. 0. 1. 0. 0. 1. 1. 1. 2.] 
[ 0. 1. 0. 0. 0. 0. 0. 1. 2. 1. 1.] 
[ 0. 1. 1. 0. 0. 0. 0. 1. 1. 0. 0.] 
[ 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0.] 
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]] 

,這裏是來自時序....那裏comp_frequency_lc是結果一個 使用列表理解

# 10000 runs 
# comp_frequency 185.10419257196094 
# comp_frequency_lc 124.42639064462102 
+0

謝謝,明天我會試試看,但我有點懷疑這會產生巨大的效果,因爲原則上保留了顯式循環......我想我需要一個很大的我真的需要一個巨大的加速,因爲對於原始數據量我最初的「解決方案」可能需要幾個小時(我已經在20分鐘後中止了它)。 – flotzilla

+0

我現在已經開始比較(主要是爲了測試我的初始版本的一個更好的版本),並且也測試了你的。但是我沒有得到任何加速,而是一個小的放緩,從10000到74.3秒重複10000次!也嘗試將for循環包括在「頻率」中以理解,但是,恩,我甚至放緩到154。看起來我可以安全地放棄這種方法。 (當然要感謝你的建議!) – flotzilla

4

如果輸入的多邊形實際上是輪廓,那麼你最好直接與工作你的輸入網格比計算輪廓和測試點是否在他們的內部。

輪廓遵循網格數據的恆定值。每個輪廓都是一個多邊形,封閉了大於該值的輸入網格區域。

如果您需要知道給定點的內部有多少個等值線,那麼在該點的位置對輸入網格進行採樣並操作返回的「z」值會更快。如果您知道自己創建的輪廓線的值,則可以直接從中提取其內部的輪廓線數量。

例如:

import numpy as np 
from scipy.interpolate import RegularGridInterpolator 
import matplotlib.pyplot as plt 

# One of your input gridded datasets 
y, x = np.mgrid[-5:5:100j, -5:5:100j] 
z = np.sin(np.hypot(x, y)) + np.hypot(x, y)/10 

contour_values = [-1, -0.5, 0, 0.5, 1, 1.5, 2] 

# A point location... 
x0, y0 = np.random.normal(0, 2, 2) 

# Visualize what's happening... 
fig, ax = plt.subplots() 
cont = ax.contourf(x, y, z, contour_values, cmap='gist_earth') 
ax.plot([x0], [y0], marker='o', ls='none', color='salmon', ms=12) 
fig.colorbar(cont) 

# Instead of working with whether or not the point intersects the 
# contour polygons we generated, we'll turn the problem on its head: 

# Sample the grid at the point location 
interp = RegularGridInterpolator((x[0,:], y[:,0]), z) 
z0 = interp([x0, y0]) 

# How many contours would the point be inside? 
num_inside = sum(z0 > c for c in contour_values)[0] 

ax.set(title='Point is inside {} contours'.format(num_inside)) 
plt.show() 

enter image description here

+0

感謝這種替代方法,一旦我在網上做了所有的分析,我會在將來某個時候進行分析,這可能會派上用場。儘管如此,它並沒有真正幫助解決手頭的問題。爲了做我目前正在做的靈活的分析,我必須保存一個完整的2D場(二進制可以實現,但仍然是)基本上每個輪廓,並且這對於給定數據量是不可行的。然後,我寧願只保存輪廓座標(就像我在做的那樣),然後等待更長的時間以完成分析。 – flotzilla

+1

@flotzilla - 您不需要爲每個輪廓保存它。相反,您要對輪廓生成的數據進行採樣。 –