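This is a bit late, but maybe this answer will be useful to someone else, if not to you...
I do this with NumPy and pandas, and it is quite fast. I was working with TLS data and could do this on millions of data points without any trouble. The key is to round the data to the bin size, then use pandas' GroupBy method to aggregate the points and compute the mean for each bin.
If you need to round to a power of 10, you can use np.round directly; otherwise you can round to an arbitrary value by adapting this SO answer, as in round_to_val here.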
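As a quick illustration of the two rounding cases, here is a minimal sketch. The coordinate values and the variable names are made up for the example; only np.round itself comes from the answer.

```python
import numpy as np

# Made-up easting values, just for illustration.
x = np.array([637012.34, 637012.76, 637013.30])

# Powers of 10: np.round takes a number of decimals, which may be negative.
to_tenth = np.round(x, 1)    # nearest 0.1
to_ten = np.round(x, -1)     # nearest 10

# Any other step: divide by the step, round to an integer, multiply back.
step = 0.5
to_half = np.round(x / step) * step
```

Note that np.round rounds exact ties to the nearest even integer, so a value sitting exactly halfway between two bins may go to either neighbour depending on which one is even.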
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
    return np.round(np.array(a, dtype=float) / round_val) * round_val
# load data: an array of shape (n_data, 3) holding the x, y, z columns
data = np.load('your_data.npy')  # placeholder file name
n_d = data.shape[0]
# round the data
d_round = np.empty([n_d, 5])
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val(d_round[:,0], 0.5)
d_round[:,4] = round_to_val(d_round[:,1], 0.5)
# sorting data
ind = np.lexsort((d_round[:,4], d_round[:,3]))
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame(d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array([x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmin, xmax = 637000, 640000.5
ymin, ymax = 6067000, 6070000.5
# the slice must run from min to max, otherwise mgrid returns an empty grid
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
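When I did this, I saved the output as a .npy file, converted it to a TIFF with an image library, and then georeferenced it in ArcMap. There may be a way to do this with osgeo, but I haven't used it.
Hope this helps someone, at least...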
I added some of the code I wrote above. –
You would have to use 'np.histogram2d' here, though. – letmaik
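For anyone wondering what the np.histogram2d route might look like, here is a rough sketch, not something taken from the answer. The function name binned_mean and its cell parameter are made up for the example. It divides a z-weighted histogram by a plain count histogram to get the mean z per cell. Note that the cells here run from one multiple of the cell size to the next, whereas the rounding approach centres its bins on those multiples.

```python
import numpy as np

def binned_mean(x, y, z, cell=0.5):
    """Mean of z in square cells of width `cell`; NaN where a cell is empty."""
    # Build edges from integer cell indices to avoid float drift in arange.
    x_edges = np.arange(np.floor(x.min() / cell), np.floor(x.max() / cell) + 2) * cell
    y_edges = np.arange(np.floor(y.min() / cell), np.floor(y.max() / cell) + 2) * cell

    # Sum of z per cell (weighted histogram) and number of points per cell.
    sums, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=z)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

    # Empty cells give 0/0, which we deliberately leave as NaN.
    with np.errstate(invalid='ignore', divide='ignore'):
        mean = sums / counts
    return mean, x_edges, y_edges
```

The result is already a regular 2D array indexed as [x_bin, y_bin], so no sorting or GroupBy step is needed, but the whole grid is held in memory even where it is empty.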