我有兩個數據文件,每個文件都包含的3維點一個大數目(文件A存儲約50000點,文件B存儲大約50萬個點)。我的目標是爲文件A中的每個點(a)找出文件B中與(a)具有最小距離的點(b)。我在兩個列表點存儲這樣的:查找最近的三維點
列表A 節點:
(ID X Y Z)
[ ['478277', -107.0, 190.5674, 128.1634],
['478279', -107.0, 190.5674, 134.0172],
['478282', -107.0, 190.5674, 131.0903],
['478283', -107.0, 191.9798, 124.6807],
... ]
列表B 數據:
(X Y Z Data)
[ [-28.102, 173.657, 229.744, 14.318],
[-28.265, 175.549, 227.824, 13.648],
[-27.695, 175.925, 227.133, 13.142],
...]
我的第一種方法是簡單地通過第一循環和第二列表嵌套循環,並計算這樣每點之間的距離:
outfile = open(job[0] + '/' + output, 'wb');
dist_min = float(job[5]);
dist_max = float(job[6]);
dists = [];
for node in nodes:
shortest_distance = 1000.0;
shortest_data = 0.0;
for entry in data:
dist = math.sqrt((node[1] - entry[0])**2 + (node[2] - entry[1])**2 + (node[3] - entry[2])**2);
if (dist_min <= dist <= dist_max) and (dist < shortest_distance):
shortest_distance = dist;
shortest_data = entry[3];
outfile.write(node[0] + ', ' + str('%10.5f' % shortest_data + '\n'));
outfile.close();
我認爲,Python有運行循環的量太大了(〜250億),所以我不得不來固定我的代碼。我想先計算與列表理解所有的距離,但代碼仍然太慢:
p_x = [row[1] for row in nodes];
p_y = [row[2] for row in nodes];
p_z = [row[3] for row in nodes];
q_x = [row[0] for row in data];
q_y = [row[1] for row in data];
q_z = [row[2] for row in data];
dx = [[(px - qx) for px in p_x] for qx in q_x];
dy = [[(py - qy) for py in p_y] for qy in q_y];
dz = [[(pz - qz) for pz in p_z] for qz in q_z];
dx = [[dxxx * dxxx for dxxx in dxx] for dxx in dx];
dy = [[dyyy * dyyy for dyyy in dyy] for dyy in dy];
dz = [[dzzz * dzzz for dzzz in dzz] for dzz in dz];
D = [[(dx[i][j] + dy[i][j] + dz[i][j]) for j in range(len(dx[0]))] for i in range(len(dx))];
D = [[(DDD**(0.5)) for DDD in DD] for DD in D];
說實話,在這一點上,我不知道這兩種方法的比較好,反正沒有一個兩種可能性似乎可行。我甚至不確定是否可以編寫一個代碼,在可接受的時間內計算所有距離。是否有另一種方法來解決我的問題,而不計算所有的距離?
編輯:我忘了提,我上的Python 2.5.1運行,我不可以安裝或添加任何新的圖書館...