2017-01-18 40 views

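Optimizing pairwise distance and residual calculations

I have code that computes the pairwise distances and residuals of my data (X, Y, Z). The data set is large (about 7,000 rows on average), so I care about efficiency. My initial code is: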

import tkinter as tk 
from tkinter import filedialog 
import pandas as pd 
import numpy as np
from scipy.spatial.distance import pdist, squareform 

root = tk.Tk() 
root.withdraw() 
file_path = filedialog.askopenfilename() 

data = pd.read_excel(file_path) 
data = np.array(data, dtype=float)  # np.float was removed in NumPy 1.24; use the builtin float
npoints, cols = data.shape 

pwdistance = np.zeros((npoints, npoints)) 
pwresidual = np.zeros((npoints, npoints)) 
for i in range(npoints): 
    for j in range(npoints): 
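        # Euclidean distance in the X-Y plane between points i and j
        pwdistance[i, j] = np.sqrt((data[i, 0] - data[j, 0])**2 + (data[i, 1] - data[j, 1])**2)
        # Squared difference of the Z values of points i and j
        pwresidual[i, j] = (data[i, 2] - data[j, 2])**2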

For pwdistance, I changed it to the following, which works very well:

pwdistance = squareform(pdist(data[:,:2])) 

Is there a Pythonic way to compute my pwresidual without a loop, so that my code runs faster?

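You can use 'np.hypot' instead of 'np.sqrt' and '**2'. –

@FranciscoCouzo It looks like the OP is asking how to compute/optimize 'pwresidual'. I was confused at first, too :) – Divakar

@Divakar That's why I posted a comment rather than an answer :) –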
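
The np.hypot suggestion from the first comment can be sketched as follows. This is only an illustration: the random data and seed are assumptions, not the asker's data, and the differences are built with broadcasting rather than the original loop.

```python
import numpy as np

# Hypothetical stand-in for the (X, Y, Z) data: 5 random rows.
rng = np.random.default_rng(2)
data = rng.random((5, 3))

# Pairwise coordinate differences, each of shape (5, 5).
dx = data[:, 0, None] - data[:, 0]
dy = data[:, 1, None] - data[:, 1]

# np.hypot(a, b) computes sqrt(a**2 + b**2) elementwise.
via_hypot = np.hypot(dx, dy)
via_sqrt = np.sqrt(dx**2 + dy**2)

print(np.allclose(via_hypot, via_sqrt))
```

Note that this only affects pwdistance; it does not address pwresidual, which is what the question asks about.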

Answer

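One approach is to extend the dimensions of the slice data[:,2] (the column at index 2, i.e. the Z values) so that it becomes a 2D array, and then subtract the 1D slice itself from it. These subtractions are performed in a vectorized manner, following the rules of broadcasting.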

So, simply do:

pwresidual = (data[:,2,None] - data[:,2])**2 
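
As a quick sanity check, the sketch below compares the broadcast expression against the original double loop. The random data, the seed, and the array size are assumptions chosen for illustration; they are not the asker's data.

```python
import numpy as np

# Hypothetical stand-in for the (X, Y, Z) data: 50 random rows.
rng = np.random.default_rng(0)
data = rng.random((50, 3))
npoints = data.shape[0]

# Reference result from the original double loop.
loop_residual = np.zeros((npoints, npoints))
for i in range(npoints):
    for j in range(npoints):
        loop_residual[i, j] = (data[i, 2] - data[j, 2]) ** 2

# Broadcast version: (npoints, 1) minus (npoints,) gives (npoints, npoints).
pwresidual = (data[:, 2, None] - data[:, 2]) ** 2

print(pwresidual.shape)
print(np.allclose(pwresidual, loop_residual))
```

Both versions compute the same element at position [i, j], namely the squared difference of the Z values of rows i and j; the broadcast version just moves the looping into NumPy's compiled code.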

Step-by-step run:

In [132]: data[:,2,None].shape # Slice extended to a 2D array 
Out[132]: (4, 1) 

In [133]: data[:,2].shape # Slice as 1D array 
Out[133]: (4,) 

In [134]: data[:,2,None] - data[:,2] # Subtractions with broadcasting 
Out[134]: 
array([[ 0.        ,  0.67791602,  0.13298141,  0.61579315],
       [-0.67791602,  0.        , -0.54493461, -0.06212288],
       [-0.13298141,  0.54493461,  0.        ,  0.48281174],
       [-0.61579315,  0.06212288, -0.48281174,  0.        ]])

In [137]: (data[:,2,None] - data[:,2]).shape # Verify output shape 
Out[137]: (4, 4) 

In [138]: (data[:,2,None] - data[:,2])**2 # Finally elementwise square 
Out[138]: 
array([[ 0.        ,  0.45957013,  0.01768406,  0.3792012 ],
       [ 0.45957013,  0.        ,  0.29695373,  0.00385925],
       [ 0.01768406,  0.29695373,  0.        ,  0.23310717],
       [ 0.3792012 ,  0.00385925,  0.23310717,  0.        ]])
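
Since the question already uses pdist and squareform for the distances, the same pair of functions should also be able to produce the residuals, by passing the Z column as a single-feature 2D array with the 'sqeuclidean' metric. This is an alternative sketch rather than part of the answer above, and the random data and seed are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the (X, Y, Z) data: 6 random rows.
rng = np.random.default_rng(1)
data = rng.random((6, 3))

# data[:, 2:3] keeps the Z column 2D with shape (npoints, 1), as pdist requires.
# With a single feature, the squared Euclidean distance is (z_i - z_j)**2.
pw_sq = squareform(pdist(data[:, 2:3], metric="sqeuclidean"))

# Broadcast version from the answer, for comparison.
broadcast = (data[:, 2, None] - data[:, 2]) ** 2

print(np.allclose(pw_sq, broadcast))
```

If memory is a concern, the condensed array returned by pdist (before squareform) holds only the n(n-1)/2 unique pairs, whereas a full 7000 x 7000 float64 matrix takes roughly 392 MB.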