Approach #1: Full throttle
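The innermost nested loop produces this term on every iteration -
cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))))
and these terms are summed up iteratively to give us the final output in AF.
Let's call the exp(...) part B. Now, B has two main parts: one is the scalar (2*pi*1j/lambda), and the other is built from the variables that depend on the four iterators used in the original loopy version (i, j, p, q) -
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))
For easy reference later on, let's call this other part C.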
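Written as a double sum, this is what the loops accumulate. It is only a restatement in math notation, assuming AF starts at zero and that p runs over 1..M and q over 1..N; w, x, y, θ and φ are shorthand for cmpWeights, xm, yn, thetas and phis.

```latex
\[
AF(i,j) \;=\; \sum_{p=1}^{M}\sum_{q=1}^{N} w_{q,p}\,
\exp\!\left(\frac{2\pi \mathrm{j}}{\lambda}\, C_{i,j,p,q}\right),
\qquad
C_{i,j,p,q} \;=\; x_p \sin\theta_j \cos\phi_i \;+\; y_q \sin\theta_j \sin\phi_i
\]
```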
Let's put all of that into perspective:
The loopy version had
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))))
which is now equivalent to AF(i,j) = AF(i,j) + cmpWeights(q,p)*B, where
B = exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))))
B could be simplified to B = exp((2*pi*1j/lambda)*C), where
C = (xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))
C depends on all four iterators - i, j, p, q.
So, after porting this to a vectorized form, it would end up as this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Create vectorized version of C using all four vector iterators
mult1 = bsxfun(@times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = bsxfun(@times,sin(thetas(J)),sin(phis(I)).'); %//'
mult1_xm = bsxfun(@times,mult1(:),permute(xm,[1 3 2]));
mult2_yn = bsxfun(@times,mult2(:),yn);
C_vect = bsxfun(@plus,mult1_xm,mult2_yn);
%// 3) Create vectorized version of B using vectorized C
B_vect = reshape(exp((2*pi*1j/lambda)*C_vect),phiCount*thetaCount,[]);
%// 4) Final output as matrix multiplication between vectorized B and cmpWeights
AF_vect = reshape(B_vect*cmpWeights(:),phiCount,thetaCount);
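A quick way to gain confidence in the vectorized code is to compare it against the loopy version on a tiny problem. The sketch below does that; the sizes and random inputs are hypothetical (not the data from the question), and it assumes thetas, phis, xm and yn are row vectors and cmpWeights is N-by-M, which is what the indexing cmpWeights(q,p) implies.

```matlab
%// Hypothetical small inputs, chosen only for this check
phiCount = 4; thetaCount = 5; M = 3; N = 2;
lambda = 0.5;
phis = linspace(0,2*pi,phiCount);       %// row vector, indexed by i
thetas = linspace(0,pi/2,thetaCount);   %// row vector, indexed by j
xm = rand(1,M);                         %// row vector, indexed by p
yn = rand(1,N);                         %// row vector, indexed by q
cmpWeights = rand(N,M) + 1j*rand(N,M);  %// indexed as cmpWeights(q,p)

%// Reference result from the loopy version
AF = zeros(phiCount,thetaCount);
for i = 1:phiCount
    for j = 1:thetaCount
        for p = 1:M
            for q = 1:N
                AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*...
                    (xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))));
            end
        end
    end
end

%// Approach #1, same steps as listed above
mult1 = bsxfun(@times,sin(thetas),cos(phis).');
mult2 = bsxfun(@times,sin(thetas),sin(phis).');
mult1_xm = bsxfun(@times,mult1(:),permute(xm,[1 3 2]));
mult2_yn = bsxfun(@times,mult2(:),yn);
C_vect = bsxfun(@plus,mult1_xm,mult2_yn);
B_vect = reshape(exp((2*pi*1j/lambda)*C_vect),phiCount*thetaCount,[]);
AF_vect = reshape(B_vect*cmpWeights(:),phiCount,thetaCount);

%// Largest absolute difference between the two results
max_err = max(abs(AF(:) - AF_vect(:)));
```

The two results should agree to floating-point round-off, so max_err is expected to be tiny rather than exactly zero.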
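Approach #2: Less memory-intensive
This second approach reduces memory traffic by using the product rule for exponentials - exp(A+B) = exp(A)*exp(B).
Now, the original loopy version was this -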
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*...
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))))
So, after applying that rule, we would end up with something like this -
K = (2*pi*1j/lambda);
part1 = K*xm(p)*sin(thetas(j))*cos(phis(i));
part2 = K*yn(q)*sin(thetas(j))*sin(phis(i));
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp(part1)*exp(part2);
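The reason this split helps can be seen by pulling exp(part2) out of the sum over p. A sketch in math notation, assuming AF starts at zero, with K as defined above and w, x, y, θ, φ as shorthand for cmpWeights, xm, yn, thetas, phis:

```latex
\[
AF(i,j) \;=\; \sum_{q=1}^{N} e^{K\, y_q \sin\theta_j \sin\phi_i}
\left( \sum_{p=1}^{M} w_{q,p}\; e^{K\, x_p \sin\theta_j \cos\phi_i} \right),
\qquad K = \frac{2\pi \mathrm{j}}{\lambda}
\]
```

The inner sum over p is a matrix product between the exp(part1) values and the transposed weights, and the outer sum over q is an elementwise multiply followed by a row sum, so no array carrying both a p and a q dimension ever has to be built.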
So, the corresponding vectorized approach would become something like this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Define the constant used at the start of EXP() call
K = (2*pi*1j/lambda);
%// 3) Perform the sine-cosine operations part1 & part2 in vectorized manners
mult1 = K*bsxfun(@times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = K*bsxfun(@times,sin(thetas(J)),sin(phis(I)).'); %//'
%// 4) Perform exp(part1) & exp(part2) in vectorized manners
part1_vect = exp(bsxfun(@times,mult1(:),xm));
part2_vect = exp(bsxfun(@times,mult2(:),yn));
%// 5) Perform multiplications with cmpWeights for final output
AF = reshape(sum((part1_vect*cmpWeights.').*part2_vect,2),phiCount,[]);
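To make the memory difference concrete, the largest array each approach builds can be counted directly. The sizes below are hypothetical, picked only for illustration (they are not the inputs from the question), and the estimate looks at the single largest array rather than total peak usage.

```matlab
%// Hypothetical sizes, chosen only for illustration
phiCount = 360; thetaCount = 90; M = 20; N = 20;
bytesPerElem = 16;                    %// one complex double
PT = phiCount*thetaCount;             %// number of (phi,theta) pairs

%// Approach #1: B_vect holds one complex value per (phi,theta,q,p)
elemsApproach1 = PT*N*M;
%// Approach #2: the larger of part1_vect (PT-by-M) and part2_vect (PT-by-N)
elemsApproach2 = PT*max(M,N);

fprintf('Approach #1 largest array: %.1f MB\n', elemsApproach1*bytesPerElem/1e6);
fprintf('Approach #2 largest array: %.1f MB\n', elemsApproach2*bytesPerElem/1e6);
```

With these assumed sizes the largest array in Approach #1 is min(M,N) times bigger than in Approach #2, which is where the reduced memory traffic comes from.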
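Quick benchmark
Here are the runtimes, using the input data listed in the question, for the original loopy approach and the proposed Approach #2 -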
---------------------------- With Original Approach
Elapsed time is 358.081507 seconds.
---------------------------- With Proposed Approach #2
Elapsed time is 0.405038 seconds.
The runtimes suggest a crazy performance improvement with Approach #2: roughly an 880x speedup on this input!