2017-03-24 57 views
2

給定兩個矩陣,我想計算所有行之間的成對差異。每個矩陣有1000行和100列,因此它們相當大。我嘗試使用for循環和純粹的廣播,但for循環似乎工作得更快。難道我做錯了什麼?下面是代碼:如何找到使用numpy的兩個非常大的矩陣行之間的成對差異?

from numpy import * 
A = random.randn(1000,100) 
B = random.randn(1000,100) 

start = time.time() 
for a in A: 
    sum((a - B)**2,1) 
print time.time() - start 

# pure broadcasting 
start = time.time() 
((A[:,newaxis,:] - B)**2).sum(-1) 
print time.time() - start 

廣播方法需要大約1秒鐘時間更長,它甚至更長的大型矩陣。任何想法如何加速純粹使用numpy?

回答

4

另一個工作下面就來進行另一種方式:

(AB)^ 2 = A^2 + B^2 - 2AB

np.einsum前兩個術語和dot-product爲第三個 -

import numpy as np 

np.einsum('ij,ij->i',A,A)[:,None] + np.einsum('ij,ij->i',B,B) - 2*np.dot(A,B.T) 

運行測試

途徑 -

def loopy_app(A,B): 
    m,n = A.shape[0], B.shape[0] 
    out = np.empty((m,n)) 
    for i,a in enumerate(A): 
     out[i] = np.sum((a - B)**2,1) 
    return out 

def broadcasting_app(A,B): 
    return ((A[:,np.newaxis,:] - B)**2).sum(-1) 

# @Paul Panzer's soln 
def outer_sum_dot_app(A,B): 
    return np.add.outer((A*A).sum(axis=-1), (B*B).sum(axis=-1)) - 2*np.dot(A,B.T) 

# @Daniel Forsman's soln 
def einsum_all_app(A,B): 
    return np.einsum('ijk,ijk->ij', A[:,None,:] - B[None,:,:], \ 
             A[:,None,:] - B[None,:,:]) 

# Proposed in this post 
def outer_einsum_dot_app(A,B): 
    return np.einsum('ij,ij->i',A,A)[:,None] + np.einsum('ij,ij->i',B,B) - \ 
                  2*np.dot(A,B.T) 

計時 -

In [51]: A = np.random.randn(1000,100) 
    ...: B = np.random.randn(1000,100) 
    ...: 

In [52]: %timeit loopy_app(A,B) 
    ...: %timeit broadcasting_app(A,B) 
    ...: %timeit outer_sum_dot_app(A,B) 
    ...: %timeit einsum_all_app(A,B) 
    ...: %timeit outer_einsum_dot_app(A,B) 
    ...: 
10 loops, best of 3: 136 ms per loop 
1 loops, best of 3: 302 ms per loop 
100 loops, best of 3: 8.51 ms per loop 
1 loops, best of 3: 341 ms per loop 
100 loops, best of 3: 8.38 ms per loop 
+0

哈!我毆打Divakar到'einsum'答案!當然,如果你想要更快的解決方案,我的方法似乎是錯誤的。 。 。 –

+0

@DanielForsman你只需要更多的練習,你會最終到達那裏! :) – Divakar

+0

考慮到我目前多少代碼依賴於快速計算笛卡爾距離,這個代數技巧非常有用,所以我沒有多少頭腦:) –

2

這裏是避免了兩者的環和大中間體的解決方案:

from numpy import * 
import time 

A = random.randn(1000,100) 
B = random.randn(1000,100) 

start = time.time() 
for a in A: 
    sum((a - B)**2,1) 
print time.time() - start 

# pure broadcasting 
start = time.time() 
((A[:,newaxis,:] - B)**2).sum(-1) 
print time.time() - start 

#matmul 
start = time.time() 
add.outer((A*A).sum(axis=-1), (B*B).sum(axis=-1)) - 2*dot(A,B.T) 
print time.time() - start 

打印:

0.546781778336 
0.674743175507 
0.10723400116 
2

np.einsum

np.einsum('ijk,ijk->ij', A[:,None,:] - B[None,:,:], A[:,None,:] - B[None,:,:]) 
相關問題