2016-05-15 109 views
3

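Vectorizing an operation in numpy

I am trying to do the following operation in numpy without using a loop:

  • I have a matrix X of dimensions N * d and a vector y of size N. y holds integers ranging from 1 to K.
  • I want to obtain a matrix M of size K * d, where M[i,:] = np.mean(X[y == i,:], 0)

Can I achieve this without using a loop?

With a loop, it would look like this.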

import numpy as np 

N=3 
d=3 
K=2 

X=np.eye(N) 
y=np.random.randint(1,K+1,N) 
M=np.zeros((K,d)) 
for i in np.arange(0,K): 
    line=X[y==i+1,:] 
    if line.size==0: 
        M[i,:]=np.zeros(d) 
    else: 
        M[i,:]=np.mean(line,0) 

Thanks in advance.

+0

Is K == N? Are the values of y unique? –

+1

It would be cool if you showed some code. – Bonifacio2

+0

No, and no. For example, if K = 2, X = np.eye(3), Y = [1 2 1], I want M to be [[1/2 0 1/2],[0 1 0]]. – popuban
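The example in the comment above can be checked with a small sketch of the loop version. The helper name `class_means_loop` is hypothetical and only wraps the loop from the question; it assumes labels run from 1 to K and that labels with no samples should give a row of zeros.

```python
import numpy as np

def class_means_loop(X, y, K):
    """Reference loop: row i holds the mean of the rows of X whose label is i + 1."""
    M = np.zeros((K, X.shape[1]))
    for i in range(K):
        rows = X[y == i + 1]
        if rows.size:                  # leave zeros for labels with no samples
            M[i] = rows.mean(axis=0)
    return M

X = np.eye(3)
y = np.array([1, 2, 1])
M = class_means_loop(X, y, K=2)
print(M)
```

Rows 0 and 2 of X share label 1, so the first row of M is their average; row 1 of X is the only sample with label 2.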

Answers

3

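This solves the problem, but it creates an intermediate K×N boolean matrix and does not use the built-in mean function. In some cases this may lead to worse performance or worse numerical stability. I let the class labels range from 0 to K-1 rather than from 1 to K.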

import numpy as np 

# Define constants 
K,N,d = 10,1000,3 

# Sample data 
Y = np.random.randint(0,K-1,N) #K-1 to omit one class to test no-examples case 
X = np.random.randn(N,d) 

# Calculate means for each class, vectorized 

# Map samples to labels by taking a logical "outer product" 
mark = Y[None,:]==np.arange(0,K)[:,None] 

# Count number of examples in each class  
count = np.sum(mark,1) 

# Avoid divide by zero if no examples 
count += count==0 

# Sum within each class and normalize 
M = (np.dot(mark,X).T/count).T 

print(M, np.shape(M), np.shape(mark)) 
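To see what the boolean "outer product" does, here is a minimal sketch on a tiny hand-made input. The data values are made up for illustration, and `np.maximum(count, 1)` is used here in place of the in-place `count += count==0` trick; both only guard against dividing by zero for empty classes.

```python
import numpy as np

K = 3
Y = np.array([0, 2, 0, 2])                    # label 1 never occurs
X = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])

# mark[k, n] is True when sample n belongs to class k; shape (K, N)
mark = Y[None, :] == np.arange(K)[:, None]

# Number of samples per class
count = mark.sum(axis=1)

# Matrix product with the mask sums the rows of X within each class; shape (K, d)
sums = mark.astype(X.dtype) @ X

# Divide by the counts, treating empty classes as count 1 so they stay at zero
M = sums / np.maximum(count, 1)[:, None]
print(M)
```

Each row of `mark` selects the samples of one class, so the product with X is a per-class sum, and the empty class simply keeps a row of zeros.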
3

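The code basically collects specific rows off X and adds them up, for which NumPy has a built-in in np.add.reduceat. So, with that as the focus, the steps to solve the problem in a vectorized way could be as listed next: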

# Get sort indices of y 
sidx = y.argsort() 

# Collect rows off X based on their IDs so that they come in consecutive order 
Xr = X[sidx]

# Get unique row IDs, start positions of each unique ID 
# and their counts to be used for average calculations 
unq,startidx,counts = np.unique((y-1)[sidx],return_index=True,return_counts=True) 

# Add rows off Xr based on the slices signified by the start positions 
vals = np.true_divide(np.add.reduceat(Xr,startidx,axis=0),counts[:,None]) 

# Setup output array and set row summed values into it at unique IDs row positions 
out = np.zeros((K,d)) 
out[unq] = vals
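As a sanity check, the steps above can be wrapped in a function and compared against the loop from the question. This is only a sketch: the name `class_means_reduceat` and the random test data are assumptions made for illustration, and labels are assumed to be integers from 1 to K.

```python
import numpy as np

def class_means_reduceat(X, y, K):
    """Per-label row means of X via sorting and np.add.reduceat (labels 1..K)."""
    sidx = np.argsort(y, kind="stable")
    # Unique zero-based labels, where each run starts in sorted order, and run lengths
    unq, startidx, counts = np.unique(y[sidx] - 1, return_index=True, return_counts=True)
    out = np.zeros((K, X.shape[1]))
    out[unq] = np.add.reduceat(X[sidx], startidx, axis=0) / counts[:, None]
    return out

rng = np.random.default_rng(0)
N, d, K = 50, 4, 6
y = rng.integers(1, K, size=N)        # labels 1..K-1, so label K has no samples
X = rng.standard_normal((N, d))

# Loop version from the question, used as the reference
expected = np.zeros((K, d))
for i in range(K):
    rows = X[y == i + 1]
    if rows.size:
        expected[i] = rows.mean(axis=0)

result = class_means_reduceat(X, y, K)
matches = np.allclose(result, expected)
print(matches)
```

Labels that never occur are absent from `unq`, so their rows in the output keep the initial zeros, matching the behaviour of the loop.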