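Direct way to generate the sum of all parallel diagonals in Numpy/Pandas?

I have a rectangular (it cannot be assumed to be square) Pandas DataFrame of numbers. Say I pick a diagonal direction (either "upperleft to lowerright" or "upperright to lowerleft"). I'd like to compute a series whose entries are the sums of the values from the original DataFrame along the chosen set of parallel diagonals. To fully specify the goal, you need to decide whether the diagonals are "anchored" on the left or "anchored" on the right. For the below, I assume they are "anchored" on the left.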

I can do this without too much trouble:

import numpy as np 
import pandas as pd 

rectdf = pd.DataFrame(np.arange(15).reshape(5,3)) 

# result:
    0   1   2
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14

I can compute the "upperleft to lowerright" diagonal sums as follows:

ullrsums = pd.concat(
    [rectdf.iloc[:, i].shift(-i) for i in range(rectdf.shape[1])], axis=1
).sum(axis=1)  # NaNs introduced by shift are skipped by default (skipna=True)

# result: 
0 12 
1 21 
2 30 
3 22 
4 12

And I can compute the "upperright to lowerleft" diagonal sums by flipping shift(-i) to shift(i) in the previous method:

urllsums = pd.concat(
    [rectdf.iloc[:, i].shift(i) for i in range(rectdf.shape[1])], axis=1
).sum(axis=1)  # NaNs introduced by shift are skipped by default (skipna=True)

# result: 
0  0 
1  4 
2 12 
3 21 
4 30

These results are all correct (i.e. this code does what I want). Is there a more direct way to compute these sums in Pandas or Numpy?

2016-01-28 8one6

Related: http://stackoverflow.com/q/10792897 and http://stackoverflow.com/q/28917414

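You may be looking for numpy.trace(), which gives the trace directly, or numpy.diagonal(), which gives the diagonal vector. Both are documented in the NumPy reference.

First, convert your DataFrame to a numpy matrix using rectdf.as_matrix() (removed in pandas 1.0; on current versions use rectdf.to_numpy() or rectdf.values).

Then, with the numpy matrix: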

np.trace(matrix, offset)

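The offset can be positive or negative and selects which diagonal is summed: positive offsets lie above the main diagonal, negative offsets below it.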

For example, if we do:

a = np.arange(15).reshape(5, 3)
for x in range(-4, 3):
    print(np.trace(a, x))

We get the output:

12
22
30
21
12
6
2

To do this for a general matrix, we want the range from -(rows - 1) up to, but not including, columns. That is, given a variable rows and a variable columns:

a = np.arange(rows * columns).reshape(rows, columns)
for x in range(-(rows - 1), columns):
    print(np.trace(a, x))
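
To tie the traces back to the question, they can be collected into a Series keyed by offset. The following is only a sketch: the helper name diagonal_sums and its flip parameter are made up for illustration, and it assumes a pandas version that provides DataFrame.to_numpy(). The left-anchored sums from the question are then the entries at offsets 0, -1, ..., -(rows - 1).

```python
import numpy as np
import pandas as pd


def diagonal_sums(df, flip=False):
    """Sum every upper-left to lower-right diagonal of df, keyed by offset.

    With flip=True the columns are mirrored first, which yields the
    upper-right to lower-left diagonals instead.
    (Hypothetical helper, not part of the original answer.)
    """
    a = df.to_numpy()
    if flip:
        a = np.fliplr(a)
    rows, columns = a.shape
    offsets = range(-(rows - 1), columns)
    return pd.Series([np.trace(a, k) for k in offsets], index=list(offsets))


rectdf = pd.DataFrame(np.arange(15).reshape(5, 3))
sums = diagonal_sums(rectdf)

# Left-anchored diagonals start in column 0: offsets 0, -1, ..., -(rows - 1).
left_anchored = sums.loc[[0, -1, -2, -3, -4]].to_numpy()
print(sums.to_dict())
print(left_anchored)
```

The extra entries at positive offsets are the diagonals that start in the top row rather than the left column, which the shift-based version in the question does not report.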

2016-01-28 16:46:17

Short answer

See the fast, but complicated, function at the end.

Development

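Iterating over the traces is fine, but I'm not sure it is better than the pandas solution. Both involve iteration, over diagonals or over columns. Conceptually it is simpler or cleaner, but I'm not sure about speed, especially on large arrays.

Each diagonal has a different length, [[12],[9,13],...]. That is a big red flag, warning us that a block array operation is difficult, if not impossible.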
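
To see the ragged shape concretely, np.diagonal can list each diagonal of the 5x3 example array. This is a small illustration added here, not part of the original answer.

```python
import numpy as np

x = np.arange(15).reshape(5, 3)
rows, columns = x.shape

# One 1-D array per offset; the lengths grow to min(rows, columns) and shrink again.
diagonals = [np.diagonal(x, k) for k in range(-(rows - 1), columns)]
lengths = [len(d) for d in diagonals]

for d in diagonals:
    print(d.tolist())
print(lengths)
```

Because the lengths differ, the diagonals cannot be stacked into a regular 2d array without padding.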

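With scipy.sparse I can construct a 2d array that can be summed to give these traces (here x is the 5x3 array np.arange(15).reshape(5,3) from above):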

In [295]: from scipy import sparse
In [296]: xs=sparse.dia_matrix(x)
In [297]: xs.data
Out[297]:
array([[12,  0,  0],
       [ 9, 13,  0],
       [ 6, 10, 14],
       [ 3,  7, 11],
       [ 0,  4,  8],
       [ 0,  1,  5],
       [ 0,  0,  2]])
In [298]: np.sum(xs.data,axis=1)
Out[298]: array([12, 22, 30, 21, 12, 6, 2])
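
A self-contained version of the session above might look like the sketch below. It pairs each row sum with the matching entry of xs.offsets rather than relying on row position; to my understanding the conversion stores only diagonals that contain a nonzero value, so treat the positional reading as an assumption that happens to hold for this x.

```python
import numpy as np
from scipy import sparse

x = np.arange(15).reshape(5, 3)
xs = sparse.dia_matrix(x)

# xs.offsets labels each row of xs.data with its diagonal offset.
sums_by_offset = dict(zip(xs.offsets.tolist(), xs.data.sum(axis=1).tolist()))
print(sums_by_offset)
```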

This sparse format stores its data in a 2d array, along with the necessary offsets. In fact your pd.concat produces something similar:

In [304]: pd.concat([rectdf.iloc[:, i].shift(-i) for i in range(rectdf.shape[1])], axis=1)
Out[304]:
    0   1   2
0   0   4   8
1   3   7  11
2   6  10  14
3   9  13 NaN
4  12 NaN NaN

It looks like sparse creates this data array by starting with np.zeros and filling it with appropriate indexing:

data[row_indices, col_indices] = x.ravel()

something like:

In [344]: i=[4,5,6,3,4,5,2,3,4,1,2,3,0,1,2]
In [345]: j=[0,1,2,0,1,2,0,1,2,0,1,2,0,1,2]
In [346]: z=np.zeros((7,3),int)
In [347]: z[i,j]=x.ravel()[:len(i)]
In [348]: z
Out[348]:
array([[12,  0,  0],
       [ 9, 13,  0],
       [ 6, 10, 14],
       [ 3,  7, 11],
       [ 0,  4,  8],
       [ 0,  1,  5],
       [ 0,  0,  2]])

Though I still need a way of creating i,j for any shape. For j it's easy:

j=np.tile(np.arange(3),5) 
j=np.tile(np.arange(x.shape[1]),x.shape[0])

Reshaping i

In [363]: np.array(i).reshape(-1,3)
Out[363]:
array([[4, 5, 6],
       [3, 4, 5],
       [2, 3, 4],
       [1, 2, 3],
       [0, 1, 2]])

leads me to recreate it with:

In [371]: ii=(np.arange(3)+np.arange(5)[::-1,None]).ravel() 
In [372]: ii 
Out[372]: array([4, 5, 6, 3, 4, 5, 2, 3, 4, 1, 2, 3, 0, 1, 2])

So putting it all together:

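def all_traces(x):
    # column index of every element of x, in ravel (row-major) order
    jj = np.tile(np.arange(x.shape[1]), x.shape[0])
    # target row for each element: (col - row) + (rows - 1), so rows start at 0
    ii = (np.arange(x.shape[1]) + np.arange(x.shape[0])[::-1, None]).ravel()
    # one zero-padded row per diagonal; the int buffer assumes integer input
    z = np.zeros((x.shape[0] + x.shape[1] - 1, x.shape[1]), int)
    z[ii, jj] = x.ravel()
    return z.sum(axis=1)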

It needs more testing on a variety of shapes.
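
One way to do that testing is to compare all_traces against a plain np.trace loop on several shapes. In the sketch below the function is copied so the block runs on its own; traces_by_loop is a hypothetical helper name, and the shapes and the random integer range are arbitrary choices.

```python
import numpy as np


def all_traces(x):
    jj = np.tile(np.arange(x.shape[1]), x.shape[0])
    ii = (np.arange(x.shape[1]) + np.arange(x.shape[0])[::-1, None]).ravel()
    z = np.zeros((x.shape[0] + x.shape[1] - 1, x.shape[1]), int)
    z[ii, jj] = x.ravel()
    return z.sum(axis=1)


def traces_by_loop(x):
    # Reference implementation: one np.trace call per offset.
    return np.array([np.trace(x, k) for k in range(-(x.shape[0] - 1), x.shape[1])])


rng = np.random.default_rng(0)
shapes = [(5, 3), (3, 5), (4, 4), (1, 6), (6, 1), (1, 1)]
for shape in shapes:
    x = rng.integers(-10, 10, size=shape)
    assert np.array_equal(all_traces(x), traces_by_loop(x)), shape
print("all shapes agree")
```

This covers only integer input; because the buffer is created with an int dtype, float input would need a different dtype for z.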

This function is faster than iterating over the traces, even with this small array:

In [387]: timeit all_traces(x) 
10000 loops, best of 3: 70.5 µs per loop 
In [388]: timeit [np.trace(x,i) for i in range(-(x.shape[0]-1),x.shape[1])] 
10000 loops, best of 3: 106 µs per loop

2016-01-28 19:31:29 hpaulj

For a 2D numpy array A this might be (?) the shortest code to sum the diagonals:

np.bincount(sum(np.indices(A.shape)).flat, A.flat)

To sum the opposite diagonals, apply np.fliplr to the array first.
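
As a worked example on the 5x3 array from the question: the sum of the row and column indices is constant along each upper-right to lower-left diagonal, so np.bincount with the array values as weights adds up each of those diagonals. The sketch below uses .sum(axis=0).ravel() as an equivalent spelling of the one-liner. Note that the result is a float array because weights are used.

```python
import numpy as np

A = np.arange(15).reshape(5, 3)

# Row index + column index identifies each upper-right to lower-left diagonal.
diag_id = np.indices(A.shape).sum(axis=0).ravel()

anti = np.bincount(diag_id, weights=A.ravel())

# Mirroring the columns turns the other family of diagonals into the same form.
main = np.bincount(diag_id, weights=np.fliplr(A).ravel())

print(anti)
print(main)
```

The first five entries of anti match the left-anchored "upperright to lowerleft" sums in the question, and main lists the np.trace results in reverse offset order.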

2016-01-28 23:33:12
