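For performance, I would suggest working on the underlying array data and using array slicing for the two columns being modified. Basic slices are views into the array, not copies -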
a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
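The None in a[:,0,None] adds a length-1 axis. This turns the prev column into shape (N, 1), so it broadcasts across the (N, 2) slice holding open and close. Below is a minimal sketch of just that broadcasting step on a plain NumPy array. The numbers are made up for illustration.

```python
import numpy as np

# Toy 3x4 array standing in for df.values (columns: prev, open, close, volume)
a = np.array([[2.0, 4.0, 6.0, 100.0],
              [5.0, 10.0, 15.0, 200.0],
              [4.0, 2.0, 8.0, 300.0]])

cols = a[:, 1:3]        # shape (3, 2): the 'open' and 'close' columns
prev = a[:, 0, None]    # shape (3, 1): 'prev' kept as a column vector

print(cols.shape, prev.shape)   # (3, 2) (3, 1)

# (3, 2) / (3, 1): prev is stretched along axis 1, so each row is divided by its own prev
ratio = cols / prev
print(ratio)
```

Without the None, a[:,0] has shape (N,). NumPy aligns that with the trailing axis of length 2, so it raises a broadcasting error. If N happens to be 2, it silently divides by the wrong values instead.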
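To go into the array-slicing part in more detail: a[:,[1,2]] indexes with a list of column positions (fancy indexing). This forces a copy and slows things down. By contrast, a[:,1:3] is a basic slice and returns a view. On the DataFrame side, a[:,[1,2]] corresponds to df[['open','close']], and I suspect that is slowing things down too. Using df.iloc[:,1:3] therefore improves on it.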
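The view-versus-copy difference can be checked directly with np.shares_memory. Below is a small sketch using two rows of the sample data. The frame mixes float and int columns, so df.values upcasts everything into a single float64 array a. The check here is only about slicing a.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'prev': [20.77, 19.87], 'open': [20.87, 19.89],
                   'close': [19.87, 19.56], 'volume': [962816, 668076]})
a = df.values            # float64 array (int 'volume' is upcast)

basic = a[:, 1:3]        # basic slice -> view into a
fancy = a[:, [1, 2]]     # list of indices (fancy indexing) -> always a copy

print(np.shares_memory(basic, a))   # True
print(np.shares_memory(fancy, a))   # False
```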
Sample run -
In [64]: df
Out[64]:
    prev   open  close  volume
0  20.77  20.87  19.87  962816
1  19.87  19.89  19.56  668076
2  19.56  19.96  20.10  578987
3  20.10  20.40  20.53  418597
In [65]: a = df.values
...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
...:
In [66]: df
Out[66]:
    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597
Runtime test

Approaches -
import numpy as np
import pandas as pd

def numpy_app(df): # Proposed in this post
    a = df.values
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]  # modifies df in place
    return df

def pandas_app1(df): # @Scott Boston's soln
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)  # also in place
    return df
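Before comparing speed, it helps to confirm that the two approaches compute the same thing. Below is a minimal sketch on random data. Both functions modify their argument in place, so each one gets its own copy of the input.

```python
import numpy as np
import pandas as pd

def numpy_app(df):
    a = df.values
    df.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return df

def pandas_app1(df):
    df[['open', 'close']] = df[['open', 'close']].div(df['prev'].values, axis=0)
    return df

data = np.random.default_rng(0).integers(15, 25, (1000, 4)).astype(float)
base = pd.DataFrame(data, columns=['prev', 'open', 'close', 'volume'])

# Separate copies, since both functions mutate their input
out_np = numpy_app(base.copy())
out_pd = pandas_app1(base.copy())

pd.testing.assert_frame_equal(out_np, out_pd)   # raises AssertionError if they differ
print("results match")
```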
Timings -
In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
...: df2 = df1.copy()
...:
In [45]: %timeit pandas_app1(df1)
...: %timeit numpy_app(df2)
...:
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 µs per loop
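The %timeit calls above reuse df1 and df2 across loops, so every loop after the first divides already-divided values. That does not change the amount of work. If you want each measurement to start from the same input outside IPython, below is a minimal sketch built on the standard timeit.default_timer. The helper best_time is hypothetical, and no particular timings are claimed for it.

```python
import timeit
import numpy as np
import pandas as pd

def numpy_app(df):
    a = df.values
    df.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return df

def pandas_app1(df):
    df[['open', 'close']] = df[['open', 'close']].div(df['prev'].values, axis=0)
    return df

data = np.random.randint(15, 25, (100000, 4)).astype(float)
base = pd.DataFrame(data, columns=['prev', 'open', 'close', 'volume'])

def best_time(func, repeats=20):
    """Hypothetical helper: best wall-clock time of func on a fresh copy of base."""
    times = []
    for _ in range(repeats):
        df = base.copy()                    # copy is made outside the timed region
        start = timeit.default_timer()
        func(df)
        times.append(timeit.default_timer() - start)
    return min(times)

for f in (pandas_app1, numpy_app):
    print(f"{f.__name__}: {best_time(f) * 1e3:.3f} ms")
```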
Why didn't I think of this.... haha Thanks. I knew I was making it more complicated than it needed to be – user1179317
'df.assign(open = df.open/df.prev,close = df.close/df.prev)'? – Abdou
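The assign variant from the comment above is sketched below. It is written with df['open'] instead of attribute access but is otherwise the same idea. Unlike the two approaches timed above, it returns a new DataFrame and leaves df unchanged. This sketch uses two rows of the sample data and is not part of the timings.

```python
import pandas as pd

df = pd.DataFrame({'prev': [20.77, 19.87], 'open': [20.87, 19.89],
                   'close': [19.87, 19.56], 'volume': [962816, 668076]})

# assign overwrites existing columns by name in a NEW DataFrame; column order is kept
out = df.assign(open=df['open'] / df['prev'],
                close=df['close'] / df['prev'])

print(out)
```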