Pandas: Timing difference between function and apply to Series

I'm trying to figure out why these two approaches give different %timeit results.

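import pandas as pd
import numpy as np

# Note: pd.rolling_mean is the pandas 0.13-era API; it was deprecated in
# pandas 0.18 in favor of d.rolling(...).mean() and later removed.
d = pd.DataFrame(data={'S1': [2, 3, 4, 5, 6, 7, 2], 'S2': [4, 5, 2, 3, 4, 6, 8]},
                 index=[1, 2, 3, 4, 5, 6, 7])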

%timeit pd.rolling_mean(d, window=3, center=True) 
10000 loops, best of 3: 182 µs per loop 

%timeit d.apply(lambda x: pd.rolling_mean(x, window=3, center=True)) 
1000 loops, best of 3: 695 µs per loop
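On a current pandas, where pd.rolling_mean no longer exists, the two approaches from the question can be sketched with the method-based DataFrame.rolling API. This is a minimal illustration, assuming pandas 1.x or later; it only checks that both give the same values, not how fast they are.

```python
import pandas as pd

d = pd.DataFrame(
    data={"S1": [2, 3, 4, 5, 6, 7, 2], "S2": [4, 5, 2, 3, 4, 6, 8]},
    index=[1, 2, 3, 4, 5, 6, 7],
)

# One rolling call over the whole frame.
whole_frame = d.rolling(window=3, center=True).mean()

# apply() hands each column to the lambda as a separate Series,
# so a new Rolling object is created once per column.
per_column = d.apply(lambda s: s.rolling(window=3, center=True).mean())

# Both approaches give the same numbers; only the overhead differs.
pd.testing.assert_frame_equal(whole_frame, per_column)
```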

Why is the apply(lambda) approach ~3.5x slower? On more complex DataFrames I've noticed an even bigger difference (~10x).

Does the lambda approach create a copy of the data in this operation?


2014-04-16 sanguineturtle

It looks like most of the performance difference in this example can be eliminated with the raw=True option:

%timeit pd.rolling_mean(d, window=3, center=True) 
1000 loops, best of 3: 281 µs per loop 

%timeit d.apply(lambda x: pd.rolling_mean(x, window=3, center=True)) 
1000 loops, best of 3: 1.02 ms per loop

Now adding the raw=True option:

%timeit d.apply(lambda x: pd.rolling_mean(x, window=3, center=True),raw=True) 
1000 loops, best of 3: 289 µs per loop
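The same raw=True effect can be reproduced outside IPython with the standard-library timeit module. This is a sketch under assumptions: col_range is a hypothetical helper (not from the answer), chosen because it works on both a Series and an ndarray, whereas .rolling is not available on a plain ndarray. Absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

d = pd.DataFrame(
    data={"S1": [2, 3, 4, 5, 6, 7, 2], "S2": [4, 5, 2, 3, 4, 6, 8]},
    index=[1, 2, 3, 4, 5, 6, 7],
)

def col_range(x):
    # Hypothetical helper: accepts a Series or an ndarray because it
    # converts its input with np.asarray before doing any work.
    values = np.asarray(x)
    return values.max() - values.min()

# Same function, same data; only the type of object passed in differs.
as_series = d.apply(col_range)             # each column arrives as a Series
as_ndarray = d.apply(col_range, raw=True)  # each column arrives as an ndarray

n = 1000
t_series = timeit.timeit(lambda: d.apply(col_range), number=n)
t_raw = timeit.timeit(lambda: d.apply(col_range, raw=True), number=n)
print(f"raw=False: {t_series / n * 1e6:.1f} µs per call")
print(f"raw=True:  {t_raw / n * 1e6:.1f} µs per call")
```

Because col_range does the same work either way, any gap between the two timings comes from apply building a Series per column rather than from the computation itself.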

Adding reduce=False gives you a small speedup, since pandas doesn't have to guess the return type:

%timeit d.apply(lambda x: pd.rolling_mean(x, window=3,center=True),raw=True,reduce=False) 
1000 loops, best of 3: 285 µs per loop

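So in this case it looks like most of the performance difference comes from converting each column to a Series and passing each Series separately to rolling_mean. With raw=True, apply just passes ndarrays.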
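To see what raw=True changes, a small recorder function can log the type of each object apply passes in. make_recorder is a hypothetical helper for illustration only; the printed output assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(
    data={"S1": [2, 3, 4, 5, 6, 7, 2], "S2": [4, 5, 2, 3, 4, 6, 8]},
    index=[1, 2, 3, 4, 5, 6, 7],
)

def make_recorder(seen):
    # Hypothetical helper: returns a function that remembers the type
    # of every object apply() hands it, then returns the column sum.
    def record(col):
        seen.append(type(col))
        return col.sum()
    return record

seen_default, seen_raw = [], []
d.apply(make_recorder(seen_default))            # default: raw=False
d.apply(make_recorder(seen_raw), raw=True)

print({t.__name__ for t in seen_default})  # {'Series'}
print({t.__name__ for t in seen_raw})      # {'ndarray'}
```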


2014-04-16 05:48:15

Thanks - http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.DataFrame.apply.html – sanguineturtle
