2013-07-23

Faster pandas date removal

I have a DataFrame in pandas.

One of the columns is a timestamp. I am using the following to remove all weekends from the data:

import datetime
df = df[df['TIMESTAMP'].apply(datetime.datetime.weekday) < 5]

This takes 9 seconds to run. Is there a faster way to do it?

Thanks in advance.

Answers


A faster option is to first convert the Series to a DatetimeIndex, which has a vectorized `weekday` attribute:

df[pd.DatetimeIndex(df['TIMESTAMP']).weekday < 5] 
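On more recent pandas versions the same vectorized test can also be written directly on the column through the `.dt` accessor, without the explicit `DatetimeIndex` conversion. This is a minimal sketch that assumes `TIMESTAMP` already has a `datetime64` dtype; the seven sample dates are made up for illustration. `weekday` numbers Monday as 0 and Sunday as 6, so `< 5` keeps Monday through Friday.

```python
import pandas as pd

# Illustrative data: seven consecutive days starting Thursday 1970-01-01
df = pd.DataFrame({"TIMESTAMP": pd.date_range("1970-01-01", periods=7, freq="D")})

# .dt.weekday is computed for the whole column at once (Monday=0 ... Sunday=6),
# so no per-row Python function call is needed
weekdays_only = df[df["TIMESTAMP"].dt.weekday < 5]

print(weekdays_only["TIMESTAMP"].dt.day_name().tolist())
# ['Thursday', 'Friday', 'Monday', 'Tuesday', 'Wednesday']
```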

TOTD.... I was just about to post this :) – Jeff


For completeness...

In [1]: df = DataFrame(randn(100000,2),columns=list('AB')) 

In [6]: df['time'] = date_range('19700101',periods=100000) 

In [7]: df.tail() 
Out[7]: 
              A         B                time
99995  0.481596 -0.622861 2243-10-12 00:00:00
99996 -1.000646  0.415413 2243-10-13 00:00:00
99997  0.054219 -0.669477 2243-10-14 00:00:00
99998 -1.246848  0.690656 2243-10-15 00:00:00
99999 -2.186820 -0.597221 2243-10-16 00:00:00

In [8]: df.head() 
Out[8]: 
          A         B                time
0 -0.011530 -0.609354 1970-01-01 00:00:00
1  0.652302 -0.229030 1970-01-02 00:00:00
2 -1.703967  0.880957 1970-01-03 00:00:00
3  2.000682 -1.250603 1970-01-04 00:00:00
4  0.483412  2.233786 1970-01-05 00:00:00

In [10]: pd.DatetimeIndex(df.time).weekday 
Out[10]: array([3, 4, 5, ..., 5, 6, 0], dtype=int32) 

In [11]: df[pd.DatetimeIndex(df.time).weekday<5] 
Out[11]: 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 71428 entries, 0 to 99999 
Data columns (total 3 columns): 
A       71428  non-null values
B       71428  non-null values
time    71428  non-null values
dtypes: datetime64[ns](1), float64(2) 

In [12]: df[pd.DatetimeIndex(df.time).weekday<5].head() 
Out[12]: 
          A         B                time
0 -0.011530 -0.609354 1970-01-01 00:00:00
1  0.652302 -0.229030 1970-01-02 00:00:00
4  0.483412  2.233786 1970-01-05 00:00:00
5  0.264460 -0.135544 1970-01-06 00:00:00
6  0.037285  0.592312 1970-01-07 00:00:00

In [13]: %timeit df[pd.DatetimeIndex(df.time).weekday<5] 
10 loops, best of 3: 41.4 ms per loop
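To confirm that the vectorized mask keeps exactly the same rows as the original `apply`-based filter, and to compare the two on your own machine, something like the following sketch can be used. It reuses the shape and the `time` column name from the session above; the helper names `filter_apply` and `filter_vectorized` and the `timeit` repeat count are arbitrary choices for illustration, and the printed timings will depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the session above: 100,000 daily timestamps from 1970-01-01
df = pd.DataFrame(np.random.randn(100000, 2), columns=list("AB"))
df["time"] = pd.date_range("1970-01-01", periods=100000, freq="D")


def filter_apply(frame):
    # Row by row: calls Timestamp.weekday() once per element in Python
    return frame[frame["time"].apply(lambda ts: ts.weekday()) < 5]


def filter_vectorized(frame):
    # Vectorized: weekday numbers for the whole column in one call
    return frame[pd.DatetimeIndex(frame["time"]).weekday < 5]


slow = filter_apply(df)
fast = filter_vectorized(df)

# Both approaches must keep exactly the same rows
assert slow.index.equals(fast.index)
print(len(fast))  # 71428 weekdays out of 100,000 days

for name, func in [("apply", filter_apply), ("vectorized", filter_vectorized)]:
    seconds = timeit.timeit(lambda: func(df), number=3) / 3
    print(f"{name}: {seconds * 1000:.1f} ms per call")
```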