Regular expressions are a very powerful tool, but in this case there is a better approach:
In [180]: df
Out[180]:
ID ActualDate
0 738564 01/21/2016
1 274628 02/12/2016
2 571749 03/30/2016
3 718563 10/01/2016
4 984739 11/30/2016
5 938511 12/24/2016
6 103216 07/16/2014
7 446754 08/06/2015
8 135654 02/01/2017
9 135614 01/16/2017
10 133346 01/16/2011
11 234682 NaN
12 238756 (none)
Convert the column to datetime dtype:
In [181]: df['ActualDate'] = pd.to_datetime(df['ActualDate'], errors='coerce')
In [182]: df
Out[182]:
ID ActualDate
0 738564 2016-01-21
1 274628 2016-02-12
2 571749 2016-03-30
3 718563 2016-10-01
4 984739 2016-11-30
5 938511 2016-12-24
6 103216 2014-07-16
7 446754 2015-08-06
8 135654 2017-02-01
9 135614 2017-01-16
10 133346 2011-01-16
11 234682 NaT
12 238756 NaT
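As a minimal sketch of what errors='coerce' does in the step above: the sample values below are illustrative, and the explicit format argument is an assumption added here (the transcript relies on pandas inferring the format).

```python
import pandas as pd

# Illustrative sample mirroring a few rows of the frame above
raw = pd.Series(["01/21/2016", "11/30/2016", None, "(none)"])

# Missing or unparseable entries become NaT instead of raising an error.
# The explicit format is an assumption; the original call omits it.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Count how many entries could not be parsed
n_missing = int(parsed.isna().sum())
```

Without errors='coerce', the "(none)" entry would raise an error during parsing rather than being converted to NaT.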
Filter using boolean indexing:
In [184]: df[(df['ActualDate'] < '2016-11-01') | df['ActualDate'].isnull()]
Out[184]:
ID ActualDate
0 738564 2016-01-21
1 274628 2016-02-12
2 571749 2016-03-30
3 718563 2016-10-01
6 103216 2014-07-16
7 446754 2015-08-06
10 133346 2011-01-16
11 234682 NaT
12 238756 NaT
Filter using the .query() method:
In [186]: df.query("ActualDate < '2016-11-01' or ActualDate != ActualDate")
Out[186]:
ID ActualDate
0 738564 2016-01-21
1 274628 2016-02-12
2 571749 2016-03-30
3 718563 2016-10-01
6 103216 2014-07-16
7 446754 2015-08-06
10 133346 2011-01-16
11 234682 NaT
12 238756 NaT
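The condition ActualDate != ActualDate in the query string may look odd. It works because NaT, like NaN, compares unequal to itself, so the self-comparison selects exactly the missing dates. A small sketch with made-up values:

```python
import pandas as pd

# Illustrative values: two valid dates and one missing entry
dates = pd.to_datetime(
    pd.Series(["2016-01-21", "2016-12-24", None]), errors="coerce"
)

# NaT compares unequal to everything, itself included
self_unequal = dates != dates

# So the self-comparison yields the same mask as isna()
same_mask = self_unequal.equals(dates.isna())
```

This is why the .query() version returns the same rows as the boolean-indexing version that calls isnull().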
UPDATE: if you want to keep the original date column as string dtype:
In [190]: df
Out[190]:
ID Actual Date
0 738564 01/21/2016
1 274628 02/12/2016
2 571749 03/30/2016
3 718563 10/01/2016
4 984739 11/30/2016
5 938511 12/24/2016
6 103216 07/16/2014
7 446754 08/06/2015
8 135654 02/01/2017
9 135614 01/16/2017
10 133346 01/16/2011
11 234682 NaN
12 238756 (none)
First add a new datetime column:
In [191]: df['Date'] = pd.to_datetime(df['Actual Date'], errors='coerce')
In [192]: df
Out[192]:
ID Actual Date Date
0 738564 01/21/2016 2016-01-21
1 274628 02/12/2016 2016-02-12
2 571749 03/30/2016 2016-03-30
3 718563 10/01/2016 2016-10-01
4 984739 11/30/2016 2016-11-30
5 938511 12/24/2016 2016-12-24
6 103216 07/16/2014 2014-07-16
7 446754 08/06/2015 2015-08-06
8 135654 02/01/2017 2017-02-01
9 135614 01/16/2017 2017-01-16
10 133346 01/16/2011 2011-01-16
11 234682 NaN NaT
12 238756 (none) NaT
Filter:
In [194]: df.drop('Date', axis=1).loc[(df['Date'] < '2016-11-01') | df['Date'].isnull()]
Out[194]:
ID Actual Date
0 738564 01/21/2016
1 274628 02/12/2016
2 571749 03/30/2016
3 718563 10/01/2016
6 103216 07/16/2014
7 446754 08/06/2015
10 133346 01/16/2011
11 234682 NaN
12 238756 (none)
In [196]: df.query("Date < '2016-11-01' or Date != Date").drop('Date', axis=1)
Out[196]:
ID Actual Date
0 738564 01/21/2016
1 274628 02/12/2016
2 571749 03/30/2016
3 718563 10/01/2016
6 103216 07/16/2014
7 446754 08/06/2015
10 133346 01/16/2011
11 234682 NaN
12 238756 (none)
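If adding and then dropping a helper column feels roundabout, one possible variation (not part of the answer above) is to parse the dates into a standalone Series and use it only as a mask. The sample frame, the explicit format argument and the names parsed, cutoff and result are assumptions made for this sketch.

```python
import pandas as pd

# Illustrative frame with the same column names as the UPDATE example
df = pd.DataFrame({
    "ID": [738564, 984739, 234682, 238756],
    "Actual Date": ["01/21/2016", "11/30/2016", None, "(none)"],
})

# Parse into a standalone Series; df itself is left untouched
parsed = pd.to_datetime(df["Actual Date"], format="%m/%d/%Y", errors="coerce")

# Keep rows dated before the cutoff, plus rows whose date could not be parsed
cutoff = pd.Timestamp("2016-11-01")
result = df[(parsed < cutoff) | parsed.isna()]
```

Because the mask shares the index of df, the original strings in 'Actual Date' come through unchanged and no column has to be dropped afterwards.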
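OK, I think the best idea is to convert the dates to datetime, but some of the date values are None or NA, and I need to show those values too. Is there any option to do this, since I think datetime doesn't accept strings? The code is like this –
I have updated the original post –
@CarlosArronteBello, do you want the resulting data set (after filtering) to include the rows where 'Date' is 'None' or 'NaN'? – MaxU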