2013-04-10 35 views
3

Filtering a DataFrame with datetime columns raises an error

I set up a DataFrame with two datetime columns like this:

from pandas import DataFrame, Series, date_range

range1 = Series(date_range('1/1/2011', periods=50, freq='D'))
range2 = Series(date_range('2/5/2011', periods=50, freq='D'))
df1 = DataFrame({'a': range1, 'b': range2}, dtype='datetime64[D]')

Oddly, asking for df1's dtypes gives me:

In [71]: df1.dtypes 
Out[71]: 
a datetime64[ns] 
b datetime64[ns] 

Worse, when I try to filter the DataFrame like this:

In [62]: 

d = datetime(2011,1,14) 
df1[df1 > d] 

I get an error:

TypeError         Traceback (most recent call last) 
<ipython-input-62-50b4b9735157> in <module>() 
     1 d = datetime(2011,1,14) 
----> 2 df1[df1 > d] 

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in f(self, other) 
    313    return self._combine_series_infer(other, func) 
    314   else: 
--> 315    return self._combine_const(other, func) 
    316 
    317  f.__name__ = name 

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _combine_const(self, other, func) 
    3677    return self 
    3678 
-> 3679   result_values = func(self.values, other) 
    3680 
    3681   if not isinstance(result_values, np.ndarray): 

TypeError: can't compare datetime.datetime to long 

Can someone tell me why this happens? I am using numpy 1.7 and pandas 0.10.1.

Answers

2

datetime64[ns] dtypes are supported; try it without specifying the D dtype:

In [9]: df1 = DataFrame({'a': range1, 'b' : range2}) 

In [10]: df1 
In [15]: df1.head() 
Out[15]: 
        a     b 
0 2011-01-01 00:00:00 2011-02-05 00:00:00 
1 2011-01-02 00:00:00 2011-02-06 00:00:00 
2 2011-01-03 00:00:00 2011-02-07 00:00:00 
3 2011-01-04 00:00:00 2011-02-08 00:00:00 
4 2011-01-05 00:00:00 2011-02-09 00:00:00 

In [16]: df1[df1.a>datetime.datetime(2011,1,14)].head() 
Out[16]: 
        a     b 
14 2011-01-15 00:00:00 2011-02-19 00:00:00 
15 2011-01-16 00:00:00 2011-02-20 00:00:00 
16 2011-01-17 00:00:00 2011-02-21 00:00:00 
17 2011-01-18 00:00:00 2011-02-22 00:00:00 
18 2011-01-19 00:00:00 2011-02-23 00:00:00 
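
For readers on a current pandas release, the column-wise filter above can be sketched as a standalone script. This is a minimal sketch, assuming a recent pandas and the conventional `import pandas as pd` alias; the variable name `rows` is illustrative and does not appear in the original session.

```python
from datetime import datetime

import pandas as pd

# Rebuild the frame from the question, without forcing a dtype.
range1 = pd.Series(pd.date_range('1/1/2011', periods=50, freq='D'))
range2 = pd.Series(pd.date_range('2/5/2011', periods=50, freq='D'))
df1 = pd.DataFrame({'a': range1, 'b': range2})

d = datetime(2011, 1, 14)

# A boolean Series built from one column selects whole rows.
rows = df1[df1['a'] > d]
print(rows.head())
```

Because the mask comes from a single column, rows are either kept or dropped as a whole, so no missing values are introduced.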

FYI, once https://github.com/pydata/pandas/issues/3311 has been merged, the operation the OP showed, which is a where, returns this:

In [15]: df1[df1>datetime.datetime(2011,1,14)].head(20) 
Out[15]: 
        a     b 
0     NaT 2011-02-05 00:00:00 
1     NaT 2011-02-06 00:00:00 
2     NaT 2011-02-07 00:00:00 
3     NaT 2011-02-08 00:00:00 
4     NaT 2011-02-09 00:00:00 
5     NaT 2011-02-10 00:00:00 
6     NaT 2011-02-11 00:00:00 
7     NaT 2011-02-12 00:00:00 
8     NaT 2011-02-13 00:00:00 
9     NaT 2011-02-14 00:00:00 
10     NaT 2011-02-15 00:00:00 
11     NaT 2011-02-16 00:00:00 
12     NaT 2011-02-17 00:00:00 
13     NaT 2011-02-18 00:00:00 
14 2011-01-15 00:00:00 2011-02-19 00:00:00 
15 2011-01-16 00:00:00 2011-02-20 00:00:00 
16 2011-01-17 00:00:00 2011-02-21 00:00:00 
17 2011-01-18 00:00:00 2011-02-22 00:00:00 
18 2011-01-19 00:00:00 2011-02-23 00:00:00 
19 2011-01-20 00:00:00 2011-02-24 00:00:00 
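
The element-wise behaviour shown above can be reproduced roughly as follows. This is a sketch under the assumption of a recent pandas release, where indexing a frame with a boolean frame behaves like `DataFrame.where`; the names `mask` and `masked` are illustrative only.

```python
from datetime import datetime

import pandas as pd

range1 = pd.Series(pd.date_range('1/1/2011', periods=50, freq='D'))
range2 = pd.Series(pd.date_range('2/5/2011', periods=50, freq='D'))
df1 = pd.DataFrame({'a': range1, 'b': range2})

d = datetime(2011, 1, 14)

# Comparing the whole frame yields a boolean frame of the same shape.
mask = df1 > d

# where() keeps values where the mask is True and fills the rest
# with the missing-value marker, which is NaT for datetime columns.
masked = df1.where(mask)
print(masked.head(20))
```

Unlike a row filter, this keeps the original shape: all 50 rows remain, and only the cells that fail the comparison become `NaT`.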
+0

In 0.11.0, what does 'df1[df1 > d]' return? If 'df1 > d' is a DataFrame of bools, is 'df1[df1 > d]' then a DataFrame of 'datetime64[ns]' values, with 'NaN' wherever 'df1 > d' is False? If 'NaN' is not a valid 'datetime64[ns]' value, how can that work? – unutbu 2013-04-10 14:06:23

+0

''np.nan'' is replaced with ''NaT'' (the missing-value indicator for datetime64[ns]). I see a bug now: df[df > datetime] should work (but it is not using the correct filter)... – Jeff 2013-04-10 14:14:49

+0

Thanks for the quick action! – 2013-04-11 06:51:45