Preserving the datetime index

Suppose I have the following DataFrame (a time series whose first column is a DatetimeIndex):

                           atn  file
datetime
2012-10-08 14:00:00  23.007462     1
2012-10-08 14:30:00  27.045666     1
2012-10-08 15:00:00  31.483825     1
2012-10-08 15:30:00  37.540651     2
2012-10-08 16:00:00  43.564573     2
2012-10-08 16:00:00  48.589852     2
2012-10-08 16:00:00  55.289452     2

My goal is to extract, for each value in the last column "file", the row where that value first appears, so that I end up with this table:

                 datetime        atn
file
1     2012-10-08 14:00:00  23.007462
2     2012-10-08 15:30:00  37.540651

My approach is to group by "file" and then aggregate with "first":

dt.groupby(by="file").aggregate("first")

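The problem with this is that the index is not treated as a column in the grouping, so it is lost. I worked around this by first adding the index as a column: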

dt2 = dt.reset_index() 
dt2.groupby(by="file").aggregate("first")

But now the problem is that the datetime column is no longer a datetime but a float:

          datetime        atn
file
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

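Is there

a way to convert the floats back to datetimes?
or a way to preserve the datetimes in the groupby/aggregate operation?
or a better way to arrive at this final table?

The example DataFrame can be created as follows:

Copy (to the clipboard):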

2012-10-08 14:00:00, 23.007462,  1 
2012-10-08 14:30:00, 27.045666,  1 
2012-10-08 15:00:00, 31.483825,  1 
2012-10-08 15:30:00, 37.540651,  2 
2012-10-08 16:00:00, 43.564573,  2 
2012-10-08 16:00:00, 48.589852,  2 
2012-10-08 16:00:00, 55.289452,  2

Then:

dt = pandas.read_clipboard(sep=",", parse_dates=True, index_col=0, 
          names=["datetime", "atn", "file"])
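
For readers who cannot use the clipboard, the same frame can probably be built from an in-memory string. This is only a sketch: `io.StringIO` and `pandas.read_csv` are substituted here for `read_clipboard` (they are not part of the original question), and the constant name `CSV` is made up for the example.

```python
import io

import pandas as pd

# Same rows as the clipboard sample above, held in a string instead.
CSV = """\
2012-10-08 14:00:00,23.007462,1
2012-10-08 14:30:00,27.045666,1
2012-10-08 15:00:00,31.483825,1
2012-10-08 15:30:00,37.540651,2
2012-10-08 16:00:00,43.564573,2
2012-10-08 16:00:00,48.589852,2
2012-10-08 16:00:00,55.289452,2
"""

# read_csv takes the same names/index_col/parse_dates arguments
# that the question passes to read_clipboard.
dt = pd.read_csv(
    io.StringIO(CSV),
    names=["datetime", "atn", "file"],
    index_col=0,
    parse_dates=True,
)

print(dt)
print(type(dt.index))
```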

2012-11-13 joris

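Which version of pandas are you using? When I follow your steps, I get a 'dt2' in which the datetimes are properly preserved.

Perhaps also important, my numpy version (for the datetime64-related parts): >>> pandas.__version__ '0.9.0' >>> np.__version__ '1.6.1' – joris

OK. 'parse_dates' seems to be the problem, @joris. See the answer below.

I think this is a bug in pandas - the dtype is changed to float after the groupby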

dt3 = dt2.groupby(by="file").aggregate("first") 
dt3.dtypes

gives me:

datetime    float64
atn         float64

To change the dtype back to datetime64 you can do:

dt3['datetime'] = pd.Series(dt3['datetime'], dtype='datetime64[ns]')
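
The floats in the output are nanoseconds since the Unix epoch, so another route is to cast them to integers and pass them to `pd.to_datetime` with `unit="ns"`. The sketch below is not from the original answer. The two float values are reconstructed from the timestamps in the question (the printed output only shows them rounded), so treat them as an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the aggregated frame: nanoseconds since the
# epoch stored as float64, as in the buggy groupby output.
dt3 = pd.DataFrame(
    {
        "datetime": [1.3497048e18, 1.3497102e18],
        "atn": [23.007462, 37.540651],
    },
    index=pd.Index([1, 2], name="file"),
)

# Cast to int64 first so the nanosecond counts are interpreted exactly,
# then let pandas turn them into timestamps.
dt3["datetime"] = pd.to_datetime(dt3["datetime"].astype("int64"), unit="ns")

print(dt3)
print(dt3.dtypes)
```

Going through int64 is a deliberate choice here: these particular values happen to be exactly representable as float64, but arbitrary nanosecond timestamps may already have lost precision by the time they are floats.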

I have created an issue on GitHub

2012-11-13 14:02:05

It looks fine on master: https://github.com/pydata/pandas/issues/2238#issuecomment-10327256

Thanks! As you pointed out, changing it back to datetime64 is a good workaround for now. – joris
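
On the question's third point (a better way to get the final table), one option that nobody in this thread proposed is to skip the aggregation and keep the first row per "file" with `drop_duplicates`. A minimal sketch, assuming a reasonably recent pandas; `io.StringIO` with `read_csv` stands in for the clipboard step, and the names `CSV` and `first_rows` are illustrative only.

```python
import io

import pandas as pd

CSV = """\
2012-10-08 14:00:00,23.007462,1
2012-10-08 14:30:00,27.045666,1
2012-10-08 15:00:00,31.483825,1
2012-10-08 15:30:00,37.540651,2
2012-10-08 16:00:00,43.564573,2
2012-10-08 16:00:00,48.589852,2
2012-10-08 16:00:00,55.289452,2
"""

dt = pd.read_csv(
    io.StringIO(CSV),
    names=["datetime", "atn", "file"],
    index_col=0,
    parse_dates=True,
)

# Move the index into a column, keep only the first row seen for each
# "file" value, then use "file" as the new index.
first_rows = (
    dt.reset_index()
    .drop_duplicates(subset="file", keep="first")
    .set_index("file")
)

print(first_rows)
```

Because no aggregation function runs over the datetime column, its dtype is simply carried through from the original frame.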

This looks like a bug, but for the moment, not specifying parse_dates=True gives me the expected result.

My IPython results, without parse_dates=True:

In [29]: dt2 = pd.read_clipboard(sep=",", index_col=0,
                                 names=["datetime", "atn", "file"])

In [30]: dt2
Out[30]:
                           atn  file
datetime
2012-10-08 14:00:00  23.007462     1
2012-10-08 14:30:00  27.045666     1
2012-10-08 15:00:00  31.483825     1
2012-10-08 15:30:00  37.540651     2
2012-10-08 16:00:00  43.564573     2
2012-10-08 16:00:00  48.589852     2
2012-10-08 16:00:00  55.289452     2

In [31]: dt2.reset_index().groupby(by="file").aggregate("first")
Out[31]:
                 datetime        atn
file
1     2012-10-08 14:00:00  23.007462
2     2012-10-08 15:30:00  37.540651

My IPython results, with parse_dates=True:

In [33]: dt = pd.read_clipboard(sep=",", parse_dates=True, index_col=0,
                                names=["datetime", "atn", "file"])

In [34]: dt.reset_index().groupby(by="file").aggregate("first")
Out[34]:
          datetime        atn
file
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

Checking the dtypes explicitly:

In [40]: new_dt = dt.reset_index().groupby(by="file").aggregate("first")

In [41]: new_dt
Out[41]:
          datetime        atn
file
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

In [42]: new_dt.dtypes
Out[42]:
datetime    float64
atn         float64

In [43]: new_dt2 = dt2.reset_index().groupby(by="file").aggregate("first")

In [44]: new_dt2.dtypes
Out[44]:
datetime     object
atn         float64

2012-11-13 14:46:32

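Not specifying 'parse_dates=True' results in an index of dtype object, which holds strings. In that case there is no DatetimeIndex!

Thanks for the answer, but I need it to remain a datetime for my further analysis. – joris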
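
To illustrate the comment above: without parse_dates the index holds plain strings, and it can be converted afterwards with `pd.to_datetime`. This is a small sketch rather than part of the original answer; `read_csv` on an in-memory string is assumed as a stand-in for `read_clipboard`, and the helper names `CSV`, `was_datetime_index` and `first_label` are made up for the example.

```python
import io

import pandas as pd

CSV = """\
2012-10-08 14:00:00,23.007462,1
2012-10-08 14:30:00,27.045666,1
2012-10-08 15:00:00,31.483825,1
2012-10-08 15:30:00,37.540651,2
2012-10-08 16:00:00,43.564573,2
2012-10-08 16:00:00,48.589852,2
2012-10-08 16:00:00,55.289452,2
"""

# No parse_dates here, mirroring the dt2 frame from the answer.
dt2 = pd.read_csv(
    io.StringIO(CSV),
    names=["datetime", "atn", "file"],
    index_col=0,
)

was_datetime_index = isinstance(dt2.index, pd.DatetimeIndex)
first_label = dt2.index[0]  # a plain string at this point
print(was_datetime_index, repr(first_label))

# Convert the string labels into a real DatetimeIndex after loading.
dt2.index = pd.to_datetime(dt2.index)
print(type(dt2.index))
```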

I believe this has been fixed and will be released in 0.9.1
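
To check whether an installed pandas still shows the problem, the steps from the question can be rerun and the resulting dtype inspected. A minimal sketch, assuming a recent pandas release (I have not verified the behaviour of each historical version); `read_csv` with `io.StringIO` replaces the clipboard step, and the names `CSV` and `is_datetime` are illustrative.

```python
import io

import pandas as pd

CSV = """\
2012-10-08 14:00:00,23.007462,1
2012-10-08 14:30:00,27.045666,1
2012-10-08 15:00:00,31.483825,1
2012-10-08 15:30:00,37.540651,2
2012-10-08 16:00:00,43.564573,2
2012-10-08 16:00:00,48.589852,2
2012-10-08 16:00:00,55.289452,2
"""

dt = pd.read_csv(
    io.StringIO(CSV),
    names=["datetime", "atn", "file"],
    index_col=0,
    parse_dates=True,
)

# Same operation as in the question.
new_dt = dt.reset_index().groupby(by="file").aggregate("first")

# True when the datetime column kept a datetime64 dtype.
is_datetime = pd.api.types.is_datetime64_any_dtype(new_dt["datetime"])

print(pd.__version__)
print(new_dt.dtypes)
print(is_datetime)
```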

2012-11-14 00:11:06
