2014-01-21

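I am working with a dataset of timestamps and need to compute the time differences between consecutive observations. The timestamps are of type datetime64[ns], and dfnew is a pandas DataFrame. The differences between the datetime64[ns] objects are returned in nanoseconds.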

    import numpy as np
    from pandas import Timestamp

    dfnew['timestamp'] = dfnew['timestamp'].astype('datetime64[ns]')
    dfnew['dates'] = dfnew['timestamp'].map(Timestamp.date)
    uniqueDates = list(set(dfnew['dates']))  # unique dates as a list
    # build a NumPy array of the timestamps for one particular date
    x = np.array(dfnew[dfnew['dates'] == uniqueDates[0]]['timestamp'])
    y = np.ediff1d(x)  # differences between consecutive timestamps
    print(max(y))
    49573580000000 nanoseconds
    print(min(y))
    -86391523000000 nanoseconds

    print(y[1:20])
    [ 92210000000 388030000000   0 211607000000 249337000000 
     19283000000 91407000000 120180000000 240050000000 30406000000 
       0 480337000000  13000000 491424000000   0 
     80980000000 388103000000 88850000000 120333000000] 
    dfnew['timestamp'][0:20]
    0 2013-12-19 09:03:21.223000 
    1 2013-12-19 11:34:23.037000 
    2 2013-12-19 11:34:23.050000 
    3 2013-12-19 11:34:23.067000 
    4 2013-12-19 11:34:23.067000 
    5 2013-12-19 11:34:23.067000 
    6 2013-12-19 11:34:23.067000 
    7 2013-12-19 11:34:23.067000 
    8 2013-12-19 11:34:23.067000 
    9 2013-12-19 11:34:23.080000 
    10 2013-12-19 11:34:23.080000 
    11 2013-12-19 11:34:23.080000 
    12 2013-12-19 11:34:23.080000 
    13 2013-12-19 11:34:23.080000 
    14 2013-12-19 11:34:23.080000 
    15 2013-12-19 11:34:23.097000 
    16 2013-12-19 11:34:23.097000 
    17 2013-12-19 11:34:23.097000 
    18 2013-12-19 11:34:23.097000 
    19 2013-12-19 11:34:23.097000 
    Name: timestamp 

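Is there a way to get the output in hours rather than nanoseconds? I could convert it with an ordinary division, but I am looking for other options. Also, when I save this to a txt file, the word "nanoseconds" is written as well. How can I drop the unit so that only the numbers are saved? Any help is appreciated.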

NumPy's diff does not handle timedelta values well (it works, but the result is quite raw); you should have at least NumPy 1.7. – Jeff

@Jeff Yes, I have the required version. The dataset is actually quite large, which is why I am sticking with NumPy arrays. They are fast. – sau

See here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas-conversions – Jeff
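
For anyone staying with raw NumPy arrays as discussed in the comments, here is a minimal sketch of the unit conversion, assuming NumPy 1.7 or later. The three sample timestamps are copied from the question, and the in-memory buffer is only a stand-in for a real output file. Dividing a timedelta64 array by a one-hour timedelta64 should give plain floats, so no unit label ends up in the saved text.

```python
import io
import numpy as np

# Sample values copied from the question; the real array would come from dfnew.
x = np.array(
    ["2013-12-19T09:03:21.223",
     "2013-12-19T11:34:23.037",
     "2013-12-19T11:34:23.050"],
    dtype="datetime64[ns]",
)

# Differences between consecutive timestamps, as timedelta64[ns].
y = np.diff(x)

# Dividing by a one-hour timedelta gives float64 hours with no unit attached.
hours = y / np.timedelta64(1, "h")

# Plain floats are written as bare numbers.
buf = io.StringIO()  # stand-in for a real file such as "diffs.txt" (hypothetical name)
np.savetxt(buf, hours, fmt="%.6f")
print(buf.getvalue())
```

This keeps everything in NumPy, so it should stay fast on a large array; the division is still an ordinary division, just with the unit made explicit.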

Answers

Try Series.diff():

import pandas as pd 
import io 

txt = """0 2013-12-19 09:03:21.223000 
1 2013-12-19 11:34:23.037000 
2 2013-12-19 11:34:23.050000 
3 2013-12-19 11:34:23.067000 
4 2013-12-19 11:34:23.067000 
5 2013-12-19 11:34:23.067000 
6 2013-12-19 11:34:23.067000 
7 2013-12-19 11:34:23.067000 
8 2013-12-19 11:34:23.067000 
9 2013-12-19 11:34:23.080000 
10 2013-12-19 11:34:23.080000 
11 2013-12-19 11:34:23.080000 
12 2013-12-19 11:34:23.080000 
13 2013-12-19 11:34:23.080000 
14 2013-12-19 11:34:23.080000 
15 2013-12-19 11:34:23.097000 
16 2013-12-19 11:34:23.097000 
17 2013-12-19 11:34:23.097000 
18 2013-12-19 11:34:23.097000 
19 2013-12-19 11:34:23.097000 
""" 

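# io.StringIO is needed on Python 3, where txt is a str rather than bytes.
# Note: this call targets the pandas of the time; squeeze= was removed from read_csv in pandas 2.0.
s = pd.read_csv(io.StringIO(txt), delim_whitespace=True, parse_dates=[[1,2]], header=None, index_col=1, squeeze=True)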

s.diff() 

Result:

0    NaT 
1 02:31:01.814000 
2 00:00:00.013000 
3 00:00:00.017000 
4   00:00:00 
5   00:00:00 
6   00:00:00 
7   00:00:00 
8   00:00:00 
9 00:00:00.013000 
10   00:00:00 
11   00:00:00 
12   00:00:00 
13   00:00:00 
14   00:00:00 
15 00:00:00.017000 
16   00:00:00 
17   00:00:00 
18   00:00:00 
19   00:00:00 
Name: 1_2, dtype: timedelta64[ns] 
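
To get from these timedeltas to hours and to a text file containing only numbers, something along these lines may work. It is a sketch rather than part of the original answer: the three-row sample is copied from the question, the in-memory buffer stands in for a real file, and the six-decimal format is an arbitrary choice.

```python
import io
import pandas as pd

# Three sample rows copied from the question.
s = pd.Series(
    pd.to_datetime([
        "2013-12-19 09:03:21.223",
        "2013-12-19 11:34:23.037",
        "2013-12-19 11:34:23.050",
    ]),
    name="timestamp",
)

# Consecutive differences as timedeltas; the first entry is NaT.
deltas = s.diff()

# Convert to float hours; NaT becomes NaN.
hours = deltas.dt.total_seconds() / 3600.0

# Write only the numbers: no index, no header, no unit label.
buf = io.StringIO()  # stand-in for a real path such as "diffs.txt" (hypothetical name)
hours.dropna().to_csv(buf, index=False, header=False, float_format="%.6f")
print(buf.getvalue())
```

Because `hours` is an ordinary float Series, anything written from it contains bare numbers only.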