2016-04-04 78 views

Problem converting Unix timestamps using pandas

I have a pandas DataFrame, df, that looks like this:

 _sent_time_stamp distance duration duration_in_traffic Orig_lat 
0   1456732800  1670  208     343 51.441092 

I want to convert the epoch time values (_sent_time_stamp) into two columns, one with the date and one with the hour.

I defined two functions:

def date_convert(time): 
    return time.date() 

def hour_convert(time): 
    return time.hour() 

I then use apply to run these functions and create two new columns.

df['date'] = Goo_results.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 

df['hour'] = Goo_results.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 

The date column works, but the hour column does not. I can't see why!

TypeError: ("'int' object is not callable", u'occurred at index 0') 

You can just convert the whole column: df['hour'] = pd.to_datetime(df['_sent_time_stamp'], unit='s').dt.hour – EdChum

Answer

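You can remove the () after hour. On a Timestamp, hour is an integer attribute rather than a method, so time.hour() tries to call an int, which raises the TypeError: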

def date_convert(time): 
    return time.date() 

def hour_convert(time): 
    return time.hour  # no parentheses: hour is an attribute, not a method

df['date'] = df.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 
df['hour'] = df.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)  
print(df)
    _sent_time_stamp distance duration duration_in_traffic Orig_lat \ 
0  1456732800  1670  208     343 51.441092 

     date hour 
0 2016-02-29  8 
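
To see why the parentheses matter, here is a minimal sketch using only the sample timestamp from the question. The variable names ts, hour, day and message are just for illustration; it assumes pandas is installed and imported as pd.

```python
import pandas as pd

# The sample epoch value from the question, interpreted as seconds.
ts = pd.to_datetime(1456732800, unit='s')

# hour is a plain integer attribute, so it is read without parentheses.
hour = ts.hour

# date is a method, so it does need parentheses.
day = ts.date()

print(type(hour), hour)
print(day)

# Calling the attribute reproduces the error from the question.
try:
    ts.hour()
except TypeError as exc:
    message = str(exc)
    print(message)
```

The same distinction holds inside apply: date_convert works because date() is a real method call, while hour_convert fails because it calls the integer that time.hour returns.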

But a better and faster approach is to convert the whole column once and then use dt.date and dt.hour:

dat = pd.to_datetime(df['_sent_time_stamp'], unit='s') 
df['date'] = dat.dt.date 
df['hour'] = dat.dt.hour 
print(df)
    _sent_time_stamp distance duration duration_in_traffic Orig_lat \ 
0  1456732800  1670  208     343 51.441092 

     date hour 
0 2016-02-29  8 

Timings:

In [20]: %timeit new(df1) 
1000 loops, best of 3: 827 µs per loop 

In [21]: %timeit lamb(df) 
The slowest run took 4.40 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 1.13 ms per loop 

Code:

df1 = df.copy() 

def date_convert(time): 
    return time.date() 

def hour_convert(time): 
    return time.hour 


def lamb(df):  
    df['date'] = df.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 
    df['hour'] = df.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)  
    return df 

def new(df): 
    dat = pd.to_datetime(df['_sent_time_stamp'], unit='s') 
    df['date'] = dat.dt.date 
    df['hour'] = dat.dt.hour 
    return df 

print(lamb(df))
print(new(df1))
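
The snippets above assume that pandas is already imported as pd and that df exists. As a rough, self-contained sketch, the comparison could be repeated on a larger frame like this. The 500-row test frame, the frame parameter name, the copy inside each function, and the use of the standard timeit module in place of IPython's %timeit are all my assumptions, not part of the original timing run.

```python
import timeit

import pandas as pd

# Hypothetical test data: 500 timestamps one minute apart,
# starting at the sample value from the question.
start = 1456732800
df = pd.DataFrame({'_sent_time_stamp': [start + 60 * i for i in range(500)]})


def lamb(frame):
    # Row-wise version: one pd.to_datetime call per row and per column.
    out = frame.copy()
    out['date'] = out.apply(
        lambda row: pd.to_datetime(row['_sent_time_stamp'], unit='s').date(),
        axis=1)
    out['hour'] = out.apply(
        lambda row: pd.to_datetime(row['_sent_time_stamp'], unit='s').hour,
        axis=1)
    return out


def new(frame):
    # Vectorized version: convert the whole column once, then use .dt.
    out = frame.copy()
    dat = pd.to_datetime(out['_sent_time_stamp'], unit='s')
    out['date'] = dat.dt.date
    out['hour'] = dat.dt.hour
    return out


a = lamb(df)
b = new(df)

# Check that both approaches agree before timing them.
same_hours = list(a['hour']) == list(b['hour'])
same_dates = list(a['date']) == list(b['date'])
print(same_hours, same_dates)

# Time three runs of each approach; the numbers depend on the machine.
t_lamb = timeit.timeit(lambda: lamb(df), number=3)
t_new = timeit.timeit(lambda: new(df), number=3)
print('apply:      %.4f s for 3 runs' % t_lamb)
print('vectorized: %.4f s for 3 runs' % t_new)
```

The timings shown earlier were taken on a one-row frame, where the two approaches are close. Since the apply version does its work once per row, the gap would be expected to grow with the number of rows, but that is worth measuring on your own data.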