2016-04-13

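This is my data frame. I want to find the total time taken for each product. How can I calculate the total days, hours and minutes with Python pandas?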

product,query,time1,time2 
A,a1,25-06-15 08:42:43.830000000 PM,25-06-15 08:42:43.830000000 PM 
A,a2,03-07-15 11:57:10.557000000 AM,03-07-15 11:57:10.557000000 AM 
A,a3,02-07-15 02:32:33.090000000 PM,02-07-15 02:32:33.090000000 PM 
A,a4,04-07-15 11:51:59.090000000 AM,04-07-15 11:51:59.090000000 AM 
A,a5,27-06-15 07:12:30.250000000 PM,27-06-15 07:47:40.270000000 PM 
B,b1,30-06-15 07:48:22.090000000 PM,30-06-15 07:48:22.090000000 PM 
B,b1,01-07-15 02:59:36.290000000 PM,02-07-15 05:37:40.700000000 PM 
B,b1,29-06-15 01:28:07.250000000 PM,20-07-15 12:57:06.343000000 PM 
B,b1,03-07-15 05:58:52.737000000 PM,03-07-15 06:06:23.977000000 PM 
B,b1,26-06-15 12:56:36.210000000 AM,26-06-15 12:56:36.210000000 AM 
B,b1,22-06-15 08:16:10.743000000 PM,22-06-15 08:16:10.743000000 PM 
B,b1,29-06-15 11:35:36.807000000 AM,29-06-15 11:55:01.690000000 AM 

I need an output like:

Product,query_count,total_time_taken
A,5,total time taken 
B,7,total time taken 

Do you mean the average time grouped by product? Or the total time? And what about query? –


Based on product... I mean per product. –


Per product, I have to show the total number of queries and the overall time taken to resolve those queries. –
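A minimal end-to-end sketch for the requested output, assuming the sample above is pasted in as CSV text (the names data, FMT, g, out and time_taken are illustrative). Because the dates are day-first (dd-mm-yy) with a 12-hour clock, an explicit format is passed rather than letting pandas guess, and Series.dt.components then splits each total into days, hours and minutes:

```python
import io

import pandas as pd

# Sample rows from the question, as CSV text (illustrative).
data = """product,query,time1,time2
A,a1,25-06-15 08:42:43.830000000 PM,25-06-15 08:42:43.830000000 PM
A,a2,03-07-15 11:57:10.557000000 AM,03-07-15 11:57:10.557000000 AM
A,a3,02-07-15 02:32:33.090000000 PM,02-07-15 02:32:33.090000000 PM
A,a4,04-07-15 11:51:59.090000000 AM,04-07-15 11:51:59.090000000 AM
A,a5,27-06-15 07:12:30.250000000 PM,27-06-15 07:47:40.270000000 PM
B,b1,30-06-15 07:48:22.090000000 PM,30-06-15 07:48:22.090000000 PM
B,b1,01-07-15 02:59:36.290000000 PM,02-07-15 05:37:40.700000000 PM
B,b1,29-06-15 01:28:07.250000000 PM,20-07-15 12:57:06.343000000 PM
B,b1,03-07-15 05:58:52.737000000 PM,03-07-15 06:06:23.977000000 PM
B,b1,26-06-15 12:56:36.210000000 AM,26-06-15 12:56:36.210000000 AM
B,b1,22-06-15 08:16:10.743000000 PM,22-06-15 08:16:10.743000000 PM
B,b1,29-06-15 11:35:36.807000000 AM,29-06-15 11:55:01.690000000 AM
"""

# dd-mm-yy, 12-hour clock with AM/PM; pandas' own %f parsing accepts the 9-digit fraction.
FMT = "%d-%m-%y %I:%M:%S.%f %p"

df = pd.read_csv(io.StringIO(data))
for col in ("time1", "time2"):
    df[col] = pd.to_datetime(df[col], format=FMT)

# Time taken by each individual query.
df["time_taken"] = df["time2"] - df["time1"]

# One row per product: number of queries and their summed duration.
g = df.groupby("product")
out = pd.DataFrame({
    "query_count": g["query"].count(),
    "total_time_taken": g["time_taken"].sum(),
})

# Break each total into whole days, hours and minutes (index stays "product").
parts = out["total_time_taken"].dt.components
out = out.join(parts[["days", "hours", "minutes"]])
print(out)
```

With day-first parsing, product A totals 0 days 00:35:10.020 over 5 queries and product B totals 22 days 02:33:59.626 over 7 queries.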

Answers


I think you can use groupby with apply and a custom function f:

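import pandas as pd

# only needed if time1 and time2 were read into one tab-separated column
# df[['time1', 'time2']] = df['time1'].str.split('\t').apply(pd.Series)

# first convert the columns to datetime; the dates are day-first (dd-mm-yy)
fmt = '%d-%m-%y %I:%M:%S.%f %p'
df['time1'] = pd.to_datetime(df['time1'], format=fmt)
df['time2'] = pd.to_datetime(df['time2'], format=fmt)

def f(x):
    # x holds all rows of one product
    return pd.Series([(x.time2 - x.time1).sum(),
                      len(x)],
                     index=['total_time_taken', 'query_count'])

print(df.groupby('product').apply(f))

               total_time_taken  query_count
product
A        0 days 00:35:10.020000            5
B       22 days 02:33:59.626000            7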
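The day-first format matters here. A small sketch using one timestamp from row b1 (variable names are illustrative): read month-first, 01-07-15 becomes 7 January instead of 1 July, so the b1 row running from 01-07-15 to 02-07-15 counts 31 days instead of 1 day, which inflates product B's total to 52 days instead of 22 days 02:33:59.626.

```python
import pandas as pd

s = "01-07-15 02:59:36.290000000 PM"  # time1 of one b1 row in the question

# Day-first, as the data is actually written (dd-mm-yy).
day_first = pd.to_datetime(s, format="%d-%m-%y %I:%M:%S.%f %p")
# Month-first, i.e. how an unhinted parser may read an ambiguous date like this.
month_first = pd.to_datetime(s, format="%m-%d-%y %I:%M:%S.%f %p")

print(day_first)    # 2015-07-01 14:59:36.290000
print(month_first)  # 2015-01-07 14:59:36.290000
```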

This code runs, but the total_time_taken column shows NaN. –


I edited the answer; I think the problem is that the dtype of columns 'time1' and 'time2' is not datetime. – jezrael


Now the code does not work. –
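Following the dtype diagnosis in the comments, a hedged sketch of checking the column dtypes and converting only the columns that are not datetime yet, using pandas.api.types.is_datetime64_any_dtype. The one-row frame is hypothetical, built from row a5 of the question:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical frame where the timestamps were read as plain strings.
df = pd.DataFrame({
    "product": ["A"],
    "time1": ["27-06-15 07:12:30.250000000 PM"],
    "time2": ["27-06-15 07:47:40.270000000 PM"],
})
print(df.dtypes)  # time1 / time2 hold strings, not datetime64

FMT = "%d-%m-%y %I:%M:%S.%f %p"
for col in ("time1", "time2"):
    # Convert only the columns that are not datetime yet.
    if not is_datetime64_any_dtype(df[col]):
        df[col] = pd.to_datetime(df[col], format=FMT)

diff = df["time2"] - df["time1"]
print(diff.iloc[0])  # 0 days 00:35:10.020000
```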

>>> df['time'] = df.time2 - df.time1
>>> (df.groupby('product')
...    .agg({'query': 'count', 'time': 'sum'})
...    .rename(columns={'query': 'query_count', 'time': 'total_time_taken'}))
         query_count        total_time_taken
product
A                  5  0 days 00:35:10.020000
B                  7 22 days 02:33:59.626000

Recreating the original data frame (dates read day-first, e.g. 03-07-15 is 3 July 2015):

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame(
    {'product': ['A'] * 5 + ['B'] * 7,
     'query': ['a1', 'a2', 'a3', 'a4', 'a5'] + ['b1'] * 7,
     'time1': [
         Timestamp('2015-06-25 20:42:43.830000'),
         Timestamp('2015-07-03 11:57:10.557000'),
         Timestamp('2015-07-02 14:32:33.090000'),
         Timestamp('2015-07-04 11:51:59.090000'),
         Timestamp('2015-06-27 19:12:30.250000'),
         Timestamp('2015-06-30 19:48:22.090000'),
         Timestamp('2015-07-01 14:59:36.290000'),
         Timestamp('2015-06-29 13:28:07.250000'),
         Timestamp('2015-07-03 17:58:52.737000'),
         Timestamp('2015-06-26 00:56:36.210000'),
         Timestamp('2015-06-22 20:16:10.743000'),
         Timestamp('2015-06-29 11:35:36.807000')],
     'time2': [
         Timestamp('2015-06-25 20:42:43.830000'),
         Timestamp('2015-07-03 11:57:10.557000'),
         Timestamp('2015-07-02 14:32:33.090000'),
         Timestamp('2015-07-04 11:51:59.090000'),
         Timestamp('2015-06-27 19:47:40.270000'),
         Timestamp('2015-06-30 19:48:22.090000'),
         Timestamp('2015-07-02 17:37:40.700000'),
         Timestamp('2015-07-20 12:57:06.343000'),
         Timestamp('2015-07-03 18:06:23.977000'),
         Timestamp('2015-06-26 00:56:36.210000'),
         Timestamp('2015-06-22 20:16:10.743000'),
         Timestamp('2015-06-29 11:55:01.690000')]})
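On pandas 0.25 or later, the agg/rename pair above can be collapsed with named aggregation, which names the output columns directly. A sketch on four rows taken from the question (dates already parsed day-first; the choice of rows is arbitrary):

```python
import pandas as pd

# A few rows from the question, with dates already parsed day-first.
df = pd.DataFrame({
    "product": ["A", "A", "B", "B"],
    "query": ["a1", "a5", "b1", "b1"],
    "time1": pd.to_datetime([
        "2015-06-25 20:42:43.830", "2015-06-27 19:12:30.250",
        "2015-07-01 14:59:36.290", "2015-07-03 17:58:52.737",
    ]),
    "time2": pd.to_datetime([
        "2015-06-25 20:42:43.830", "2015-06-27 19:47:40.270",
        "2015-07-02 17:37:40.700", "2015-07-03 18:06:23.977",
    ]),
})

df["time"] = df["time2"] - df["time1"]

# Named aggregation: output_column=(input_column, aggregation), so no rename() is needed.
out = df.groupby("product").agg(
    query_count=("query", "count"),
    total_time_taken=("time", "sum"),
)
print(out)
```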

I have two columns for the time (time1, time2). –


It shows a KeyError. –


I forgot to add the calculation of the difference. – Alexander
