Using Python, how do I group a column in a DataFrame by hour?

I have a pandas DataFrame (df1) with a column named time. I converted that column to a datetime series with pd.to_datetime(df1['time']). Now I get a column that looks like this:

2016-08-24 00:00:00 2016-08-13 00:00:00 
2016-08-24 00:00:00 2016-08-13 00:00:00  
2016-08-24 00:00:00 2016-08-13 00:00:00 
2016-08-24 00:00:00 2016-08-13 00:00:00 
2016-08-24 00:00:01 2016-08-13 00:00:01 
2016-08-24 00:00:01 2016-08-13 00:00:01 
2016-08-24 00:00:02 2016-08-13 00:00:02 
2016-08-24 00:00:02 2016-08-13 00:00:02  
2016-08-24 00:00:02 2016-08-13 00:00:02  
2016-08-24 00:00:02 2016-08-13 00:00:02  
2016-08-24 00:00:02 2016-08-13 00:00:02  
2016-08-24 00:00:02 2016-08-13 00:00:02  
2016-08-24 00:00:02 2016-08-13 00:00:02  
2016-08-24 00:00:02 2016-08-13 00:00:02  
2016-08-24 00:00:02 2016-08-13 00:00:02  
.... 

2016-08-24 23:59:59 2016-08-13 00:00:02

Essentially, I want to group the first column by hour so that I can see how many entries fall within each hour. Any help would be great.

Source

2016-08-24 Vijay

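Using @jezrael's setup:

# The how= argument of resample has been removed from pandas; call .count() on the resampler instead.
df.resample('H').count().rename(columns={'time': 'count'})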

         count 
2016-08-24 00:00:00  1 
2016-08-24 01:00:00  3 
2016-08-24 02:00:00  1

Source

2016-08-25 05:17:50 Merlin

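Yes, this works if I use groupby on a single column. Do you know what happens when we group by multiple columns? – Vijay

@Vijay Thanks. Please ask that as a separate question. Good luck – Merlin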

You can use pandas.DatetimeIndex as follows.

import numpy as np 
import pandas as pd 

# An example of time period 
drange = pd.date_range('2016-08-01 00:00:00', '2016-09-01 00:00:00', 
         freq='10min') 

N = len(drange) 

# The number of columns without 'time' is three. 
df = pd.DataFrame(np.random.rand(N, 3)) 
df['time'] = drange 

time_col = pd.DatetimeIndex(df['time']) 

gb = df.groupby([time_col.year, 
       time_col.month, 
       time_col.day, 
       time_col.hour]) 

for col_name, gr in gb: 
    print(gr) # If you want to see only the length, use print(len(gr))

[Reference] Python Pandas: Group datetime column into hour and minute aggregations

Source

2016-08-25 01:15:12

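Hi @Daewon Lee, thanks for the answer. When I use this code, it throws an error saying that the Series object has no attribute hour. Any ideas? – Vijay

@Vijay Which version of Python are you using? The code above was tested with Anaconda Python 3.5 (64-bit) on Windows 10 64-bit. (Which version of pandas are you using? Mine is 0.18.1) –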
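
An error like the one described above typically appears when .hour is read directly from a Series rather than from a DatetimeIndex. On a Series, the datetime parts are reached through the .dt accessor. A minimal sketch, assuming a hypothetical frame whose time column mirrors the question (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; only the column name 'time' comes from the question.
df1 = pd.DataFrame({'time': pd.to_datetime([
    '2016-08-24 00:00:00',
    '2016-08-24 01:00:00',
    '2016-08-24 01:00:01',
    '2016-08-24 01:00:02',
    '2016-08-24 02:00:02',
])})

# df1['time'].hour raises AttributeError; a Series exposes datetime parts via .dt.
# Flooring to the hour keeps the date, so the same hour on different days stays separate.
hourly = df1.groupby(df1['time'].dt.floor('h')).size()
print(hourly)
```

Grouping by df1['time'].dt.hour instead would pool the same hour of day across all dates, which may or may not be what is wanted.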

Use resample:

#pandas version 0.18.0 and higher 
df = df.resample('H').size() 

#pandas version below 0.18.0 
#df = df.resample('H', 'size') 

print (df) 
2016-08-24 00:00:00 1 
2016-08-24 01:00:00 3 
2016-08-24 02:00:00 1 
Freq: H, dtype: int64

If you need the output as a DataFrame:

df = df.resample('H').size().rename('count').to_frame() 
print (df) 
        count 
2016-08-24 00:00:00  1 
2016-08-24 01:00:00  3 
2016-08-24 02:00:00  1
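
The resample calls above rely on the frame having a DatetimeIndex. If the timestamps sit in an ordinary column instead, as in the question, the on= parameter of resample can point at that column. A minimal sketch under that assumption, with made-up sample data; the lowercase 'h' alias is used because recent pandas versions deprecate the uppercase 'H':

```python
import pandas as pd

# Hypothetical frame with a default integer index and timestamps in a 'time' column.
df1 = pd.DataFrame({'time': pd.to_datetime([
    '2016-08-24 00:00:00',
    '2016-08-24 01:00:00',
    '2016-08-24 01:00:01',
    '2016-08-24 01:00:02',
    '2016-08-24 02:00:02',
])})

# on='time' tells resample to bin by the column rather than by the index.
counts = df1.resample('h', on='time').size().rename('count').to_frame()
print(counts)
```

Unlike a plain groupby, resample also emits hours with no rows at all, with a count of 0.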

Or you can remove the minutes and seconds from the DatetimeIndex by converting it to <M8[h], and then aggregate with size:

import pandas as pd 

df = pd.DataFrame({'time': {pd.Timestamp('2016-08-24 01:00:00'): pd.Timestamp('2016-08-13 00:00:00'), pd.Timestamp('2016-08-24 01:00:01'): pd.Timestamp('2016-08-13 00:00:01'), pd.Timestamp('2016-08-24 01:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 02:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 00:00:00'): pd.Timestamp('2016-08-13 00:00:00')}}) 
print (df) 
            time 
2016-08-24 00:00:00 2016-08-13 00:00:00 
2016-08-24 01:00:00 2016-08-13 00:00:00 
2016-08-24 01:00:01 2016-08-13 00:00:01 
2016-08-24 01:00:02 2016-08-13 00:00:02 
2016-08-24 02:00:02 2016-08-13 00:00:02 

df= df.groupby([df.index.values.astype('<M8[h]')]).size() 
print (df) 
2016-08-24 00:00:00 1 
2016-08-24 01:00:00 3 
2016-08-24 02:00:00 1 
dtype: int64

Source

2016-08-25 05:01:59 jezrael

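My problem is that I am grouping by multiple columns. My code is currently df2 = df1['count'].groupby([df1['sc-status'], df1['cs-method'], df1['time']]).count(). With the above code and my current data, I get the times exactly as they appear in my input file (the times at which the data was requested). I am struggling with the next step, which is to group this grouped object (df2) by hour. Hope this makes sense – Vijay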
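
One possible way to combine several grouping columns with hourly buckets is to pass pd.Grouper alongside the column names in a single groupby call. This is only a sketch: the column names sc-status, cs-method and time are taken from the comment above, while the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical log-like data; the values are made up for illustration.
df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '2016-08-24 00:00:00',
        '2016-08-24 01:00:00',
        '2016-08-24 01:00:01',
        '2016-08-24 01:00:02',
        '2016-08-24 02:00:02',
    ]),
    'sc-status': [200, 200, 200, 404, 200],
    'cs-method': ['GET', 'GET', 'GET', 'GET', 'POST'],
})

# pd.Grouper(key='time', freq='h') buckets the 'time' column into hours,
# while the other two keys group by their plain values.
df2 = (
    df1.groupby(['sc-status', 'cs-method', pd.Grouper(key='time', freq='h')])
       .size()
       .rename('count')
)
print(df2)
```

The result is a Series with a three-level index (status, method, hour), so there is no need to regroup an already grouped object.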
