Use map with a dictionary. To get keys in the same format, use split and select the first element of each list with str[0]:
import pandas as pd

# build a lookup table: 'HH:MM:SS' -> half-hour slot number 1..48
d = dict(zip(pd.date_range('2015-01-01', '2015-01-01 23:59:59', freq='30min')
               .strftime('%H:%M:%S'), range(1, 49)))
print(d)
{'00:30:00': 2, '13:30:00': 28, '07:00:00': 15, '19:30:00': 40, '12:00:00': 25,
'10:30:00': 22, '01:30:00': 4, '14:30:00': 30, '21:00:00': 43, '11:00:00': 23,
'16:00:00': 33, '06:30:00': 14, '05:00:00': 11, '03:00:00': 7, '20:00:00': 41,
'06:00:00': 13, '01:00:00': 3, '18:00:00': 37, '15:00:00': 31, '09:00:00': 19,
'19:00:00': 39, '02:30:00': 6, '23:00:00': 47, '02:00:00': 5, '08:30:00': 18,
'14:00:00': 29, '17:00:00': 35, '13:00:00': 27, '21:30:00': 44, '04:30:00': 10,
'07:30:00': 16, '18:30:00': 38, '16:30:00': 34, '23:30:00': 48, '00:00:00': 1,
'17:30:00': 36, '05:30:00': 12, '10:00:00': 21, '11:30:00': 24, '15:30:00': 32,
'22:00:00': 45, '20:30:00': 42, '04:00:00': 9, '09:30:00': 20, '03:30:00': 8,
'08:00:00': 17, '12:30:00': 26, '22:30:00': 46}
df['new']=df['INTV'].str.split('.').str[0].map(d)
print (df)
         DATE          INTV  Y  new
0  2005-11-10  00:00:00.000  0    1
1  2005-11-10  00:30:00.000  0    2
2  2005-11-10  01:00:00.000  0    3
3  2005-11-10  01:30:00.000  1    4
4  2005-11-10  02:00:00.000  1    5
5  2005-11-10  02:30:00.000  0    6
6  2005-11-10  22:00:00.000  1   45
7  2005-11-10  22:30:00.000  3   46
8  2005-11-10  23:00:00.000  3   47
9  2005-11-10  23:30:00.000  0   48
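For anyone who wants to try the dictionary approach without the original data, here is a minimal self-contained sketch. The sample frame is hypothetical (only the INTV column is reproduced, plus one off-grid value, 00:15:00.000, added for illustration). It also shows that map returns NaN for any key that is not in the dictionary, which is why the dtype and the exact string format of INTV matter:

```python
import pandas as pd

# Hypothetical sample data; only the INTV column of the question is reproduced,
# and the last value is deliberately off the half-hour grid.
df = pd.DataFrame({'INTV': ['00:00:00.000', '00:30:00.000',
                            '22:00:00.000', '23:30:00.000',
                            '00:15:00.000']})

# Lookup table: 'HH:MM:SS' -> half-hour slot number 1..48
labels = pd.date_range('2015-01-01', periods=48, freq='30min').strftime('%H:%M:%S')
d = dict(zip(labels, range(1, 49)))

# map() matches only when the values are strings in exactly the key format,
# so cast to str and cut off the '.000' part first.
keys = df['INTV'].astype(str).str.split('.').str[0]
df['new'] = keys.map(d)

print(df['INTV'].dtype)   # object means the column holds Python strings
print(df)
```

Because of the unmatched row, the new column comes back as float with a NaN in the last position; with clean half-hour data it stays integer.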
Detail:
print (df['INTV'].str.split('.').str[0])
0 00:00:00
1 00:30:00
2 01:00:00
3 01:30:00
4 02:00:00
5 02:30:00
6 22:00:00
7 22:30:00
8 23:00:00
9 23:30:00
Name: INTV, dtype: object
Another solution, improving on josh's:
dates = pd.to_datetime(df['INTV'])
df['new']= dates.dt.hour * 2 + dates.dt.minute//30 + 1
print (df)
         DATE          INTV  Y  new
0  2005-11-10  00:00:00.000  0    1
1  2005-11-10  00:30:00.000  0    2
2  2005-11-10  01:00:00.000  0    3
3  2005-11-10  01:30:00.000  1    4
4  2005-11-10  02:00:00.000  1    5
5  2005-11-10  02:30:00.000  0    6
6  2005-11-10  22:00:00.000  1   45
7  2005-11-10  22:30:00.000  3   46
8  2005-11-10  23:00:00.000  3   47
9  2005-11-10  23:30:00.000  0   48
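As a sanity check on the arithmetic, the sketch below applies hour * 2 + minute // 30 + 1 to every half-hour of a day and confirms it yields 1 through 48, the same numbering as the dictionary. The variable names are illustrative, not from the question:

```python
import pandas as pd

# All 48 half-hour timestamps of one (arbitrary) day
times = pd.Series(pd.date_range('2015-01-01', periods=48, freq='30min'))

# Two slots per hour, plus one more for the second half hour, 1-based
slot = times.dt.hour * 2 + times.dt.minute // 30 + 1
print(slot.tolist() == list(range(1, 49)))

# Unlike a dictionary lookup, the formula also handles off-grid times
# by flooring them into the slot they fall in.
t = pd.Timestamp('2015-01-01 00:15:00')
print(t.hour * 2 + t.minute // 30 + 1)
```

One practical difference from the map approach: a time such as 00:15 lands in slot 1 here, whereas the dictionary lookup would give NaN for it.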
Detail: the date does not matter here. When only a time is parsed, today's date is added:
print (dates)
0 2017-10-17 00:00:00
1 2017-10-17 00:30:00
2 2017-10-17 01:00:00
3 2017-10-17 01:30:00
4 2017-10-17 02:00:00
5 2017-10-17 02:30:00
6 2017-10-17 22:00:00
7 2017-10-17 22:30:00
8 2017-10-17 23:00:00
9 2017-10-17 23:30:00
Name: INTV, dtype: datetime64[ns]
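If the date that gets attached is a concern, two variations are possible. This is only a sketch on hypothetical sample strings, and it assumes INTV always looks like HH:MM:SS.fff. Passing an explicit format to pd.to_datetime makes the parse deterministic (the missing date then defaults to 1900-01-01), and pd.to_timedelta avoids a date altogether:

```python
import pandas as pd

# Hypothetical INTV-style strings
s = pd.Series(['00:00:00.000', '01:30:00.000', '23:30:00.000'])

# Variation 1: explicit format; the missing date is filled with 1900-01-01
dates = pd.to_datetime(s, format='%H:%M:%S.%f')
slot_dt = dates.dt.hour * 2 + dates.dt.minute // 30 + 1

# Variation 2: treat the value as a duration since midnight, no date involved
td = pd.to_timedelta(s)
slot_td = (td.dt.total_seconds() // 1800).astype(int) + 1   # 1800 s = 30 min

print(dates)
print(slot_dt.tolist(), slot_td.tolist())
```

Both variations give the same slot numbers as the formula above; the timedelta version simply counts whole 30-minute blocks since midnight.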
I think you need `df['INTV'] = pd.factorize(df['INTV'])[0] + 1` – jezrael
What is the dtype of the INTV column? – WNG
Be sure to check the data type of 'INTV' and match it to the keys in the dictionary. – zipa