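Python & Pandas: counting time data in 2h increments
I am trying to group a bunch of time-series data into 2-hour bins. I am very new to this, so please bear with me. Based on earlier research, I think I can use pandas for it.
I have a dataset (the values stored in mytime) that looks like this: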
['15:23', '14:41', '13:54', '07:13', '20:21', '13:15', '14:48', '12:06', '08:37', '06:32', '07:04', '14:20', '16:28',
'06:49', '08:39', '09:15', '08:54', '05:37', '14:43', '06:20', '11:25', '11:05', '09:28', '14:05', '14:24', '15:30',
'13:28', '16:55', '09:29', '17:44', '07:24', '09:37', '06:47', '14:35', '10:55', '22:29', '06:24', '09:25', '06:45',
'23:49', '19:34', '01:31', '14:22', '13:58', '09:08', '05:11', '08:09', '08:52', '02:50', '12:51', '17:33', '07:07',
'08:11', '10:06', '23:48', '22:27', '11:15', '15:09', '16:45', '20:42', '12:12', '07:08', '16:13', '20:40', '17:26',
'18:57', '15:07', '09:19', '09:10', '09:17', '09:26', '14:18', '06:31', '14:13', '14:01', '08:57', '21:34']
I would like to take this dataset and basically see output like this:
0-2: 4
2-4: 7
4-6: 3
6-8: 3
8-10: 2
10-12: 5
12-14: 14
....etc
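To make the target format concrete, here is a minimal sketch in plain Python that produces lines of that shape without pandas. The function name count_by_two_hours and the six-item sample are hypothetical, chosen only for illustration, and the sketch assumes every entry is a zero-padded 'HH:MM' string.

```python
from collections import Counter


def count_by_two_hours(times):
    """Count 'HH:MM' strings per 2-hour bin and return {'0-2': n, ..., '22-24': n}."""
    # Integer-dividing the hour by 2 gives a bin number from 0 to 11.
    counts = Counter(int(t.split(':')[0]) // 2 for t in times)
    # Report every bin, including empty ones, in chronological order.
    return {f"{2 * b}-{2 * b + 2}": counts.get(b, 0) for b in range(12)}


# Hypothetical short sample taken from the start of the list above.
sample = ['15:23', '14:41', '13:54', '07:13', '20:21', '01:31']
result = count_by_two_hours(sample)
for label, n in result.items():
    print(f"{label}: {n}")
```

Each bin is half-open, so '14:00' would land in 14-16 and not in 12-14.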
Here is a subset of my code:
import csv
import datetime  # datetime.date is used below
from collections import Counter
import pandas as pd
import numpy as np

mycount = Counter()
mytime = []
with open('temp_dates.csv') as csvfile2:
    readCSV2 = csv.reader(csvfile2, delimiter=',')
    incoming = []
    for row in readCSV2:
        readin = row[0]   # date column, 'YYYY-MM-DD'
        time = row[1]     # time column, 'HH:MM'
        year, month, day = (int(x) for x in readin.split('-'))
        ans = datetime.date(year, month, day)
        wkday = ans.strftime("%A")
        incoming.append([wkday, time])
        mycount[wkday] += 1
        mytime.append(time)
with open('new_dates2.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerows(incoming)
for key, value in sorted(mycount.items()):
    daylist = key, value
    print(daylist)
#print(mytime)
df = pd.DataFrame()
#print(df)
df.groupby([df['mytime'], pd.TimeGrouper(freq='2H')])
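I guess my first problem is that the data is not formatted correctly for TimeGrouper to understand? Second, I am probably missing something that tells the DataFrame what to look at? Any help would be greatly appreciated.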
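For illustration, here is one possible sketch of both points: the DataFrame above is created empty, so it has no 'mytime' column, and the 'HH:MM' strings have to be parsed before pandas can treat them as times. Note that this sketch swaps in pd.cut on the hour value in place of TimeGrouper (as far as I know, pd.TimeGrouper was deprecated and later removed from pandas in favour of pd.Grouper). The names stamps, edges, labels, and bins, and the short sample list, are assumptions made for the example.

```python
import pandas as pd

# Short hypothetical sample; in the code above this would be the full mytime list.
mytime = ['15:23', '14:41', '13:54', '07:13', '20:21', '01:31']

# Parse the 'HH:MM' strings into timestamps (pandas fills in a dummy date).
stamps = pd.to_datetime(pd.Series(mytime), format='%H:%M')

edges = list(range(0, 26, 2))                      # 0, 2, ..., 24
labels = [f"{lo}-{lo + 2}" for lo in edges[:-1]]   # '0-2', ..., '22-24'

# right=False makes each bin half-open: [0, 2), [2, 4), ..., [22, 24).
bins = pd.cut(stamps.dt.hour, bins=edges, right=False, labels=labels)

# sort=False keeps the bins in chronological order instead of sorting by count.
counts = bins.value_counts(sort=False)
print(counts)
```

Because the result is categorical, bins with no entries still appear with a count of 0.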
As requested, here is a snippet of the original source CSV file (we are only talking about column 2, which is what populates 'mytime').
Sunday,14:35
Sunday,10:55
Friday,22:29
Friday,06:24
Thursday,09:25
Wednesday,06:45
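A hedged sketch of how rows shaped like this snippet could be counted with time-based grouping follows. It assumes the file has no header row; the column names weekday, mytime, and when are hypothetical, and the data is inlined through io.StringIO only so the example is self-contained. The key point is that time-based grouping wants real timestamps as the index, not 'HH:MM' strings.

```python
import io
import pandas as pd

# Inline copy of the snippet above; a real run would pass the file path instead.
raw = io.StringIO(
    "Sunday,14:35\nSunday,10:55\nFriday,22:29\n"
    "Friday,06:24\nThursday,09:25\nWednesday,06:45\n"
)
df = pd.read_csv(raw, header=None, names=["weekday", "mytime"])

# Parse the strings; with only '%H:%M' given, every row lands on the same dummy date.
df["when"] = pd.to_datetime(df["mytime"], format="%H:%M")

# Index by the timestamps, sort them, and count rows per 2-hour window.
per_bin = df.set_index("when").sort_index().resample(pd.Timedelta(hours=2)).size()
print(per_bin)
```

Unlike a fixed set of twelve bins, resample only covers the span between the earliest and latest timestamps, so bins before the first entry or after the last one are not listed.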
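This is a bit confusing. Your first statement is that you have a list of times, but the first piece of code builds dates from a CSV. I am guessing that the list mytime contains the data, and only the last two lines are the actual problem? – Ben
Please provide a reproducible sample of the dataset in its original (CSV) format – MaxU
mytime is what I am trying to pull the data from; it is populated from the CSV file (row[1]). The data list above comes straight from mytime – Justin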