2013-06-12 263 views
1

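I am reading a text file that contains times (hours and minutes) and IP addresses. I want to get the time differences and then do some activity every 5 minutes. The code below fails to compute the time difference in minutes.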

Sample text file:

06:03 65.55.215.62 
06:04 157.56.92.152 
06:04 66.249.74.175 
06:05 173.199.116.171 

Code:

time_ip = [] 
for line in open('minutes'): 
    time_ip.append(line.split(' '))  

df = pandas.DataFrame(time_ip) 
df['tvalue'] = df[0] 
df['delta'] = (df['tvalue']-df['tvalue']) 
+0

df['tvalue'] - df['tvalue'] == 0 if df[0] is a number. – Elazar

+0

There is some explanation at http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas –

+0

@Elazar It gives the same error... TypeError: unsupported operand type(s) for -: 'str' and 'str' –

Answers

0

You can use the datetime module:

import datetime 
with open('minutes', 'r') as myfile: 
    times = myfile.read().split()[::2] 
dates = [datetime.datetime.strptime(i, '%H:%M') for i in times] 
differences = [j-i for i, j in zip(dates[:-1], dates[1:])] 
print([divmod(i.seconds, 60)[0] for i in differences])  # whole minutes between consecutive lines

Prints:

[1, 0, 1] 
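
If one result per line is easier to read than a single list (for example with a larger file), a minimal sketch along the same lines might look like the following. It reads the sample data from an in-memory string rather than the `minutes` file so that it runs on its own; the helper name `minute_gaps` is made up for illustration, and it assumes every line has the form `HH:MM ip` with all times on the same day.

```python
import datetime

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """\
06:03 65.55.215.62
06:04 157.56.92.152
06:04 66.249.74.175
06:05 173.199.116.171
"""

def minute_gaps(lines):
    """Return whole-minute gaps between consecutive 'HH:MM ip' lines."""
    # Parse only the first field (the time) of each non-empty line.
    times = [datetime.datetime.strptime(line.split()[0], '%H:%M')
             for line in lines if line.strip()]
    # Pair each time with its successor and subtract.
    return [int((later - earlier).total_seconds() // 60)
            for earlier, later in zip(times[:-1], times[1:])]

gaps = minute_gaps(SAMPLE.splitlines())
for gap in gaps:
    print(gap)  # one gap per line instead of one long list
```

With a real file, passing the open file object (which iterates line by line) instead of `SAMPLE.splitlines()` should work the same way.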
+0

for line in open('minutes'): times = line.split()[::2] dates = [datetime.datetime.strptime(i, '%H:%M') for i in times] differences = [j-i for i, j in zip(dates[:-1], dates[1:])] print [divmod(i.seconds, 60) for i in differences] ............ this just prints an empty array @Haidro –

+0

@NilaniAlgiriyage Updated – TerryA

+0

This works fine for simple files, but for large data files the output is very messy. How can I print it line by line? –

0
>>> import datetime 
>>> start = datetime.datetime.now() 
>>> end = datetime.datetime.now()  # entered about 7 seconds later 
>>> diff = end - start 
>>> diff 
datetime.timedelta(0, 7, 424199) 
>>> divmod(diff.days * 86400 + diff.seconds, 60) 
(0, 7) # 0 minutes, 7 seconds 
1

You should use read_csv to read the file into a DataFrame:

In [1]: df = pd.read_csv(file_name, sep=r'\s+', header=None, names=['time', 'ip']) 

In [2]: df 
Out[2]: 
    time               ip
0  06:03     65.55.215.62
1  06:04    157.56.92.152
2  06:04    66.249.74.175
3  06:05  173.199.116.171

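pandas doesn't (yet) have any built-in time objects, and doing this in plain Python is not very easy... You could turn the time column into datetime.time objects: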

In [3]: df['time'] = df['time'].apply(lambda x: datetime.time(*map(int, x.split(':')))) 

In [4]: df 
Out[4]: 
       time               ip
0  06:03:00     65.55.215.62
1  06:04:00    157.56.92.152
2  06:04:00    66.249.74.175
3  06:05:00  173.199.116.171

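But not only can you not do arithmetic on datetime.time objects anyway, I think you would also get into trouble because there is no year/month/day here. For one thing, how would you handle midnight?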
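
To make the midnight problem concrete, here is a small sketch using only the standard library. The two times and the date are invented for illustration, and the assumption that the second time falls on the following day is exactly the information a bare time-of-day cannot carry.

```python
import datetime

t1 = datetime.time(23, 58)
t2 = datetime.time(0, 3)

# datetime.time objects do not support subtraction.
try:
    t2 - t1
    subtraction_supported = True
except TypeError:
    subtraction_supported = False

# Assumption: t2 belongs to the day after t1. Attaching explicit dates
# turns both into datetime.datetime objects, which can be subtracted.
day = datetime.date(2013, 6, 12)
start = datetime.datetime.combine(day, t1)
end = datetime.datetime.combine(day + datetime.timedelta(days=1), t2)
gap_minutes = (end - start).total_seconds() / 60

print(subtraction_supported, gap_minutes)
```

Without the date, there is no way to tell whether 00:03 comes five minutes after 23:58 or almost a full day before it.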

So let's start again, assuming you have a datetime...

In [5]: df = pd.read_csv(file_name, sep=r'\s+', header=None, names=['time', 'ip']) 

In [6]: df['time'] = pd.to_datetime(df['time']) # let's use today's date 

In [7]: df 
Out[7]: 
                 time               ip
0 2013-06-12 06:03:00     65.55.215.62
1 2013-06-12 06:04:00    157.56.92.152
2 2013-06-12 06:04:00    66.249.74.175
3 2013-06-12 06:05:00  173.199.116.171

Then you can use shift to grab the differences:

In [8]: df['time'].shift() 
Out[8]: 
0                   NaT
1   2013-06-12 06:03:00
2   2013-06-12 06:04:00
3   2013-06-12 06:04:00
Name: time, dtype: datetime64[ns] 

In [9]: df['time'] - df['time'].shift() 
Out[9]: 
0        NaT
1   00:01:00
2   00:00:00
3   00:01:00
Name: time, dtype: timedelta64[ns] 

Much easier. :)
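
For the "every 5 minutes" part of the question, one possible sketch (not the only way) is to express the differences as minutes and bucket the rows into 5-minute windows. It assumes a reasonably recent pandas, inlines the sample data instead of reading `file_name`, and passes an explicit `format` so the parse does not depend on date inference. The column names `delta_min` and `bucket` are made up for illustration.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """\
06:03 65.55.215.62
06:04 157.56.92.152
06:04 66.249.74.175
06:05 173.199.116.171
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+', header=None,
                 names=['time', 'ip'])

# With an explicit time-only format the date part falls back to
# 1900-01-01; only the differences between rows matter here.
df['time'] = pd.to_datetime(df['time'], format='%H:%M')

# Difference to the previous row, in minutes (NaN for the first row).
df['delta_min'] = df['time'].diff().dt.total_seconds() / 60

# Round each timestamp down to the start of its 5-minute window,
# then count how many rows fall in each window.
df['bucket'] = df['time'].dt.floor('5min')
counts = df.groupby('bucket').size()

print(df)
print(counts)
```

Here `diff()` does the same job as subtracting the shifted column, and the per-window count stands in for whatever activity should run every 5 minutes.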

+0

Why do I get this error? AttributeError: 'module' object has no attribute 'to_datetime' @Andy Hayden –

+0

@NilaniAlgiriyage Which version of pandas are you using? You need to upgrade to the latest stable version. :) –

+0

Does df['time'].shift() produce the same output together with the IPs? @Andy Hayden –