2013-03-27 18 views

How to visualize continuous time spent at a location in pandas?

I have some timestamp and location data, such as the following:

A 2013-02-05 19:45:00 (39.94, -86.159) 
A 2013-02-05 19:55:00 (39.94, -86.159) 
A 2013-02-05 20:00:00 (39.777, -85.995) 
A 2013-02-05 20:05:00 (39.775, -85.978) 
B 2013-02-05 22:20:00 (39.935, -86.159) 
B 2013-02-05 22:25:00 (39.935, -86.159) 
B 2013-02-05 23:55:00 (39.951, -86.151) 
B 2013-02-06 00:00:00 (39.951, -86.151) 
B 2013-02-06 00:05:00 (39.906, -86.196) 
C 2013-02-06 00:25:00 (39.82, -86.249) 
C 2013-02-06 00:30:00 (39.82, -86.249) 
C 2013-02-06 02:45:00 (41.498, -81.527) 
C 2013-02-06 02:55:00 (41.498, -81.527) 
C 2013-02-06 04:35:00 (39.82, -86.249) 
C 2013-02-06 04:40:00 (39.82, -86.249) 

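What I need is, for each user and each day, a histogram of the continuous time a person spends at a location. To get there, I would like to label each continuous period during which the location stays constant, per user per day.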

How would I go about this with pandas in Python?

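A location may repeat for a user within the same day: as the user C case shows, location (39.82, -86.249) occurs again. Such cases should be treated as separate continuous periods.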

Answers

1

I think you are looking for pd.Series.shift:

import pandas as pd

x = pd.Series([1, 3, 3, 2, 3, 3])

x 
0 1 
1 3 
2 3 
3 2 
4 3 
5 3 

x.shift(-1) 
0  3 
1  3 
2  2 
3  3 
4  3 
5 NaN 

(x != x.shift(-1)).sum() 
4 
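
To go from counting changes to actually labelling each run, one common idiom is to take the cumulative sum of the "value changed" flags. This is only a minimal sketch on the same toy series; the names starts, run_id and run_lengths are mine, not part of pandas.

```python
import pandas as pd

x = pd.Series([1, 3, 3, 2, 3, 3])

# A new run starts wherever the value differs from the previous one.
# The first row is compared against NaN, so it always starts a run.
starts = x != x.shift()

# The cumulative sum of the start flags gives every run its own id.
run_id = starts.cumsum()

# Length of each run, in order of appearance.
run_lengths = x.groupby(run_id).size()

print(run_id.tolist())       # [1, 2, 2, 3, 4, 4]
print(run_lengths.tolist())  # [1, 2, 1, 2]
```

Note that the two runs of 3 get different ids (2 and 4), which is the behaviour the question asks for when a location repeats later in the day.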

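Assume the data in the question is the output of

df[['COL1', 'COL2', 'COL3']]

Then the following should give you the number of consecutive runs of the same place per user per day. I'm not sure this is exactly what you want, but it should help you get started: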

# Calendar date of each timestamp
df['DATE'] = pd.to_datetime(df.COL2).dt.date
# Count rows whose place differs from the next row, i.e. the number of runs
df.groupby(['COL1', 'DATE']).COL3.apply(lambda s: (s != s.shift(-1)).sum())
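
For what it's worth, here is a rough sketch of that run-counting idea applied to the sample rows from the question. It assumes the three columns are named COL1 (user), COL2 (timestamp string) and COL3 (a (lat, lon) tuple); the real frame may well be laid out differently.

```python
import pandas as pd

# Sample rows from the question; the column names are an assumption.
rows = [
    ("A", "2013-02-05 19:45:00", (39.94, -86.159)),
    ("A", "2013-02-05 19:55:00", (39.94, -86.159)),
    ("A", "2013-02-05 20:00:00", (39.777, -85.995)),
    ("A", "2013-02-05 20:05:00", (39.775, -85.978)),
    ("B", "2013-02-05 22:20:00", (39.935, -86.159)),
    ("B", "2013-02-05 22:25:00", (39.935, -86.159)),
    ("B", "2013-02-05 23:55:00", (39.951, -86.151)),
    ("B", "2013-02-06 00:00:00", (39.951, -86.151)),
    ("B", "2013-02-06 00:05:00", (39.906, -86.196)),
    ("C", "2013-02-06 00:25:00", (39.82, -86.249)),
    ("C", "2013-02-06 00:30:00", (39.82, -86.249)),
    ("C", "2013-02-06 02:45:00", (41.498, -81.527)),
    ("C", "2013-02-06 02:55:00", (41.498, -81.527)),
    ("C", "2013-02-06 04:35:00", (39.82, -86.249)),
    ("C", "2013-02-06 04:40:00", (39.82, -86.249)),
]
df = pd.DataFrame(rows, columns=["COL1", "COL2", "COL3"])
df["DATE"] = pd.to_datetime(df["COL2"]).dt.date

# Within each user/day, a row ends a run when the next place differs
# (the last row is compared against NaN, so it always ends a run).
counts = df.groupby(["COL1", "DATE"])["COL3"].apply(
    lambda s: (s != s.shift(-1)).sum()
)
print(counts)
```

User C ends up with 3 runs on 2013-02-06, because the return to (39.82, -86.249) is counted as a separate run.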
0

Do you mean something like this?

In [5]: df 
Out[5]: 
    0                   1       2       3
0   A 2013-02-05 19:45:00  39.940 -86.159
1   A 2013-02-05 19:55:00  39.940 -86.159
2   A 2013-02-05 20:00:00  39.777 -85.995
3   A 2013-02-05 20:05:00  39.775 -85.978
4   B 2013-02-05 22:20:00  39.935 -86.159
5   B 2013-02-05 22:25:00  39.935 -86.159
6   B 2013-02-05 23:55:00  39.951 -86.151
7   B 2013-02-06 00:00:00  39.951 -86.151
8   B 2013-02-06 00:05:00  39.906 -86.196
9   C 2013-02-06 00:25:00  39.820 -86.249
10  C 2013-02-06 00:30:00  39.820 -86.249
11  C 2013-02-06 02:45:00  41.498 -81.527
12  C 2013-02-06 02:55:00  41.498 -81.527
13  C 2013-02-06 04:35:00  39.820 -86.249
14  C 2013-02-06 04:40:00  39.820 -86.249

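In [6]: import numpy as np
   ...: def gb(df, *args, **kwargs):
   ...:     # Group as usual, then split each group wherever the row index
   ...:     # is not consecutive (relies on the default 0..n-1 index).
   ...:     for k, g in df.groupby(*args, **kwargs):
   ...:         breaks = np.where(np.diff(g.index.values) != 1)[0] + 1
   ...:         for pos in np.split(np.arange(len(g)), breaks):
   ...:             if len(pos) >= 2:  # keep only runs of two or more rows
   ...:                 yield k, g.iloc[pos]
   ...: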

In [7]: group_args = [0, df[1].apply(lambda x:x.date()), 2 , 3] 

In [8]: for key, grp in gb(df, group_args, sort=False):
   ...:     print(key)
   ...:     print(grp)
   ...:     print('-' * 10)
   ...:

This prints:

('A', datetime.date(2013, 2, 5), 39.94, -86.159)
   0                   1      2       3
0  A 2013-02-05 19:45:00  39.94 -86.159
1  A 2013-02-05 19:55:00  39.94 -86.159
----------
('B', datetime.date(2013, 2, 5), 39.935, -86.159)
   0                   1       2       3
4  B 2013-02-05 22:20:00  39.935 -86.159
5  B 2013-02-05 22:25:00  39.935 -86.159
----------
('C', datetime.date(2013, 2, 6), 39.82, -86.249)
    0                   1      2       3
9   C 2013-02-06 00:25:00  39.82 -86.249
10  C 2013-02-06 00:30:00  39.82 -86.249
----------
('C', datetime.date(2013, 2, 6), 39.82, -86.249)
    0                   1      2       3
13  C 2013-02-06 04:35:00  39.82 -86.249
14  C 2013-02-06 04:40:00  39.82 -86.249
----------
('C', datetime.date(2013, 2, 6), 41.498, -81.527)
    0                   1       2       3
11  C 2013-02-06 02:45:00  41.498 -81.527
12  C 2013-02-06 02:55:00  41.498 -81.527
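
If the end goal is the histogram from the question, the groups above still need to be turned into durations. Below is one possible sketch, not necessarily what the asker had in mind: it labels each stay with a shift/cumsum id, measures a stay as last timestamp minus first timestamp, and then counts stays per user, day and duration. The column names user, ts, lat, lon and the helper columns date and stay are assumptions of mine.

```python
import pandas as pd

# Sample rows from the question; the column names are an assumption.
rows = [
    ("A", "2013-02-05 19:45:00", 39.94, -86.159),
    ("A", "2013-02-05 19:55:00", 39.94, -86.159),
    ("A", "2013-02-05 20:00:00", 39.777, -85.995),
    ("A", "2013-02-05 20:05:00", 39.775, -85.978),
    ("B", "2013-02-05 22:20:00", 39.935, -86.159),
    ("B", "2013-02-05 22:25:00", 39.935, -86.159),
    ("B", "2013-02-05 23:55:00", 39.951, -86.151),
    ("B", "2013-02-06 00:00:00", 39.951, -86.151),
    ("B", "2013-02-06 00:05:00", 39.906, -86.196),
    ("C", "2013-02-06 00:25:00", 39.82, -86.249),
    ("C", "2013-02-06 00:30:00", 39.82, -86.249),
    ("C", "2013-02-06 02:45:00", 41.498, -81.527),
    ("C", "2013-02-06 02:55:00", 41.498, -81.527),
    ("C", "2013-02-06 04:35:00", 39.82, -86.249),
    ("C", "2013-02-06 04:40:00", 39.82, -86.249),
]
df = pd.DataFrame(rows, columns=["user", "ts", "lat", "lon"])
df["ts"] = pd.to_datetime(df["ts"])
df["date"] = df["ts"].dt.date

# A new stay starts whenever user, day or location differs from the previous row.
keys = df[["user", "date", "lat", "lon"]]
df["stay"] = (keys != keys.shift()).any(axis=1).cumsum()

# One row per stay: who, when, where, and how many samples it covers.
stays = df.groupby("stay").agg(
    user=("user", "first"),
    date=("date", "first"),
    lat=("lat", "first"),
    lon=("lon", "first"),
    start=("ts", "min"),
    end=("ts", "max"),
    n=("ts", "size"),
)
stays["minutes"] = (stays["end"] - stays["start"]).dt.total_seconds() / 60

# Keep stays seen at least twice, as in the answer above.
stays = stays[stays["n"] >= 2]

# Histogram as a table: number of stays per user, day and duration.
hist = stays.groupby(["user", "date", "minutes"]).size()
print(stays[["user", "date", "lat", "lon", "minutes"]])
print(hist)
```

On the sample data this gives five stays of 10, 5, 5, 10 and 5 minutes, and user C's two visits to (39.82, -86.249) stay separate. With matplotlib installed, something like stays["minutes"].plot.hist() for one user/day slice should draw the actual chart.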