2016-02-05 30 views
1

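How to get the first and last occurrence of an item in pandas

I am analyzing data from different sensors. A sensor becomes active (1) when it is in use. However, I only need the time (and date) of the first and last activation, not the ones in between. Once found, I need to create a new DataFrame containing the time and date of the first and last occurrence, together with 'User' and 'Activity'.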

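I tried iterating over every row and building a series of if-then statements, but with no luck. Is there a pandas function that would let me do this efficiently? Here is a subset of my data.

I am only just getting the hang of pandas, so any help would be greatly appreciated.

Cheers!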

import pandas as pd    
cols=['User', 'Activity', 'Coaster1', 'Coaster2', 'Coaster3', 
      'Coaster4', 'Coaster5', 'Coffee', 'Door', 'Fridge', u'coldWater', 
      'hotWater', 'SensorDate', 'SensorTime', 'RegisteredTime'] 

data=[['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:54', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:54', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:55', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:55', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:56', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 0.0, 0.0, 0, 0, 0.0, 1.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:56', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 1.0, 0.0, 0, 0, 0.0, 0.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:58', '13:09:00'], 
      ['Chris', 'coffee + hot water', 0, 1.0, 0.0, 0, 0, 0.0, 0.0, 0.0, 
      0.0, 0.0, '2015-09-21', '13:05:59', '13:09:00']] 

df=pd.DataFrame(data,columns=cols)

The desired output would look like this:

data_out=[['Chris','coffee + hot water','0','0','0','0','0','0','1','0','0','0','2015-09-21','13:05:54','13:05:56','13:09:00'],['Chris','coffee + hot water','0','1','0','0','0','0','0','0','0','0','2015-09-21','13:05:58','13:05:59','13:09:00']] 

cols_out=['User', 
'Activity', 
'Coaster1', 
'Coaster2', 
'Coaster3', 
'Coaster4', 
'Coaster5', 
'Coffee', 
'Door', 
'Fridge', 
u'coldWater', 
'hotWater', 
'SensorDate', 
'SensorTimeFirst', 
'SensorTimeLast', 
'RegisteredTime'] 


df_out=pd.DataFrame(data_out, columns=cols_out) 
+0

What is the desired output for the sample? – jezrael

+0

Maybe you can try 'print(df[df['Door'] == 1].groupby(['User','Activity'])[['Door','SensorDate','SensorTime']].min())' and 'print(df[df['Door'] == 1].groupby(['User','Activity'])[['Door','SensorDate','SensorTime']].max())' – jezrael

+0

I have edited the question to add the desired output. Thanks! – Waldo

Answers

0

You can try groupby and then apply a custom function f, for example:

def f(x):
    # For each sensor, take the column-wise min and max over the rows where it is active (== 1).
    # SensorTime is a zero-padded 'HH:MM:SS' string, so min/max give the first/last activation.
    Doormin = x[x['Door'] == 1].min()
    Doormax = x[x['Door'] == 1].max()
    Coaster2min = x[x['Coaster2'] == 1].min()
    Coaster2max = x[x['Coaster2'] == 1].max()
    Coaster1min = x[x['Coaster1'] == 1].min()
    Coaster1max = x[x['Coaster1'] == 1].max()
    # One row per sensor: sensor flag, date, first and last activation time, registered time.
    Door = pd.Series([Doormin['Door'], Doormin['SensorDate'], Doormin['SensorTime'], Doormax['SensorTime'], Doormin['RegisteredTime']], index=['Door','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster1 = pd.Series([Coaster1min['Coaster1'], Coaster1min['SensorDate'], Coaster1min['SensorTime'], Coaster1max['SensorTime'], Coaster1min['RegisteredTime']], index=['Coaster1','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])
    Coaster2 = pd.Series([Coaster2min['Coaster2'], Coaster2min['SensorDate'], Coaster2min['SensorTime'], Coaster2max['SensorTime'], Coaster2min['RegisteredTime']], index=['Coaster2','SensorDate','SensorTimeFirst','SensorTimeLast','RegisteredTime'])

    # Rows are returned in the order Door, Coaster2, Coaster1.
    return pd.DataFrame([Door, Coaster2, Coaster1])

print(df.groupby(['User','Activity']).apply(f))

                            Coaster1  Coaster2  Door  RegisteredTime  \
User  Activity
Chris coffee + hot water 0       NaN       NaN     1        13:09:00
                         1       NaN         1   NaN        13:09:00
                         2       NaN       NaN   NaN             NaN

                            SensorDate  SensorTimeFirst  SensorTimeLast
User  Activity
Chris coffee + hot water 0  2015-09-21         13:05:54        13:05:56
                         1  2015-09-21         13:05:58        13:05:59
                         2         NaN              NaN             NaN

You can then replace NaN with 0 in the sensor columns using fillna:

df = df.groupby(['User','Activity']).apply(f) 
df[['Coaster1','Coaster2','Door']] = df[['Coaster1','Coaster2','Door']].fillna(0) 
print(df)
                            Coaster1  Coaster2  Door  RegisteredTime  \
User  Activity
Chris coffee + hot water 0         0         0     1        13:09:00
                         1         0         1     0        13:09:00
                         2         0         0     0             NaN

                            SensorDate  SensorTimeFirst  SensorTimeLast
User  Activity
Chris coffee + hot water 0  2015-09-21         13:05:54        13:05:56
                         1  2015-09-21         13:05:58        13:05:59
                         2         NaN              NaN             NaN
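
The function f above hard-codes three sensors. As a possible alternative, not part of the original answer, the sketch below collapses each run of consecutive rows with the same sensor state into one row holding the first and last SensorTime, which is closer to the layout of the desired output in the question. The helper name first_last_activation, its parameters and the small demo frame are hypothetical. It assumes the rows are already in time order and a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd


def first_last_activation(df, sensor_cols, key_cols=("User", "Activity")):
    """Collapse each run of identical sensor states to its first/last SensorTime."""
    sensor_cols = list(sensor_cols)
    key_cols = list(key_cols)

    # A new run starts whenever any sensor value differs from the previous row.
    changed = df[sensor_cols].ne(df[sensor_cols].shift()).any(axis=1)
    run_id = changed.cumsum().rename("run")

    out = (
        df.groupby(key_cols + [run_id], sort=False)
        .agg(
            **{c: (c, "first") for c in sensor_cols},
            SensorDate=("SensorDate", "first"),
            SensorTimeFirst=("SensorTime", "first"),
            SensorTimeLast=("SensorTime", "last"),
            RegisteredTime=("RegisteredTime", "first"),
        )
        .reset_index()
        .drop(columns="run")
    )

    # Keep only runs in which at least one sensor was active.
    active = out[sensor_cols].eq(1).any(axis=1)
    return out[active].reset_index(drop=True)


# Hypothetical cut-down version of the sample data (two sensors only).
demo = pd.DataFrame({
    "User": ["Chris"] * 5,
    "Activity": ["coffee + hot water"] * 5,
    "Coaster2": [0, 0, 0, 1, 1],
    "Door": [1, 1, 1, 0, 0],
    "SensorDate": ["2015-09-21"] * 5,
    "SensorTime": ["13:05:54", "13:05:55", "13:05:56", "13:05:58", "13:05:59"],
    "RegisteredTime": ["13:09:00"] * 5,
})

result = first_last_activation(demo, ["Coaster2", "Door"])
print(result)
```

Passing the full list of sensor columns from the question would treat every change in any sensor as the start of a new run. Whether that matches the intended definition of "first and last activation" depends on the data.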
+0

Thank you! This works like a charm :) I can take it from here. Thanks a lot!!! I really appreciate the time and effort you put in :) – Waldo

+0

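Two questions: 1) How do you paste a DataFrame into your answer so that it keeps its formatting? I couldn't find a way to do that to make my question tidier. 2) In the first example, why is row 2 empty (NaN), and why does it stay that way after .fillna(0)? I don't fully understand this (I know how to deal with it, I'm just curious). – Waldo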

+0

What do you mean by keeping the format? Formatting the code? Or the final DataFrame in the question? Maybe try 4 spaces before each line. – jezrael
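
On the second question, a likely explanation that can be read off the printed output: Coaster1 is never 1 in the sample, so the filtered frame inside f is empty and the third row comes out all NaN. The fillna(0) call is then applied only to the three sensor columns, so the date and time columns of that row are left untouched. The small frame below is a hypothetical stand-in for the result of apply(f), used only to illustrate the column-subset behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the grouped result: the third row is all NaN.
out = pd.DataFrame({
    "Coaster1": [np.nan, np.nan, np.nan],
    "Door": [1, np.nan, np.nan],
    "SensorTimeFirst": ["13:05:54", "13:05:58", np.nan],
})

# fillna is applied to the sensor columns only, as in the answer above.
out[["Coaster1", "Door"]] = out[["Coaster1", "Door"]].fillna(0)

# The sensor columns are now 0, but SensorTimeFirst in the last row is still NaN.
print(out)
```

Calling fillna on the whole frame would also fill the time columns with 0, which is probably not wanted. Dropping the empty row with dropna on a time column is another option.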
