Python: Using pandas groupby to reduce the dimensions of a DataFrame

In my DataFrame, let's call it df, I have data that looks like this:

serial gps_dt lat long dist 
1  25Mar x1 y1 Nan 
1  26Mar x2 y2 0.01 
1  27Mar x3 y3 1.25 (assume this is the 5th occurrence < 160) 
2  24Mar x4 y5 Nan 
2  25Mar x5 y5 2.1 
2  26Mar x6 y6 1.01 
2  27Mar x7 y7 175.2 
2  28Mar x8 y8 179.3 (assume this is the 5th occurrence > 160)

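and so on. I already have a series, call it check, that tells me whether serial[i] == serial[i+1]. What I want to do now is, when they are equal, construct a new DataFrame containing serial, gps_dt_first, gps_dt_last, avg_lat, avg_long under the condition hdist < 160, provided there are at least 5 occurrences within this radius. If hdist > 160, I want to build another group if and only if the next 5 occurrences are within 160 of the first one that exceeded 160.

For example, the output would look like this: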

serial gps_dt_first gps_dt_last avg_lat avg_long 
1  25Mar  27Mar  avg_x avg_y 
2  27Mar  28Mar  avg_x avg_y

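I have been looking at the pandas groupby documentation. The data already comes out of SAS sorted by serial, gps_dt. Do I still need to do df.groupby(['serial', 'gps_dt'])?

Once df is grouped, if that is needed, the idea of my code is (more of a pseudocode outline):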
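A small sketch of what the two groupings would produce may help here. The toy data below is made up for illustration. Grouping on both keys gives one group per (serial, gps_dt) pair, which is probably not what is wanted when the goal is one summary per serial; grouping on serial alone keeps the existing row order inside each group, and sort=False keeps the groups in order of first appearance.

```python
import pandas as pd

# Hypothetical toy data, already sorted by serial and gps_dt.
df = pd.DataFrame({
    "serial": [1, 1, 1, 2, 2],
    "gps_dt": ["25Mar", "26Mar", "27Mar", "24Mar", "25Mar"],
})

# One group per distinct (serial, gps_dt) pair: here every row is its own group.
by_both = df.groupby(["serial", "gps_dt"]).ngroups

# One group per serial; rows inside each group keep their original order.
by_serial = df.groupby("serial", sort=False).ngroups

print(by_both, by_serial)
```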

if check == true and hdist < 160 and 5 or more occurrences (how to count the occurrences): 
    result['serial'] = df.serial (first in serial; how to extract) 
    result['gps_dt_first'] = df.gps_dt (first in gps_dt) 
    result['gps_dt_last'] = df.gps_dt (last in gps_dt) 
    result['avg_lat'] = df.lat.mean() (only for the subset of serial meeting criteria) 
    result['avg_long'] = df.long.mean() (same here) 
else if check == true and hdist > 160 and 5 or more occurrences: 
    do same as above 
else: 
    delete
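
One possible reading of the outline above, as a hedged sketch rather than a definitive solution: within each serial, split the rows into consecutive runs where the distance is below the threshold versus not, keep only runs with at least min_count rows, and summarize each surviving run. The function name summarize_runs, the parameter names, the column name dist (taken from the sample table), and the choice to treat a missing distance as "below the threshold" are all assumptions, and the sample lat/long numbers are invented.

```python
import numpy as np
import pandas as pd

def summarize_runs(df, threshold=160.0, min_count=5):
    """Summarize consecutive runs of near/far rows within each serial."""
    # Assumption: a missing distance (first row of a serial) counts as "near".
    near = df["dist"].fillna(0.0) < threshold
    # A new run starts whenever the serial changes or the near/far flag flips.
    new_run = (df["serial"] != df["serial"].shift()) | (near != near.shift())
    run_id = new_run.cumsum()
    out = df.groupby(run_id).agg(
        serial=("serial", "first"),
        gps_dt_first=("gps_dt", "first"),
        gps_dt_last=("gps_dt", "last"),
        avg_lat=("lat", "mean"),
        avg_long=("long", "mean"),
        n=("serial", "count"),  # number of rows in the run
    )
    # Drop runs that are too short, then discard the helper column.
    return out[out["n"] >= min_count].drop(columns="n").reset_index(drop=True)

# Hypothetical data shaped like the sample table, with made-up coordinates.
df = pd.DataFrame({
    "serial": [1, 1, 1, 2, 2, 2, 2, 2],
    "gps_dt": ["25Mar", "26Mar", "27Mar", "24Mar", "25Mar", "26Mar", "27Mar", "28Mar"],
    "lat": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "long": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    "dist": [np.nan, 0.01, 1.25, np.nan, 2.1, 1.01, 175.2, 179.3],
})

# min_count is lowered to 3 only because the toy data is short.
result = summarize_runs(df, threshold=160.0, min_count=3)
print(result)
```

This does not implement the "within 160 of the first one that exceeded 160" rule for the far group; that would need a distance computed relative to the first far point, which the sample data does not provide.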

Source

2016-03-28 dustin

-2

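If you have read the documentation for groupby, the following summarizes what you can do with it:

Iterate over each element you got from groupby;
Perform one or more aggregate operations (including chained operations, or a different operation per column);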
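
Both options can be sketched briefly. The example below is illustrative only: the data is made up, and the output column names are borrowed from the question. The aggregation uses pandas named aggregation, which assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical toy data with the question's column names.
df = pd.DataFrame({
    "serial": [1, 1, 1, 2, 2],
    "gps_dt": ["25Mar", "26Mar", "27Mar", "24Mar", "25Mar"],
    "lat": [10.0, 11.0, 12.0, 20.0, 22.0],
    "long": [30.0, 31.0, 32.0, 40.0, 42.0],
})

# 1. Iterate: each element is a (key, sub-DataFrame) pair.
group_sizes = {}
for serial, group in df.groupby("serial"):
    group_sizes[int(serial)] = len(group)

# 2. Aggregate: a different operation per column, one output row per serial.
summary = df.groupby("serial").agg(
    gps_dt_first=("gps_dt", "first"),
    gps_dt_last=("gps_dt", "last"),
    avg_lat=("lat", "mean"),
    avg_long=("long", "mean"),
).reset_index()

print(group_sizes)
print(summary)
```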

Source

2016-03-28 14:21:49 heltonbiker
