2016-07-07 113 views

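python (pandas): recombining groupby statements

I have some data representing time-series results for many different sites. I want to find the quartiles of the results, as well as the maximum and minimum dates, for each site.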

Finding each of these on its own is easy:

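# quartiles of the numeric column (newer pandas versions reject quantile on the string 'date' column)
q = df.groupby(['site_id', 'datum'])[['result']].quantile([0.25, 0.5, 0.75])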
# max and min values
d_max = df.groupby(['site_id', 'datum']).max() 
d_min = df.groupby(['site_id', 'datum']).min() 

The results are MultiIndex DataFrames. How can I join them together to get all three sets of values for each site_id and datum combination?
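To make the mismatch concrete, here is a minimal sketch of the index levels each result carries. The four rows are made up for illustration (they only mimic the column layout of the sample data), and `grouped` is just a helper name introduced here.

```python
import pandas as pd

# Made-up rows with the same columns as the sample data (an assumption,
# not the real dataset).
df = pd.DataFrame({
    "date": ["1968-01-10", "1977-06-07", "1966-11-24", "1983-02-22"],
    "site_id": ["RN004481", "RN004481", "RN004687", "RN004687"],
    "datum": ["SWL", "SWL", "DTW", "DTW"],
    "result": [61.23, 60.16, 75.16, 81.00],
})

grouped = df.groupby(["site_id", "datum"])
q = grouped[["result"]].quantile([0.25, 0.5, 0.75])
d_max = grouped.max()

# max() and min() give one row per (site_id, datum) group.
print(d_max.index.nlevels, len(d_max))
# quantile() with a list adds a third, unnamed index level holding the
# quantile, so there are three rows per group.
print(q.index.nlevels, len(q))
```

Because `q` has one more index level than `d_max` and `d_min`, the three frames do not line up row for row as they stand.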

Some sample data:

from io import StringIO 
import pandas as pd 

TESTDATA=StringIO(u'''date site_id datum result 
1968-01-10 RN004481 SWL  61.23 
1977-06-07 RN004481 SWL  60.16 
1979-12-12 RN004481 SWL  58.76 
1971-04-24 RN004482 SWL  79.93 
1971-09-29 RN004482 SWL  79.97 
1995-09-19 RN004482 SWL  92.91 
1996-02-08 RN004482 SWL  93.15 
1964-10-29 RN00448411 SWL  67.87 
1965-03-04 RN004687 SWL  74.90 
1993-03-16 RN02528611 SWL  7.50 
2011-10-24 RN029429 SWL  2.59 
2011-11-05 RN029429 SWL  2.68 
1992-06-24 RN004464 SWL  52.24 
1986-08-11 RN004482 SWL  86.84 
1998-01-29 RN004482 SWL  94.33 
1966-11-24 RN004687 DTW  75.16 
1978-08-30 RN004687 SWL  78.24 
1983-02-22 RN004687 DTW  81.00 
1984-07-24 RN004687 SWL  81.26 
1993-07-07 RN004687 SWL  87.18 
1994-04-08 RN004687 DTW  87.53 
1994-08-11 RN004687 SWL  87.41 
2001-01-10 RN004687 SWL  92.04 
2010-11-15 RN004687 SWL  97.06 
1964-10-01 RN004693 SWL  59.56 
1965-06-03 RN004693 SWL  59.74 
1967-05-19 RN004693 SWL  59.58 
1967-06-23 RN004693 RSWL 59.61 
1967-09-22 RN004693 RSWL 59.69 
1970-12-16 RN004693 DTW  59.54 
''') 

# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(TESTDATA, sep=r'\s+')

Answer


This does it:

pd.concat([d_max, d_min, q.unstack().result], axis=1, keys=['max', 'min', 'quantiles']) 

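For readers without the screenshot, the following is a self-contained sketch of the same `pd.concat` call on a few made-up rows (not the full sample data). It assumes the quantiles are computed on the `result` column only, and it uses `q.unstack()["result"]`, which selects the same columns as `q.unstack().result`.

```python
import pandas as pd

# Made-up rows with the same columns as the question's data (an assumption).
df = pd.DataFrame({
    "date": ["1968-01-10", "1977-06-07", "1979-12-12", "1966-11-24", "1983-02-22"],
    "site_id": ["RN004481", "RN004481", "RN004481", "RN004687", "RN004687"],
    "datum": ["SWL", "SWL", "SWL", "DTW", "DTW"],
    "result": [61.23, 60.16, 58.76, 75.16, 81.00],
})

grouped = df.groupby(["site_id", "datum"])
q = grouped[["result"]].quantile([0.25, 0.5, 0.75])
d_max = grouped.max()
d_min = grouped.min()

# unstack() moves the quantile level from the rows into the columns, so all
# three frames share the same (site_id, datum) row index.
quantiles = q.unstack()["result"]

# keys= adds an outer column level labelling where each block came from.
combined = pd.concat(
    [d_max, d_min, quantiles],
    axis=1,
    keys=["max", "min", "quantiles"],
)
print(combined)
```

The result has one row per `(site_id, datum)` pair and two column levels: the outer one is `max`, `min`, or `quantiles`, and the inner one is the original column name or the quantile value.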


Nice, unstack was the piece I was missing. – jprockbelly