2016-11-29 60 views
1

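I have a pandas DataFrame like the one below. I need to add two more columns that list the highest and lowest sales values for each brand. This is the pandas (Python) equivalent of what I would do with an array in Excel.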

date       brand   price  quantity  sales  vat
31-May-13  Reebok     10        23    230  3.5
31-May-13  Adidas     10        25    250  2.8
31-May-13  Campus      8        21    168  3.5
31-May-13  Nike       10        20    200  6.5
31-May-13  Woods       2         7     14  2.8
01-Jun-13  Reebok      4        27    108  2.2
01-Jun-13  Adidas      7        28    196  3.8
01-Jun-13  Campus      7        41    287  4.2
01-Jun-13  Nike        2        39     78  7.2
01-Jun-13  Woods       5        26    130  3.3
02-Jun-13  Reebok     10         5     50  2.2
02-Jun-13  Adidas     10        15    150  3.8
02-Jun-13  Woods       6        30    180  3.3

My date column is not in sequential order, and not every brand has data for every date in the date column. The result should look like this:

date       brand   price  quantity  sales  vat  Max  Min
31-May-13  Reebok     10        23    230  3.5  230   50
31-May-13  Adidas     10        25    250  2.8  250  150
31-May-13  Campus      8        21    168  3.5  287  168
31-May-13  Nike       10        20    200  6.5  200   78
31-May-13  Woods       2         7     14  2.8  180   14
01-Jun-13  Reebok      4        27    108  2.2  230   50
01-Jun-13  Adidas      7        28    196  3.8  250  150
01-Jun-13  Campus      7        41    287  4.2  287  168
01-Jun-13  Nike        2        39     78  7.2  200   78
01-Jun-13  Woods       5        26    130  3.3  180   14
02-Jun-13  Reebok     10         5     50  2.2  230   50
02-Jun-13  Adidas     10        15    150  3.8  250  150
02-Jun-13  Woods       6        30    180  3.3  180   14

Answers

4

You can use groupby.transform:

df['max'] = df.groupby('brand')['sales'].transform('max') 
df['min'] = df.groupby('brand')['sales'].transform('min') 

df 
Out: 
         date   brand  price  quantity  sales  vat  max  min
0  2013-05-31  Reebok     10        23    230  3.5  230   50
1  2013-05-31  Adidas     10        25    250  2.8  250  150
2  2013-05-31  Campus      8        21    168  3.5  287  168
3  2013-05-31    Nike     10        20    200  6.5  200   78
4  2013-05-31   Woods      2         7     14  2.8  180   14
5  2013-06-01  Reebok      4        27    108  2.2  230   50
6  2013-06-01  Adidas      7        28    196  3.8  250  150
7  2013-06-01  Campus      7        41    287  4.2  287  168
8  2013-06-01    Nike      2        39     78  7.2  200   78
9  2013-06-01   Woods      5        26    130  3.3  180   14
10 2013-06-02  Reebok     10         5     50  2.2  230   50
11 2013-06-02  Adidas     10        15    150  3.8  250  150
12 2013-06-02   Woods      6        30    180  3.3  180   14
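
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the transform approach. The DataFrame construction is an assumption on my part (the question does not show how `df` was created), only three of the columns are rebuilt, and the `Max`/`Min` column names follow the desired output in the question.

```python
import pandas as pd

# Sample rows copied from the question; only the columns needed here.
# How the original df was built is not shown, so this constructor is an assumption.
df = pd.DataFrame({
    "date": ["31-May-13"] * 5 + ["01-Jun-13"] * 5 + ["02-Jun-13"] * 3,
    "brand": ["Reebok", "Adidas", "Campus", "Nike", "Woods",
              "Reebok", "Adidas", "Campus", "Nike", "Woods",
              "Reebok", "Adidas", "Woods"],
    "sales": [230, 250, 168, 200, 14, 108, 196, 287, 78, 130, 50, 150, 180],
})

# Group once, then broadcast each brand's max/min back onto every row.
# transform returns a Series aligned with df's index, so row order is preserved.
sales_by_brand = df.groupby("brand")["sales"]
df["Max"] = sales_by_brand.transform("max")
df["Min"] = sales_by_brand.transform("min")

print(df)
```

Because transform aligns on the original index, it does not matter that the dates are unordered or that some brands are missing on some dates: every Reebok row gets 230 and 50, every Campus row gets 287 and 168, and so on.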
2

You can use groupby and then join the result with the original DataFrame:

>>> import numpy as np
>>> g = df.groupby('brand')['sales'].agg([np.min, np.max])
>>> g 
        amin  amax
brand
Adidas   150   250
Campus   168   287
Nike      78   200
Reebok    50   230
Woods     14   180
>>> df.join(g, on='brand') 
         date   brand  price  quantity  sales  vat  amin  amax
0   31-May-13  Reebok     10        23    230  3.5    50   230
1   31-May-13  Adidas     10        25    250  2.8   150   250
2   31-May-13  Campus      8        21    168  3.5   168   287
3   31-May-13    Nike     10        20    200  6.5    78   200
4   31-May-13   Woods      2         7     14  2.8    14   180
5   01-Jun-13  Reebok      4        27    108  2.2    50   230
6   01-Jun-13  Adidas      7        28    196  3.8   150   250
7   01-Jun-13  Campus      7        41    287  4.2   168   287
8   01-Jun-13    Nike      2        39     78  7.2    78   200
9   01-Jun-13   Woods      5        26    130  3.3    14   180
10  02-Jun-13  Reebok     10         5     50  2.2    50   230
11  02-Jun-13  Adidas     10        15    150  3.8   150   250
12  02-Jun-13   Woods      6        30    180  3.3    14   180
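
The `amin`/`amax` column names above come from the NumPy function names, so they may differ depending on the pandas and NumPy versions installed. As a hedged variation on the same groupby-then-join idea, the sketch below uses named aggregation to choose the output column names explicitly. The DataFrame construction and the names `stats` and `result` are my own assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; only the columns needed here.
# The constructor is an assumption, since the question does not show it.
df = pd.DataFrame({
    "date": ["31-May-13"] * 5 + ["01-Jun-13"] * 5 + ["02-Jun-13"] * 3,
    "brand": ["Reebok", "Adidas", "Campus", "Nike", "Woods",
              "Reebok", "Adidas", "Campus", "Nike", "Woods",
              "Reebok", "Adidas", "Woods"],
    "sales": [230, 250, 168, 200, 14, 108, 196, 287, 78, 130, 50, 150, 180],
})

# Named aggregation: each keyword becomes an output column,
# and its value names the aggregation to apply to "sales".
stats = df.groupby("brand")["sales"].agg(Max="max", Min="min")

# stats is indexed by brand, so join on df's "brand" column.
# join returns a new DataFrame and leaves df itself unchanged.
result = df.join(stats, on="brand")

print(result)
```

Compared with transform, this builds a small per-brand summary table first, which is handy if you also want to inspect or reuse the summary on its own.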