2017-02-12 268 views
6

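Sorting a pandas DataFrame

I have a pandas DataFrame with car data. For each maker I want to find the two best-selling models, and then rank the makers in descending order.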

Maker    Model         No Sold(,000s)
Ford     Kuga          35
Ford     Focus         47
Ford     Ka            31
Ford     Fiesta        68
Ford     Mondeo        55
Ford     S-Max         34
Ford     Galaxy        23
Nissan   Leaf          28
Nissan   Micra         31
Nissan   Note          43
Nissan   Pulsar        23
Nissan   Juke          57
Nissan   Qashqai       62
Nissan   X-Trail       38
Honda    Jazz          24
Honda    Civic         32
Honda    HRV           33
Honda    CRV           29
Honda    Accord        30
Honda    NSX           15
Toyota   Aygo          44
Toyota   Auris         45
Toyota   Avensis       35
Toyota   Prius         32
Toyota   Rav4          29
Toyota   Land Cruiser  14
Citroen  C1            40
Citroen  C3            25
Citroen  C4            46
Citroen  DS3           35
Citroen  DS4           31
Citroen  DS5           25
Audi     A1            23
Audi     A3            47
Audi     A4            30
Audi     A6            20
Audi     A8            18
BMW      1 Series      36
BMW      2 Series      20
BMW      3 Series      53
BMW      4 Series      21
BMW      5 Series      27
BMW      6 Series      24
BMW      7 Series      16

Sorry, I don't know how to post the DataFrame here.
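
One way to share the data reproducibly is to build the DataFrame in code. This is only a minimal sketch, assuming the three column names shown in the table; the helper names `sales` and `rows` are illustrative and not part of the original post.

```python
import pandas as pd

# Sales figures copied from the table above (thousands of units sold).
# `sales` is an illustrative name: maker -> list of (model, units sold).
sales = {
    "Ford": [("Kuga", 35), ("Focus", 47), ("Ka", 31), ("Fiesta", 68),
             ("Mondeo", 55), ("S-Max", 34), ("Galaxy", 23)],
    "Nissan": [("Leaf", 28), ("Micra", 31), ("Note", 43), ("Pulsar", 23),
               ("Juke", 57), ("Qashqai", 62), ("X-Trail", 38)],
    "Honda": [("Jazz", 24), ("Civic", 32), ("HRV", 33), ("CRV", 29),
              ("Accord", 30), ("NSX", 15)],
    "Toyota": [("Aygo", 44), ("Auris", 45), ("Avensis", 35), ("Prius", 32),
               ("Rav4", 29), ("Land Cruiser", 14)],
    "Citroen": [("C1", 40), ("C3", 25), ("C4", 46), ("DS3", 35),
                ("DS4", 31), ("DS5", 25)],
    "Audi": [("A1", 23), ("A3", 47), ("A4", 30), ("A6", 20), ("A8", 18)],
    "BMW": [("1 Series", 36), ("2 Series", 20), ("3 Series", 53),
            ("4 Series", 21), ("5 Series", 27), ("6 Series", 24),
            ("7 Series", 16)],
}

# Flatten the nested mapping into one row per (maker, model) pair.
rows = [(maker, model, sold)
        for maker, models in sales.items()
        for model, sold in models]

df = pd.DataFrame(rows, columns=["Maker", "Model", "No Sold(,000s)"])
print(df.head())
```

Building the frame from a literal like this keeps multi-word model names such as "Land Cruiser" and "1 Series" intact, which whitespace-separated text does not.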

Answers

5

Use groupby + nlargest:

df.set_index('Model').groupby('Maker')['No Sold(,000s)'].nlargest(2)

Maker    Model
Audi     A3          47
         A4          30
BMW      3 Series    53
         1 Series    36
Citroen  C4          46
         C1          40
Ford     Fiesta      68
         Mondeo      55
Honda    HRV         33
         Civic       32
Nissan   Qashqai     62
         Juke        57
Toyota   Auris       45
         Aygo        44
Name: No Sold(,000s), dtype: int64
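
The question also asks for the makers themselves to be ranked in descending order, which the groupby output does not do on its own (groups come back sorted by maker name). The sketch below is one possible reading, assuming a maker's rank is decided by the combined sales of its top two models. That criterion and the `top2_total` column are assumptions for illustration, and only a three-maker subset of the question's data is used to keep it short.

```python
import pandas as pd

# Subset of the question's data (three makers only) to keep the sketch short.
df = pd.DataFrame({
    "Maker": ["Ford"] * 4 + ["Audi"] * 3 + ["BMW"] * 3,
    "Model": ["Kuga", "Focus", "Fiesta", "Mondeo",
              "A1", "A3", "A4",
              "1 Series", "3 Series", "5 Series"],
    "No Sold(,000s)": [35, 47, 68, 55, 23, 47, 30, 36, 53, 27],
})
col = "No Sold(,000s)"

# Top two models per maker, flattened back into ordinary columns.
top2 = (df.set_index("Model")
          .groupby("Maker")[col]
          .nlargest(2)
          .reset_index())

# ASSUMPTION: makers are ranked by the combined sales of their top two models.
# `top2_total` is a hypothetical helper column, not part of the original data.
top2["top2_total"] = top2.groupby("Maker")[col].transform("sum")

# Sort makers by that total (descending); "Maker" keeps each maker's rows
# together if two totals tie; then best-selling model first within a maker.
ranked = (top2.sort_values(["top2_total", "Maker", col],
                           ascending=[False, True, False])
              .reset_index(drop=True))
print(ranked)
```

If the ranking should instead use each maker's single best model or its total sales across all models, only the `transform` step would need to change.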
3

Alternative solution:

In [222]: df.sort_values(['Maker', 'No Sold(,000s)'], ascending=[True, False]) \
            .groupby('Maker', as_index=False).head(2)
Out[222]:
      Maker     Model  No Sold(,000s)
33     Audi        A3              47
34     Audi        A4              30
39      BMW  3 Series              53
37      BMW  1 Series              36
28  Citroen        C4              46
26  Citroen        C1              40
3      Ford    Fiesta              68
4      Ford    Mondeo              55
16    Honda       HRV              33
15    Honda     Civic              32
12   Nissan   Qashqai              62
11   Nissan      Juke              57
21   Toyota     Auris              45
20   Toyota      Aygo              44

PS: note that @piRSquared's solution is more idiomatic and should be faster.

1

I believe you can also do it this way:

# rank() is ascending by default, so ascending=False is needed to keep the top sellers
df[df.groupby('Maker')['No Sold(,000s)'].rank(ascending=False) <= 2]
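
One caveat with a rank-based filter is how ties are handled. The sketch below uses made-up data (the maker "Acme" and its models are hypothetical, not from the question) to show that the default `method="average"` can return fewer than two rows per maker, while `method="first"` gives tied values distinct ranks.

```python
import pandas as pd

# HYPOTHETICAL data, not from the question: two models tie for second place.
toy = pd.DataFrame({
    "Maker": ["Acme", "Acme", "Acme"],
    "Model": ["X", "Y", "Z"],
    "No Sold(,000s)": [60, 50, 50],
})
col = "No Sold(,000s)"

# Default method="average": the two tied models each get rank 2.5,
# so only the top model satisfies rank <= 2.
avg_rank = toy.groupby("Maker")[col].rank(ascending=False)
by_average = toy[avg_rank <= 2]

# method="first": tied values receive distinct ranks (1, 2, 3),
# so exactly two rows satisfy rank <= 2.
first_rank = toy.groupby("Maker")[col].rank(ascending=False, method="first")
by_first = toy[first_rank <= 2]

print(len(by_average), len(by_first))
```

The question's data has no ties among the top two models of any maker, so either method should select the same rows there.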