2017-04-13 44 views

Find the percentage of a group falling into each bin using pandas

This is my dataframe:

city trips_in_first_30_days bins 
0 King's Landing 4 (3, 125] 
1 Astapor 0 NaN 
2 Astapor 3 (2, 3] 
3 King's Landing 9 (3, 125] 
4 Winterfell 14 (3, 125] 
5 Winterfell 2 (1, 2] 
6 Astapor 1 (0, 1] 
7 Winterfell 2 (1, 2] 
8 Winterfell 2 (1, 2] 
9 Winterfell 1 (0, 1] 
10 Winterfell 1 (0, 1] 
11 Winterfell 3 (2, 3] 
12 Winterfell 1 (0, 1] 
13 King's Landing 0 NaN 
14 Astapor 1 (0, 1] 
15 Winterfell 1 (0, 1] 
16 King's Landing 1 (0, 1] 
17 King's Landing 0 NaN 
18 King's Landing 6 (3, 125] 
19 King's Landing 0 NaN 
20 Winterfell 1 (0, 1] 
21 Astapor 1 (0, 1] 
22 Winterfell 0 NaN 
23 King's Landing 0 NaN 
24 Astapor 4 (3, 125] 
25 Winterfell 1 (0, 1] 
26 Astapor 1 (0, 1] 
27 Winterfell 3 (2, 3] 
28 Winterfell 0 NaN 
29 Astapor 1 (0, 1] 
... ... ... ... 
49970 Winterfell 2 (1, 2] 
49971 King's Landing 0 NaN 
49972 Winterfell 1 (0, 1] 
49973 Astapor 2 (1, 2] 
49974 Winterfell 1 (0, 1] 
49975 Winterfell 11 (3, 125] 
49976 King's Landing 0 NaN 
49977 Astapor 4 (3, 125] 
49978 Winterfell 1 (0, 1] 
49979 Winterfell 0 NaN 
49980 Astapor 1 (0, 1] 
49981 Astapor 0 NaN 
49982 King's Landing 0 NaN 
49983 Winterfell 1 (0, 1] 
49984 Winterfell 1 (0, 1] 
49985 Astapor 1 (0, 1] 
49986 Winterfell 0 NaN 
49987 Winterfell 3 (2, 3] 
49988 King's Landing 1 (0, 1] 
49989 Winterfell 1 (0, 1] 
49990 Astapor 1 (0, 1] 
49991 Winterfell 0 NaN 
49992 King's Landing 1 (0, 1] 
49993 Astapor 3 (2, 3] 
49994 Astapor 1 (0, 1] 
49995 King's Landing 0 NaN 
49996 Astapor 1 (0, 1] 
49997 Winterfell 0 NaN 
49998 Astapor 2 (1, 2] 
49999 Astapor 0 NaN 

This is a small snapshot. df['bins'] is categorical; I used pd.cut to split trips_in_first_30_days into different bins.
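For reference, here is a minimal sketch of how such a column could be produced. The bin edges `[0, 1, 2, 3, 125]` are an assumption inferred from the interval labels in the snapshot, and the sample values are made up. Because `pd.cut` intervals are right-closed by default, a value of 0 falls outside `(0, 1]` and becomes NaN, which matches the rows shown above.

```python
import pandas as pd

# Made-up sample values; the real column has 50,000 rows
trips = pd.Series([4, 0, 3, 9, 14, 2, 1], name="trips_in_first_30_days")

# Assumed edges, inferred from the labels (0, 1], (1, 2], (2, 3], (3, 125]
edges = [0, 1, 2, 3, 125]
bins = pd.cut(trips, bins=edges)

print(bins.dtype)                   # a categorical dtype whose categories are intervals
n_missing = int(bins.isna().sum())  # values of 0 are not covered by any interval
print(n_missing)
```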

Now I would like to find out, when grouping by city, what percentage of trips_in_first_30_days falls into each bin.

For example, for the city Astapor, what percentage of trips_in_first_30_days falls into (0, 1], how much into (1, 2], and so on?

Is it possible to do this, given that the dtype is category and it seems no operations can be performed on it? If so, how?

Edit:

After trying the suggested solution:

def calc_bin_percentage(group_df):
    bins_count = group_df.groupby("bins")["trips_in_first_30_days"].count()
    return 100 * bins_count / len(group_df)

new_df.groupby("city").apply(calc_bin_percentage)

The output is as follows:

bins (0, 1] (1, 2] (2, 3] (3, 125] 
city     
Astapor 31.105601 14.787710 6.973509 14.878432 
King's Landing 22.408687 14.471866 7.541955 20.710760 
Winterfell 28.689578 14.959719 8.017655 20.371957 

The percentages for each city should sum to 100, but they do not.

+0

Can you show us what your expected result looks like? – piRSquared

+0

Hi, please check now. –

Answers

1

For this, keep in mind that the function passed to groupby(...).apply may return a pd.Series object (the pandas documentation calls this "flexible apply").
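To illustrate what returning a Series from `apply` does, here is a small sketch on made-up data. The helper `trip_range` is hypothetical and not part of the question. When applied to a single selected column, the per-group Series are stacked into one Series with a two-level index, and `unstack()` turns the inner level into columns.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({
    "city": ["Astapor", "Astapor", "Winterfell", "Winterfell"],
    "trips_in_first_30_days": [1, 3, 2, 14],
})

def trip_range(trips):
    # Hypothetical helper: returns one labelled value per entry for each group
    return pd.Series({"min": trips.min(), "max": trips.max()})

# One Series per city, stacked under a (city, label) index
stacked = df.groupby("city")["trips_in_first_30_days"].apply(trip_range)

# Pivot the inner index level (the labels) into columns
table = stacked.unstack()
print(table)
```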

Try the following code:

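def calc_bin_percentage(group_df):
    # Total number of trips that fall into each bin for this city
    bins_count = group_df.groupby("bins")["trips_in_first_30_days"].sum()
    # Divide by the total number of trips in this city
    return 100 * bins_count / group_df["trips_in_first_30_days"].sum()

df.groupby("city").apply(calc_bin_percentage).unstack().fillna(0)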

It works in two steps: first the data is split by city, then the percentage for each bin is computed within each city.

The result should be a table with the cities as rows and the bins as columns.
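If what is wanted is the share of rows per bin rather than the share of trips, `pd.crosstab` with `normalize="index"` is another option. This is only a sketch on made-up data, with bin edges assumed from the labels in the question. It normalizes over the rows that actually have a bin, so each city's row sums to 100.

```python
import pandas as pd

# Made-up sample data; bin edges are assumed from the labels in the question
df = pd.DataFrame({
    "city": ["Astapor", "Astapor", "Astapor", "Winterfell", "Winterfell"],
    "trips_in_first_30_days": [1, 1, 3, 2, 14],
})
df["bins"] = pd.cut(df["trips_in_first_30_days"], bins=[0, 1, 2, 3, 125])

# Count rows per (city, bin) and turn each city's row into percentages
pct = pd.crosstab(df["city"], df["bins"], normalize="index") * 100
print(pct)
```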

+0

Hi, for some reason, when I try this on my data, the percentages for each group do not sum to 100. –

+0

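Probably because of integer division. I guess you are using Python 2 (are the result values all integers?). Try multiplying by '100.0' instead of '100' (this forces float division). (In Python 3, float division is the default.) – tmrlvi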

+0

I am using Python 3, and the result values are floats. –
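One likely cause, at least for the count-based function in the question's edit: rows with 0 trips have a NaN bin, so `groupby("bins")` skips them while `len(group_df)` still counts them. The Astapor row in that output sums to about 67.7, which would be consistent with roughly a third of its rows having no bin. A sketch on made-up data follows. The extra lower edge of -1 is an assumption, chosen only so that 0 gets a bin of its own.

```python
import pandas as pd

# Made-up sample data with some zero-trip rows
df = pd.DataFrame({
    "city": ["Astapor"] * 4 + ["Winterfell"] * 4,
    "trips_in_first_30_days": [0, 1, 3, 0, 2, 0, 14, 1],
})
df["bins"] = pd.cut(df["trips_in_first_30_days"], bins=[0, 1, 2, 3, 125])

# Share of rows per city that have no bin: this is the shortfall from 100
missing_pct = df["bins"].isna().groupby(df["city"]).mean() * 100
print(missing_pct)

# Assumed fix: a lower edge of -1 puts the zero-trip rows into (-1, 0]
df["bins_with_zero"] = pd.cut(
    df["trips_in_first_30_days"], bins=[-1, 0, 1, 2, 3, 125]
)
counts = df.groupby(["city", "bins_with_zero"], observed=True).size()
pct = 100 * counts / counts.groupby(level="city").transform("sum")
print(pct.groupby(level="city").sum())
```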