2016-07-28 113 views
pandas: create a DataFrame using value_counts

I have this data:

age 
32 
16 
39 
39 
23 
36 
29 
26 
43 
34 
35 
50 
29 
29 
31 
42 
53 

I need to get something like this (image). I can get

df.age.value_counts()

100. * df.age.value_counts()/len(df.age) 

But how can I combine these and give the columns names?

Answer


You can use cut with agg:

import numpy as np
import pandas as pd

# helper df with the min and max age of each group; the extra 'Total' row is needed so that 'Total' becomes a category
df1 = pd.DataFrame({'G':['14 yo and younger','15-19','20-24','25-29','30-34', 
         '35-39','40-44','45-49','50-54','55-59','60-64','65+','Total'], 
        'Min':[0, 15,20,25,30,35,40,45,50,55,60,65,np.nan], 
        'Max':[14,19,24,29,34,39,44,49,54,59,64,120, np.nan]}) 

print (df1) 
        G Max Min 
0 14 yo and younger 14.0 0.0 
1    15-19 19.0 15.0 
2    20-24 24.0 20.0 
3    25-29 29.0 25.0 
4    30-34 34.0 30.0 
5    35-39 39.0 35.0 
6    40-44 44.0 40.0 
7    45-49 49.0 45.0 
8    50-54 54.0 50.0 
9    55-59 59.0 55.0 
10    60-64 64.0 60.0 
11    65+ 120.0 65.0 
12    Total NaN NaN 
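# bin edges: the lowest Min followed by every Max (the trailing NaN edge belongs to 'Total')
cutoff = np.hstack([np.array(df1.Min[0]), df1.Max.values])
# one label per bin
labels = df1.G.values

# right=True closes each bin on the right; include_lowest=True also keeps age 0 in the first bin
df['Groups'] = pd.cut(df.age, bins=cutoff, labels=labels, right=True, include_lowest=True)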
print (df) 
    age Groups 
0 32 30-34 
1 16 15-19 
2 39 35-39 
3 39 35-39 
4 23 20-24 
5 36 35-39 
6 29 25-29 
7 26 25-29 
8 43 40-44 
9 34 30-34 
10 35 35-39 
11 50 50-54 
12 29 25-29 
13 29 25-29 
14 31 30-34 
15 42 40-44 
16 53 50-54 
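# count the rows per group and the share of all rows in percent
# (renaming through a nested dict in agg needs pandas < 1.0; it was deprecated in 0.20)
df = (df.groupby('Groups')['Groups']
        .agg({'Total':[len, lambda x: len(x)/df.shape[0] * 100 ]})
        .rename(columns={'len':'N', '<lambda>':'%'}))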

# last Total row: overwrite the empty 'Total' group with the column sums
df.loc['Total'] = df.sum()

print (df)  
       Total    
         N   % 
Groups        
14 yo and younger 0.0 0.000000 
15-19    1.0 5.882353 
20-24    1.0 5.882353 
25-29    4.0 23.529412 
30-34    3.0 17.647059 
35-39    4.0 23.529412 
40-44    2.0 11.764706 
45-49    0.0 0.000000 
50-54    2.0 11.764706 
55-59    0.0 0.000000 
60-64    0.0 0.000000 
65+     0.0 0.000000 
Total    17.0 100.000000 
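
The binning step above depends on how pd.cut treats ages that sit exactly on a bin edge. The following is only a small sketch with made-up edges and ages (a shortened version of the helper table, not the full one) to show what right=True and include_lowest=True do:

```python
import pandas as pd

# Hypothetical, shortened set of edges and labels for illustration only.
edges = [0, 14, 19, 24]
labels = ['14 yo and younger', '15-19', '20-24']
ages = pd.Series([0, 14, 15, 19, 20, 24])

# right=True makes every bin closed on the right: (14, 19] holds 15..19.
# include_lowest=True additionally closes the first bin on the left, so 0 is kept.
with_lowest = pd.cut(ages, bins=edges, labels=labels,
                     right=True, include_lowest=True)

# Without include_lowest the first bin is (0, 14], so age 0 falls outside every bin.
without_lowest = pd.cut(ages, bins=edges, labels=labels, right=True)

print(with_lowest.tolist())
print(without_lowest.tolist())
```

With these edges each upper bound (14, 19, 24) stays in its own group, which is why the helper table uses Max as the edge for every bin.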

EDIT1:

A better solution uses size:

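# df is the frame with the age and Groups columns created by pd.cut
df1 = df.groupby('Groups').size().to_frame()
# two-level header: 'Total' on top, 'N' below
df1.columns = pd.MultiIndex.from_tuples([('Total','N')])
df1[('Total','%')] = 100 * df1[('Total','N')]/df.shape[0]
df1.loc['Total'] = df1.sum()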
print (df1) 
        Total    
         N   % 
Groups        
14 yo and younger 0.0 0.000000 
15-19    1.0 5.882353 
20-24    1.0 5.882353 
25-29    4.0 23.529412 
30-34    3.0 17.647059 
35-39    4.0 23.529412 
40-44    2.0 11.764706 
45-49    0.0 0.000000 
50-54    2.0 11.764706 
55-59    0.0 0.000000 
60-64    0.0 0.000000 
65+     0.0 0.000000 
Total    17.0 100.000000 
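
For the narrower part of the question (joining the plain value_counts output with the percentages and naming the columns), something along these lines might be enough. It is a minimal sketch that assumes the ages from the question sit in a column called age; the column names N and % are taken from the tables above:

```python
import pandas as pd

# Ages copied from the question.
ages = [32, 16, 39, 39, 23, 36, 29, 26, 43, 34, 35, 50, 29, 29, 31, 42, 53]
df = pd.DataFrame({'age': ages})

counts = df['age'].value_counts()
# normalize=True returns fractions of the total, so multiply by 100 for percent.
percent = 100.0 * df['age'].value_counts(normalize=True)

# Concatenating a dict of Series along axis=1 uses the dict keys as column names.
summary = pd.concat({'N': counts, '%': percent}, axis=1)
print(summary)
```

This gives one row per distinct age rather than per age group, so it only covers the "combine and name" step, not the binning.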

Is that '%'? I should get 100%. And how can I add a Total row at the bottom? – ldevyataykina


Sorry, it was more complicated than I thought, but the answer is edited now. – jezrael


Why do I get 'TypeError: cannot append a non-category item to a CategoricalIndex' when I use age = df.groupby('Groups')['Groups'].agg({'Total':[len, lambda x: len(x)/df.shape[0] * 100]}).rename(columns={'len':'N', '<lambda>':'%'}) and then age.ix['Total'] = age.sum()? – ldevyataykina
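
A possible explanation, offered as a guess because the thread does not show how Groups was built in that attempt: the grouped result is indexed by the categories of Groups, and on the pandas version in use a row label that is not one of those categories cannot be appended. The sketch below assumes that cause and registers 'Total' as a category before grouping, so the 'Total' row already exists and is only overwritten. The edges list is a simplified stand-in for the helper table, and observed=False requires pandas 0.23 or later:

```python
import numpy as np
import pandas as pd

# Ages copied from the question.
ages = [32, 16, 39, 39, 23, 36, 29, 26, 43, 34, 35, 50, 29, 29, 31, 42, 53]
df = pd.DataFrame({'age': ages})

# Assumed right-closed edges: (-1, 14], (14, 19], ..., (64, inf).
edges = [-1, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, np.inf]
labels = ['14 yo and younger', '15-19', '20-24', '25-29', '30-34', '35-39',
          '40-44', '45-49', '50-54', '55-59', '60-64', '65+']
df['Groups'] = pd.cut(df['age'], bins=edges, labels=labels)

# Make 'Total' a valid (still unused) category of the column.
df['Groups'] = df['Groups'].cat.add_categories(['Total'])

# observed=False keeps empty categories, including 'Total', as rows with count 0.
age = df.groupby('Groups', observed=False).size().to_frame('N')
age['%'] = 100.0 * age['N'] / len(df)

# The 'Total' row exists already, so this assignment overwrites it instead of appending.
age.loc['Total'] = age.sum()
print(age)
```

In the answer above the same effect comes from listing 'Total' in the helper table, which is why that extra row is described as necessary.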