2016-07-27 168 views
-1

Pandas: creating a percentage summary table

I have the following data:

Third party unique identifier Qsex 
9ea3e3cb6719f3d336d324c446f486bd 1 
d1b69bc4cccf0afef66debf4e3f0643e 2 
f574fc585db0cddef88306ef6f32da59 1 
8bc0a586bf0abec653c29cf4160753f9 1 
7c22b56929378ec2eb3a536b4f4bc4e0 2 
23d8433168c46d57a271a6b979037094 1 
5743b7eec1b018572b6c5b44542a67a5 2 
f176289325aa4a6fa56c0179e9cbd101 1 
c729933ff7db798ae07c59d971f40a70 1 
d12d5fc03f4c03bb85c4b39d29dbfa25 2 
442a4568d77d0f5b8a559e8eb39c03b3 1 
a0a536482e7b23956210d1cace0b5fb7 1 
c1aef06d15347ef2fbb2a8a3af1d4b85 1 
38ff613c441bf35fa4054eac88ae3cda 1 

and I need to get a result like this (the target table was shown as an image in the original post).

I use

sex = df['Qsex'].value_counts() 

100. * df.sex.value_counts()/len(df.sex) 

to get the percentages, but I can't work out how to get from there to the table.

+1

I don't understand why you would expect those two lines to produce your output. – DeepSpace
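For reference, the frame as shown in the question has a `Qsex` column and no `sex` column, so `df.sex` would presumably fail on it. Below is a minimal sketch of getting counts and percentages side by side from `Qsex`. The frame is rebuilt inline from the sample values, and the column names `N` and `%` are assumptions chosen for illustration, not taken from the asker's target table.

```python
import pandas as pd

# Rebuild the Qsex column from the question's sample (ten 1s and four 2s).
df = pd.DataFrame({"Qsex": [1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1]})

counts = df["Qsex"].value_counts()                        # absolute counts per code
percent = df["Qsex"].value_counts(normalize=True) * 100   # share of rows, in percent

# Both Series are indexed by the Qsex codes, so they align row by row.
summary = pd.DataFrame({"N": counts, "%": percent.round(2)})
print(summary)
```

`value_counts(normalize=True)` returns fractions that sum to 1, which is why the result is multiplied by 100.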

Answers

0

Here is an answer using the pandas API.

I have built up the function calls step by step so that you can follow the logic. The answer was inspired by this post.

In [1]: import pandas as pd 

In [3]: data = pd.read_csv('data.csv') 

In [4]: data 
Out[4]: 
     Third party unique identifier Qsex 
0 9ea3e3cb6719f3d336d324c446f486bd  1 
1 d1b69bc4cccf0afef66debf4e3f0643e  2 
2 f574fc585db0cddef88306ef6f32da59  1 
3 8bc0a586bf0abec653c29cf4160753f9  1 
4 7c22b56929378ec2eb3a536b4f4bc4e0  2 
5 23d8433168c46d57a271a6b979037094  1 
6 5743b7eec1b018572b6c5b44542a67a5  2 
7 f176289325aa4a6fa56c0179e9cbd101  1 
8 c729933ff7db798ae07c59d971f40a70  1 
9 d12d5fc03f4c03bb85c4b39d29dbfa25  2 
10 442a4568d77d0f5b8a559e8eb39c03b3  1 
11 a0a536482e7b23956210d1cace0b5fb7  1 
12 c1aef06d15347ef2fbb2a8a3af1d4b85  1 
13 38ff613c441bf35fa4054eac88ae3cda  1 

In [5]: data.groupby('Qsex') 
Out[5]: <pandas.core.groupby.DataFrameGroupBy object at 0x111faff98> 

In [6]: data.groupby('Qsex').count() 
Out[6]: 
     Third party unique identifier 
Qsex 
1        10 
2         4 

In [14]: counts = data.groupby('Qsex').count() 

In [15]: counts['percentage'] = counts['Third party unique identifier'].apply(lambda x: x/counts['Third party unique identifier'].sum())

In [16]: counts 
Out[16]: 
     Third party unique identifier percentage 
Qsex 
1        10 0.714286 
2         4 0.285714 

In [17]: counts['percentage'] = counts['Third party unique identifier'].apply(lambda x: 100*x/counts['Third party unique identifier'].sum())

In [18]: counts 
Out[18]: 
     Third party unique identifier percentage 
Qsex 
1        10 71.428571 
2         4 28.571429 
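The interactive session above can also be condensed into a short script. This is only a sketch: the inline frame stands in for `pd.read_csv('data.csv')`, the placeholder identifiers (`id0`, `id1`, ...) are made up, and the helper name `id_col` is an assumption. It divides the whole column at once instead of calling `.apply` with a lambda, which gives the same numbers without recomputing the sum for every row.

```python
import pandas as pd

# Stand-in for pd.read_csv('data.csv'); the identifiers are placeholders.
data = pd.DataFrame({
    "Third party unique identifier": [f"id{i}" for i in range(14)],
    "Qsex": [1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1],
})

id_col = "Third party unique identifier"

# Count the rows in each Qsex group.
counts = data.groupby("Qsex").count()

# Vectorised division: each count over the total, scaled to percent.
counts["percentage"] = 100 * counts[id_col] / counts[id_col].sum()
print(counts)
```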
+0

How do I create a spreadsheet like that? I mean with all the column names. – ldevyataykina

+0

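I think there may be a misunderstanding here. `pandas` is meant to help you work with data programmatically, in interactive and scripted environments. If you want to "create a spreadsheet", your best bet is the `.to_csv('filename.csv')` method provided by the `pandas` DataFrame object. I suggest taking a look at the documentation. – ericmjl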
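As a rough illustration of the `.to_csv` suggestion, the sketch below writes a small summary frame to disk and reads it back. The frame contents, the `N` and `%` column names, and the output path are assumptions made for the example. A temporary directory is used only so that the snippet is self-contained.

```python
import os
import tempfile

import pandas as pd

# Illustrative summary frame; values match the counts shown in the answer.
summary = pd.DataFrame(
    {"N": [10, 4], "%": [71.43, 28.57]},
    index=pd.Index([1, 2], name="Qsex"),
)

# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "summary.csv")

# The index (Qsex) is written as the first column of the file.
summary.to_csv(out_path)

# Read the file back to confirm what a spreadsheet program would see.
round_trip = pd.read_csv(out_path, index_col="Qsex")
print(round_trip)
```

The resulting file opens directly in a spreadsheet program. `DataFrame.to_excel` is a similar option if an Excel writer engine such as openpyxl is installed.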

0

Try this:

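import numpy as np
import pandas as pd

df["Sex"] = np.where(df["Qsex"] == 1, "Male", "Female")  # label the numeric codes
df2 = pd.crosstab(df.Sex, df.Qsex, margins=True)          # counts, with an "All" margin
df3 = np.round(df2[["All"]] / df["Sex"].count() * 100, 2).rename(columns={"All": "%"})
pd.concat([df2[["All"]], df3], axis=1)                    # counts and percentages side by side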



Qsex    All       %
Sex
Female    4   28.57
Male     10   71.43
All      14  100.00
+0

How do I rename the columns as in the image, and add a top panel with 'Total' and 'N'? – ldevyataykina

+1

Since you are exporting to Excel, do it in Excel... – Merlin
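If you would rather do the renaming in pandas before exporting, one possible approach is a two-level column header. This is a sketch under assumptions: the image from the question is not available, so the layout here (a `Total` banner over `N` and `%`) is a guess at what was wanted, and the frame is rebuilt inline from the output above.

```python
import pandas as pd

# Rebuild the summary shown in the answer's output.
summary = pd.DataFrame(
    {"N": [4, 10, 14], "%": [28.57, 71.43, 100.00]},
    index=pd.Index(["Female", "Male", "All"], name="Sex"),
)

# Put a 'Total' header above the existing columns by making them two-level.
summary.columns = pd.MultiIndex.from_product([["Total"], summary.columns])
print(summary)
```

Individual cells are then addressed with a tuple, for example `summary.loc["Male", ("Total", "N")]`.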

+0

If the answer worked, please consider accepting it; you can also upvote. – Merlin