Calculating percentiles for specific groups

I have 3 columns: Product Id, Price, Group (with values A, B, C, D).

I want to get the price percentile for each group, so I ran the following code.

for group, price in df.groupby(['group']): 
    df['percentile'] = np.percentile(df['price'],60)

The percentile column contains only one value, 3.44, for every group. The expected values for the groups are 2.12, 3.43, 3.65, 4.76, 8.99.

Please let me know what is going wrong here.

Source

2016-04-29 Anu

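I think the problem is that inside the loop you use the whole DataFrame df with its price column, instead of the per-group frame price with its price column: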

import pandas as pd 
import numpy as np 

np.random.seed(1) 
df = pd.DataFrame(np.random.randint(10, size=(5,3))) 
df.columns = ['Product Id','group','price'] 
print(df)
   Product Id  group  price
0           5      8      9
1           5      0      0
2           1      7      6
3           9      2      4
4           5      2      4

# uses the whole df, so every group gets the same overall value
for group, price in df.groupby(['group']):
    print(np.percentile(df['price'], 60))
4.8
4.8
4.8
4.8

# uses the per-group frame price, so each group gets its own value
for group, price in df.groupby(['group']):
    print(np.percentile(price['price'], 60))
0.0 
4.0 
6.0 
9.0
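To store the per-group value in a column rather than print it, the same loop can write into only the rows of the current group. This is a minimal sketch, not the code from the question: the data is made up, the groups are labelled A and B, and the loop variables name and sub are illustrative names.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumption): two groups with three prices each.
df = pd.DataFrame({
    'Product Id': [1, 2, 3, 4, 5, 6],
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'price': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Start with an empty column so every row has a defined value.
df['percentile'] = np.nan

# For each group, compute the 60th percentile of that group's prices
# and assign it only to the rows that belong to the group.
for name, sub in df.groupby('group'):
    df.loc[sub.index, 'percentile'] = np.percentile(sub['price'], 60)

print(df)
```

Assigning through df.loc with the group's index is what keeps one group's value from overwriting the others, which is what happened with df['percentile'] = ... in the question.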

Another solution with np.percentile, where the output is a Series:

print(df.groupby(['group'])['price'].apply(lambda x: np.percentile(x, 60)))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64

A solution with DataFrameGroupBy.quantile:

print(df.groupby(['group'])['price'].quantile(.6))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64

Edit after the comment:

If you need a new column, use transform (see the docs):

>>> np.random.seed(1) 
>>> df = pd.DataFrame(np.random.randint(10,size=(20,3))) 
>>> df.columns = ['Product Id','group','price'] 
>>> df 
    Product Id  group  price
0            5      8      9
1            5      0      0
2            1      7      6
3            9      2      4
4            5      2      4
5            2      4      7
6            7      9      1
7            7      0      6
8            9      9      7
9            6      9      1
10           0      1      8
11           8      3      9
12           8      7      3
13           6      5      1
14           9      3      4
15           8      1      4
16           0      3      9
17           2      0      4
18           9      2      7
19           7      9      8
>>> df['percentil'] = df.groupby(['group'])['price'].transform(lambda x: x.quantile(.6))

>>> df 
    Product Id  group  price  percentil
0            5      8      9        9.0
1            5      0      0        4.4
2            1      7      6        4.8
3            9      2      4        4.6
4            5      2      4        4.6
5            2      4      7        7.0
6            7      9      1        5.8
7            7      0      6        4.4
8            9      9      7        5.8
9            6      9      1        5.8
10           0      1      8        6.4
11           8      3      9        9.0
12           8      7      3        4.8
13           6      5      1        1.0
14           9      3      4        9.0
15           8      1      4        6.4
16           0      3      9        9.0
17           2      0      4        4.4
18           9      2      7        4.6
19           7      9      8        5.8

Source

2016-04-29 18:18:56 jezrael

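I am not sure this serves my purpose. I don't need printed output. I want to create a "percentile" column in the same data frame df that holds the 60th percentile of each group. That means my df will have 4 columns: Product Id, Price, Group and Percentile. In the next step, I want to use this new "percentile" to create another column, so that I can classify the Product Ids within each "group" by "price". My next line is df['price_point'] = np.where(df['retailprice'] >= k, 'high', 'low') – Anu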

Answer edited, please check. – jezrael

Yes, it works perfectly! – Anu
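The follow-up step described in the comments (labelling each product 'high' or 'low' relative to its group's 60th percentile) could be sketched as below. The data is made up, and the sketch assumes the comparison is made against the new per-group column rather than a single scalar k; the column is called price here, not retailprice as in the comment.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumption): two groups with three prices each.
df = pd.DataFrame({
    'Product Id': [1, 2, 3, 4, 5, 6],
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'price': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# transform broadcasts each group's 60th percentile back to every row
# of that group, so the result lines up with df row by row.
df['percentile'] = df.groupby('group')['price'].transform(lambda x: x.quantile(.6))

# Compare each price with its own group's threshold, element by element.
df['price_point'] = np.where(df['price'] >= df['percentile'], 'high', 'low')

print(df)
```

Because both columns have the same index, np.where compares each row with the threshold of its own group, so no loop over groups is needed.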

You can try pandas quantile:

df[['group', 'price']].groupby('group').quantile(.6)

It returns the values at the given quantile over the requested axis, similar to a percentile.
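As a rough illustration of what this call returns: the result is a small DataFrame indexed by group, which can then be mapped back onto the rows if a column is needed. The data and the names q60 and percentile are assumptions made for this sketch.

```python
import pandas as pd

# Illustrative data (assumption): two groups with three prices each.
df = pd.DataFrame({
    'Product Id': [1, 2, 3, 4, 5, 6],
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'price': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# One row per group, with the 60th percentile of price as the only column.
q60 = df[['group', 'price']].groupby('group').quantile(.6)
print(q60)

# Look up each row's group label in q60 to get a per-row column.
df['percentile'] = df['group'].map(q60['price'])
print(df)
```

Mapping through the group label is one way to attach the values to the original rows; the transform approach in the other answer does the same in a single step.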

Source

2016-04-29 18:27:58 Sam
