Applying a function with groupby in Python pandas

I have a dataframe that looks like the following:

id  salary  days_employed  category  salary_percentile
1   200000  400            1         14

Here category 0 means the person was an early quitter, and 1 means they stayed longer.

My code is as follows:

df1['salary_percentile'] = pd.qcut(df1['salary'], 50, labels=['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50'])

After cutting the salaries into 50 buckets and inspecting the rows that fall in salary_percentile 37, this is what my dataframe looks like: (image not included)

def f(x): 
    early_quitter = x.loc[(x.category== '0')] 
    non = x.loc[(x.category == '1')] 
    proportion_early_quitters = early_quitter.shape[0]/x.shape[0] 
    return pd.Series({'prop_early_quitters': proportion_early_quitters}) 

bypercentile = df1.groupby('salary_percentile').apply(f) 
bypercentile = bypercentile.reset_index(level='None') 
bypercentile

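I want my function to return a dataframe containing the proportion of early_quitters in each group. That is, for each group I want to compute len(early_quitter) / len(group). When I use this function, it returns a dataframe with a proportion of 0 for every group.

Can someone help me?

On a side note, I created the salary_percentile column using the code shown above.

Thanks!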

Asked 2016-12-22 by Gingerbread

Are you getting this on Python 2? If so, try putting 'from __future__ import division' at the start of your code. – BrenBarn

Thank you so much! It worked for me! I am indeed using Python 2! Thanks again!! – Gingerbread
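
For anyone reading along, here is a minimal sketch of the effect described in the comment above. The counts are made up for illustration (they are not from the question's data). The sketch is written for Python 3, where the `//` operator reproduces what Python 2's `/` did with two integers.

```python
# Hypothetical counts for one group: 3 early quitters out of 8 rows.
n_early, n_group = 3, 8

# Python 2's `/` on two ints floors the result; Python 3's `//` does the same.
floored = n_early // n_group

# Converting one operand to float gives true division on either version.
true_ratio = float(n_early) / n_group

print(floored, true_ratio)
```

Any count smaller than the group size floors to 0, which matches the all-zero proportions reported in the question.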

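First, the reason you are getting zeros is that len returns an integer, and when you divide an integer by an integer in Python 2 you get an integer whose value is the result of the division with the decimal component dropped. So "some positive number less than n" / n equals zero. You can fix this with float(len(early_quitter))/len(group).

However, if early quitters are marked with 0 and everyone else with 1, the proportion of early quitters is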

float(len(early_quitters))/len(group)

or

1 - float(len(not_early_quitters))/len(group)

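or, because these values are all 1, len gives the same value as sum:

1 - float(sum(not_early_quitters))/len(group)

However, since early quitters contribute 0 to the sum, this ratio is just the within-group mean of category. So

1 - mean(category)

You should be able to get what you want with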

1 - df1.groupby('salary_percentile').category.mean()
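
To make that concrete, here is a small runnable sketch on made-up data (the rows below are hypothetical, not the asker's). It assumes category is stored as integers. The question's code compares against the strings '0' and '1', so if the column really holds strings it would need converting first, for example with astype(int).

```python
import pandas as pd

# Hypothetical sample data: category is an integer, 0 = early quitter, 1 = stayed longer.
df1 = pd.DataFrame({
    "salary_percentile": ["1", "1", "1", "1", "2", "2", "2", "2"],
    "category": [0, 0, 0, 1, 0, 1, 1, 1],
})

# If category were stored as strings ('0'/'1'), convert it first:
# df1["category"] = df1["category"].astype(int)

# The group mean of category is the share of 1s, so 1 - mean is the share of 0s.
prop_early_quitters = 1 - df1.groupby("salary_percentile")["category"].mean()

# Name the result and turn the group key back into an ordinary column.
prop_early_quitters = prop_early_quitters.rename("prop_early_quitters").reset_index()
print(prop_early_quitters)
```

Because mean already does floating-point division, this avoids the Python 2 integer-division problem without needing float() or the __future__ import.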

Answered 2016-12-22 21:20:50 by piRSquared

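I don't think this is what I'm looking for. I've edited my question a bit. Could you help me with the edited version? – Gingerbread

For which column do you want to calculate the proportion of 0s? – piRSquared

Do you have sample data I can use? You provided one row and told us you are cutting it into 50 buckets. – piRSquared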
