Calculating percentiles for bins from numpy digitize?

I have a set of data and a set of thresholds that I use to create bins:

import numpy as np

data = np.array([0.01, 0.02, 1, 1, 1, 2, 2, 8, 8, 4.5, 6.6])
thresholds = np.array([0, 5, 10])
bins = np.digitize(data, thresholds, right=True)

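For each element in bins, I want to know its base percentile. For example, the smallest bin in bins should start at the 0th percentile, and the next bin might start at, say, the 20th percentile. So if a value in data falls between the 0th and 20th percentile of data, it belongs to the first bin.

I have looked at pandas rank(pct=True), but I can't seem to get it to do this correctly.

Any suggestions?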

2016-09-03 BobbyJohnsonOG

You can compute the percentile of each element in your data array as described in a previous StackOverflow question (Map each list value to its corresponding percentile).

import numpy as np 
from scipy import stats 
data = np.array([0.01, 0.02, 1, 1, 1, 2, 2, 8, 8, 4.5, 6.6])

Method 1: using scipy.stats.percentileofscore:

data_percentile = np.array([stats.percentileofscore(data, a) for a in data]) 
data_percentile 
Out[1]: 
array([ 9.09090909, 18.18181818, 36.36363636, 36.36363636, 
     36.36363636, 59.09090909, 59.09090909, 95.45454545, 
     95.45454545, 72.72727273, 81.81818182])

Method 2: using scipy.stats.rankdata and normalizing to 100 (faster):

ranked = stats.rankdata(data) 
data_percentile = ranked/len(data)*100 
data_percentile 
Out[2]: 
array([ 9.09090909, 18.18181818, 36.36363636, 36.36363636, 
     36.36363636, 59.09090909, 59.09090909, 95.45454545, 
     95.45454545, 72.72727273, 81.81818182])
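
As a quick sanity check, the two methods can be compared directly. The sketch below assumes percentileofscore is left at its default kind='rank', which averages the ranks of tied values in the same way as rankdata's default method='average'; that is why the results coincide. The variable names pct_loop, pct_rank and methods_agree are illustrative only.

```python
import numpy as np
from scipy import stats

data = np.array([0.01, 0.02, 1, 1, 1, 2, 2, 8, 8, 4.5, 6.6])

# Method 1: one percentileofscore call per element (O(n^2) overall).
pct_loop = np.array([stats.percentileofscore(data, a) for a in data])

# Method 2: a single ranking pass, scaled to the 0-100 range.
pct_rank = stats.rankdata(data) / len(data) * 100

# Both use average ranks for ties, so they should agree to rounding error.
methods_agree = np.allclose(pct_loop, pct_rank)
print(methods_agree)
```

Method 2 is the faster one because it ranks the whole array once instead of rescanning it for every element.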

Now that you have a list of percentiles, you can bin them with numpy.digitize as before:

bins_percentile = [0,20,40,60,80,100] 
data_binned_indices = np.digitize(data_percentile, bins_percentile, right=True) 
data_binned_indices 
Out[3]: 
array([1, 1, 2, 2, 2, 3, 3, 5, 5, 4, 5], dtype=int64)
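
The right=True argument decides which bin a percentile lands in when it sits exactly on an edge. A minimal sketch of the difference, using made-up edge values (the names edges, on_edge, idx_right and idx_left are illustrative only):

```python
import numpy as np

edges = [0, 20, 40]
on_edge = np.array([20.0])

# right=True: bins are (lower, upper], so 20 belongs to the bin ending at 20.
idx_right = np.digitize(on_edge, edges, right=True)

# right=False (the default): bins are [lower, upper), so 20 starts the next bin.
idx_left = np.digitize(on_edge, edges, right=False)

print(idx_right, idx_left)  # [1] [2]
```

With right=True a percentile of exactly 0 would map to index 0, but rank-based percentiles are always greater than 0, so every element here falls in bins 1 through 5.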

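This gives you the data binned by index into your chosen list of percentiles. If you prefer, you can also use numpy.take to return the actual (upper) percentile of each bin: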

data_binned_percentiles = np.take(bins_percentile, data_binned_indices) 
data_binned_percentiles 
Out[4]: 
array([ 20, 20, 40, 40, 40, 60, 60, 100, 100, 80, 100])
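
The steps above can be collected into one helper. This is only a sketch: the function name percentile_bin_labels and its parameters are hypothetical, and it assumes the percentile edges are sorted, start at 0 and end at 100.

```python
import numpy as np
from scipy import stats


def percentile_bin_labels(values, percentile_edges):
    """Return the upper percentile edge of the bin each value falls into.

    Hypothetical helper: assumes percentile_edges is increasing and spans 0..100.
    """
    values = np.asarray(values, dtype=float)
    edges = np.asarray(percentile_edges)
    # Rank-based percentile of every element (ties share their average rank).
    percentiles = stats.rankdata(values) / len(values) * 100
    # Index of the bin (lower, upper] that contains each percentile.
    indices = np.digitize(percentiles, edges, right=True)
    # Look up the upper edge of each bin.
    return edges[indices]


data = np.array([0.01, 0.02, 1, 1, 1, 2, 2, 8, 8, 4.5, 6.6])
labels = percentile_bin_labels(data, [0, 20, 40, 60, 80, 100])
print(labels)
```

For the sample data this should reproduce the values shown in Out[4].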

2016-09-04 10:45:21
