Converting an array to percentiles

I have an array that I want to convert to percentiles. For example, say I have a normally distributed array:

import numpy as np 
import matplotlib.pyplot as plt 

arr = np.random.normal(0, 1, 1000) 
plt.hist(arr)

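For each value in the array, I want to compute that value's percentile (e.g., 0 is the 50th percentile of the distribution above, so 0 -> 0.5). The result should be uniformly distributed, since each percentile should carry equal weight.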

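I found np.percentile, but that function returns a value given an array and a quantile. What I need is the reverse: given an array and a value, return that value's quantile.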

Is there a reasonably efficient way to do this?

Source

2017-06-17 Chris

import numpy as np
import pandas as pd
from scipy.stats import percentileofscore

# generate example data
arr = np.random.normal(0, 1, 10)

# pre-sort array
arr_sorted = sorted(arr)

# calculate percentiles using scipy func percentileofscore on each array element
s = pd.Series(arr)
percentiles = s.apply(lambda x: percentileofscore(arr_sorted, x))

Checking that the result is correct:

df = pd.DataFrame({'data': s, 'percentiles': percentiles})  
df.sort_values(by='data') 

       data  percentiles
3 -1.692881         10.0
8 -1.395427         20.0
7 -1.162031         30.0
6 -0.568550         40.0
9  0.047298         50.0
5  0.296661         60.0
0  0.534816         70.0
4  0.542267         80.0
1  0.584766         90.0
2  1.185000        100.0
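
Calling percentileofscore once per element rescans the array each time, which gets slow for large inputs. A minimal sketch of a vectorized alternative, assuming that scipy.stats.rankdata with its default tie handling (average rank) is acceptable for your data; the seed and the array size are only illustrative:

```python
import numpy as np
from scipy.stats import percentileofscore, rankdata

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
arr = rng.normal(0, 1, 1000)

# rankdata assigns ranks 1..n (tied values share their average rank);
# dividing by n rescales the ranks to percentiles in (0, 100].
percentiles = rankdata(arr) / len(arr) * 100

# Spot-check a few elements against percentileofscore
for x, p in zip(arr[:5], percentiles[:5]):
    print(x, p, percentileofscore(arr, x))
```

Divide by len(arr) only (without the factor of 100) to get values in (0, 1], as in the 0 -> 0.5 example from the question.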

Source

2017-06-17 18:10:38

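Here's another approach. I think you're asking about estimating the probability integral transform. This code produces a fairly fine-grained estimate, namely inverted_edf.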

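It works by linearly interpolating between the points in SAMPLE at its distinct values. It first computes the sample's empirical distribution function, and then inverted_edf.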

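I should mention that even with a sample size of 1,000, the percentiles in the tails are subject to considerable statistical variability, although the estimate at 0.5 varies less.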
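
The point about tail variability can be checked with a small simulation. This is only a rough sketch: the number of repeats, the 99.5th percentile as the "tail" example, and the seed are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n, repeats = 1000, 200          # hypothetical simulation sizes

# Draw many independent samples and record two estimated percentiles of each
draws = rng.normal(0, 1, size=(repeats, n))
medians = np.percentile(draws, 50, axis=1)
tails = np.percentile(draws, 99.5, axis=1)

# Spread of each estimate across the repeated samples
median_std = medians.std()
tail_std = tails.std()
print('std of estimated 50th percentile:  ', median_std)
print('std of estimated 99.5th percentile:', tail_std)
```

The tail estimate should show a noticeably larger spread than the median estimate.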

import statsmodels.distributions.empirical_distribution as edf 
from scipy.interpolate import interp1d 
import numpy as np 
import matplotlib.pyplot as plt 

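SAMPLE = np.random.normal(0, 1, 1000)
# empirical distribution function of the sample (value -> cumulative proportion)
sample_edf = edf.ECDF(SAMPLE)

# distinct sample values, in increasing order
slope_changes = sorted(set(SAMPLE))

# invert the EDF by interpolating from cumulative proportion back to value
sample_edf_values_at_slope_changes = [ sample_edf(item) for item in slope_changes]
inverted_edf = interp1d(sample_edf_values_at_slope_changes, slope_changes)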

x = np.linspace(0.005, 1) 
y = inverted_edf(x) 
#~ plt.plot(x, y, 'ro', x, y, 'b-') 
plt.plot(x, y, 'b-') 
plt.show() 

p = 0.5 
print ('%s percentile:' % (100*p), inverted_edf(p))

Below are the plots and text output from two runs.

50.0 percentile: -0.05917394517540461 
50.0 percentile: -0.0034011090849578695
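
To go in the direction the question asks (value to percentile), the empirical distribution function can also be evaluated directly at the sample points. A minimal sketch using only NumPy, assuming the convention F(x) = (number of sample points <= x) / n; the names sorted_sample and edf_at_sample are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
sample = rng.normal(0, 1, 1000)

# For each x, count how many sample points are <= x, then divide by n.
sorted_sample = np.sort(sample)
edf_at_sample = np.searchsorted(sorted_sample, sample, side='right') / len(sample)

# Smallest and largest transformed values
print(edf_at_sample.min(), edf_at_sample.max())
```

The transformed values are spread evenly over (0, 1], which is the uniform result the question expects.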

Source

2017-06-17 18:47:17
