Fitting binned lognormal data in Python

I have a series of particle size distribution data, given as percentage volume fraction per size, like so:

size % 
6.68 0.05 
9.92 1.15 
etc.

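I need to fit a lognormal distribution to this data. I was planning to use Python's stats.lognorm.fit function, but from what I've read it seems to expect an array of individual variates as input rather than binned data.

My plan is to loop over the data with a for loop and .extend each size entry into a placeholder array the required number of times, building an array of variates that corresponds to the binned data.

This seems really ugly and inefficient, though, and like the kind of thing that probably has a simpler solution. Is there a way to pass binned data to the stats.lognorm.fit function?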
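The expansion described above can be sidestepped. As a minimal sketch: when the location parameter is fixed at zero, the maximum-likelihood estimates for a lognormal are the mean and (population) standard deviation of log(size), so repeating each size in proportion to its percentage gives the same estimates as a weighted mean and weighted standard deviation. The helper name `weighted_lognormal_params` and the synthetic data are hypothetical, for illustration only, and the sketch assumes each listed size is representative of its whole bin.

```python
import numpy as np

def weighted_lognormal_params(sizes, percents):
    """Estimate (mu, sigma) of log(size), weighting each size by its share."""
    sizes = np.asarray(sizes, dtype=float)
    weights = np.asarray(percents, dtype=float)
    weights = weights / weights.sum()          # normalise so the weights sum to 1
    log_sizes = np.log(sizes)
    mu = np.sum(weights * log_sizes)           # weighted mean of log(size)
    sigma = np.sqrt(np.sum(weights * (log_sizes - mu) ** 2))  # weighted std of log(size)
    return mu, sigma

# Hypothetical binned data (not the measurements from the question):
# 61 log-spaced sizes whose percentages follow a lognormal with
# mu = 3.0 and sigma = 0.5.
log_centers = np.linspace(0.0, 6.0, 61)
sizes = np.exp(log_centers)
percents = np.exp(-(log_centers - 3.0) ** 2 / (2 * 0.5 ** 2))
percents = 100 * percents / percents.sum()

mu_hat, sigma_hat = weighted_lognormal_params(sizes, percents)
print(mu_hat, sigma_hat)
```

In SciPy's parameterisation these correspond to `s = sigma_hat` and `scale = np.exp(mu_hat)` for `stats.lognorm`, with `loc` fixed at 0. The estimate ignores the width of each bin, so it is only as good as the binning is fine.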


2017-02-10 Sam Robinson

I jumped to the wrong conclusion that you were listing cumulative percentages.

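I think one possible workaround is to fit the pdf to your binned data manually, taking the x values to be the midpoint of each interval and the y values to be the corresponding bin frequencies (normalised to a density). You can then fit a curve to the x and y values with scipy.optimize.curve_fit. I expect the accuracy of the result to depend on how many bins you have. An example follows: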

import matplotlib.pyplot as plt 
from scipy.optimize import curve_fit 
import numpy as np 

def pdf(x, mu, sigma): 
    """pdf of lognormal distribution""" 

    return (np.exp(-(np.log(x) - mu)**2/(2 * sigma**2))/(x * sigma * np.sqrt(2 * np.pi))) 

mu, sigma = 3., 1.        # actual parameter value 

data = np.random.lognormal(mu, sigma, size=1000)  # data generation 
h = plt.hist(data, bins=30, density=True)  # 'normed' was removed from Matplotlib; 'density' replaces it

y = h[0]          # frequencies for each bin, this is y value to fit 
xs = h[1]          # boundaries for each bin 
delta = xs[1] - xs[0]       # width of bins 
x = xs[:-1] + delta / 2      # midpoints of bins, this is x value to fit

popt, pcov = curve_fit(pdf, x, y, p0=[1, 1]) # data fitting, popt contains the fitted parameters 
print(popt) 
# [ 3.13048122 1.01360758]      fitting results from one run; values vary with the random sample

fig, ax = plt.subplots() 
ax.hist(data, bins=30, density=True, align='mid', label='Histogram')
xr = np.linspace(min(xs), max(xs), 10000) 
yr = pdf(xr, mu, sigma) 
yf = pdf(xr, *popt) 
ax.plot(xr, yr, label="Actual") 
ax.plot(xr, yf, linestyle = 'dashed', label="Fitted") 
ax.legend()
plt.show()
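The same curve-fitting idea can be applied to a size/percentage table directly, without building a histogram first. The following is a rough sketch under a stated assumption: each listed size is taken to be the upper edge of its bin, so the running sum of the percentages is the fraction of material below that size, and the lognormal CDF can be fitted to it. The data here are synthetic and hypothetical (generated from known parameters so the fit can be checked), and `lognormal_cdf` is an illustrative helper, not a SciPy function.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-mean mu and log-std sigma."""
    return norm.cdf((np.log(x) - mu) / sigma)

# Hypothetical binned data: 31 log-spaced bin upper edges, with the
# percentage in each bin computed from a lognormal with mu = 3.0, sigma = 0.8.
# The first bin collects everything below the first edge.
edges = np.exp(np.linspace(0.0, 6.0, 31))
true_cdf = lognormal_cdf(edges, 3.0, 0.8)
bin_percents = 100 * np.diff(true_cdf, prepend=0.0)

# Cumulative fraction below each upper edge: this is the y value to fit.
cum_frac = np.cumsum(bin_percents) / 100

# Start from the log of the median edge; keep sigma strictly positive.
p0 = [np.log(np.median(edges)), 1.0]
popt_cdf, pcov_cdf = curve_fit(
    lognormal_cdf, edges, cum_frac, p0=p0,
    bounds=([-np.inf, 1e-9], [np.inf, np.inf]),
)
print(popt_cdf)  # fitted [mu, sigma]
```

Fitting the CDF avoids dividing percentages by bin widths, which matters when the bins are unevenly (for example logarithmically) spaced. If the sizes are bin midpoints rather than upper edges, the cumulative values would need to be shifted accordingly.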


2017-02-11 05:59:08

Thank you both.
