核密度估計julia

我正在嘗試實現核密度估計。然而，我的代碼並沒有提供它應該的答案。它也寫在茱莉亞，但代碼應該是自我解釋。核密度估計julia

這裏是算法：

$\$ f(x) = \frac{1}{n*h} * \sum_{i = 1}^n K(\frac{x - X_i}{h}) \$$

其中

$\$ K(u) = 0.5*I(|u| <= 1)\$ with \$ u = \frac{x - X_i}{h}\$$

所以算法測試是否x和觀察X_I由一些常數因數（binwidth）加權之間的距離少於一個。如果是這樣，它將0.5 /（n * h）分配給該值，其中n =觀測值的＃個。

這是我實現：

#Kernel density function. 
#Purpose: estimate the probability density function (pdf) 
#of given observations 
#@param data: observations for which the pdf should be estimated 
#@return: returns an array with the estimated densities 

function kernelDensity(data) 
| 
| #Uniform kernel function. 
| #@param x: Current x value 
| #@param X_i: x value of observation i 
| #@param width: binwidth 
| #@return: Returns 1 if the absolute distance from 
| #x(current) to x(observation) weighted by the binwidth 
| #is less then 1. Else it returns 0. 
| 
| function uniformKernel(x, observation, width) 
| | u = (x - observation)/width 
| | abs (u) <= 1 ? 1 : 0 
| end 
| 
| #number of observations in the data set 
| n = length(data) 
| 
| #binwidth (set arbitraily to 0.1 
| h = 0.1 
| 
| #vector that stored the pdf 
| res = zeros(Real, n) 
| 
| #counter variable for the loop 
| counter = 0 
| 
| #lower and upper limit of the x axis 
| start = floor(minimum(data)) 
| stop = ceil (maximum(data)) 
| 
| #main loop 
| #@linspace: divides the space from start to stop in n 
| #equally spaced intervalls 
| for x in linspace(start, stop, n) 
| | counter += 1 
| | for observation in data 
| | | 
| | | #count all observations for which the kernel 
| | | #returns 1 and mult by 0.5 because the 
| | | #kernel computed the absolute difference which can be 
| | | #either positive or negative 
| | | res[counter] += 0.5 * uniformKernel(x, observation, h) 
| | end 
| | #devide by n times h 
| | res[counter] /= n * h 
| end 
| #return results 
| res 
end 
#run function 
#@rand: generates 10 uniform random numbers between 0 and 1 
kernelDensity(rand(10))

，這被返回：

> 0.0 
> 1.5 
> 2.5 
> 1.0 
> 1.5 
> 1.0 
> 0.0 
> 0.5 
> 0.5 
> 0.0

其總和爲：（累計發佈包功能應該是1）8.5

所以有兩個錯誤：

這些值未正確縮放。每個數字應該是其當前值的十分之一。事實上，如果觀察由增加數量10^NN = 1，2，...則CDF也由10^N

例如增加：

> kernelDensity(rand(1000)) 
> 953.53

它們不總計爲10（或者如果它不是縮放誤差，則爲1）。隨着樣本量的增加，誤差會變得更加明顯：未包括5％的觀測值。

我相信我實現了公式1：1，因此我真的不明白錯誤在哪裏。

來源

2015-09-04 Vincent

我不是KDEs的專家，所以採取這一切與一粒鹽，而是一個非常類似（但要快得多！）執行你的代碼是：

function kernelDensity{T<:AbstractFloat}(data::Vector{T}, h::T) 
    res = similar(data) 
    lb = minimum(data); ub = maximum(data) 
    for (i,x) in enumerate(linspace(lb, ub, size(data,1))) 
    for obs in data 
     res[i] += abs((obs-x)/h) <= 1. ? 0.5 : 0. 
    end 
    res[i] /= (n*h) 
end 
sum(res) 
end

如果我我沒有錯，密度估計應該整合到1，也就是說我們預計kernelDensity(rand(100), 0.1)/100至少接近1.在上面的實現中，我到達那裏，給出或拿5％，但是然後我們不再知道0.1是最佳帶寬（使用h=0.135而不是我在那裏達到0.1％），並且已知統一內核只有大約93％「有效率」。

在任何情況下，有一個很好的核密度封裝朱莉婭可用here，所以你應該只是做Pkg.add("KernelDensity")而不是試圖編寫自己的Epanechnikov內核:)

來源

2015-09-04 11:04:11

謝謝你，代碼和庫。沒有找到它。 – Vincent