Differences in 2D KDE produced using kde2d (R) and ksdensity2d (Matlab)

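While trying to port some code from Matlab to R, I ran into a problem. The gist of the code is to produce a 2D kernel density estimate (KDE) and then do some simple calculations with the estimate. In Matlab, the KDE calculation was done using the function ksdensity2d.m. In R, the KDE calculation is done with kde2d from the MASS package. Say I want to calculate the KDE and just add up the values (this is not what I actually intend to do, but it serves the purpose here). In R, this can be done with: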

    library(MASS)
    set.seed(1009)
    x <- sample(seq(1000, 2000), 100, replace=TRUE)
    y <- sample(seq(-12, 12), 100, replace=TRUE)
    kk <- kde2d(x, y, h=c(30, 1.5), n=100, lims=c(1000, 2000, -12, 12))
    sum(kk$z)

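This gives 0.3932732. With ksdensity2d in Matlab, using exactly the same data and settings, the answer is 0.3768. Looking at the code for kde2d, I noticed that the bandwidth is divided by 4: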

    kde2d <- function (x, y, h, n = 25, lims = c(range(x), range(y)))
    {
        nx <- length(x)
        if (length(y) != nx)
            stop("data vectors must be the same length")
        if (any(!is.finite(x)) || any(!is.finite(y)))
            stop("missing or infinite values in the data are not allowed")
        if (any(!is.finite(lims)))
            stop("only finite values are allowed in 'lims'")
        n <- rep(n, length.out = 2L)
        gx <- seq.int(lims[1L], lims[2L], length.out = n[1L])
        gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
        h <- if (missing(h))
            c(bandwidth.nrd(x), bandwidth.nrd(y))
        else rep(h, length.out = 2L)
        if (any(h <= 0))
            stop("bandwidths must be strictly positive")
        h <- h/4
        ax <- outer(gx, x, "-")/h[1L]
        ay <- outer(gy, y, "-")/h[2L]
        z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay),
            , nx))/(nx * h[1L] * h[2L])
        list(x = gx, y = gy, z = z)
    }

A simple check of whether the bandwidth scaling explains the difference in the results is then:

    kk <- kde2d(x, y, h=c(30, 1.5)*4, n=100, lims=c(1000, 2000, -12, 12))
    sum(kk$z)

which gives 0.3768013 (the same as the Matlab answer).
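To make the effect of the `h/4` line concrete, here is a minimal sketch that rebuilds the estimate by hand as a product of Gaussian kernels whose standard deviations are `h / 4`, and compares it with the output of kde2d. The data, the grid size, and the helper `manual_kde` are illustrative assumptions and are not part of MASS.

```r
library(MASS)

# Illustrative data (an assumption); any equal-length numeric vectors would do.
set.seed(1)
x <- runif(50, 1000, 2000)
y <- runif(50, -12, 12)
h <- c(30, 1.5)
lims <- c(1000, 2000, -12, 12)

kk <- kde2d(x, y, h = h, n = 20, lims = lims)

# manual_kde is a hypothetical helper: a product-Gaussian KDE evaluated
# on the grid (gx, gy), with the kernel standard deviations given in `sd`.
manual_kde <- function(gx, gy, x, y, sd) {
  kx <- outer(gx, x, function(g, xi) dnorm(g, mean = xi, sd = sd[1]))
  ky <- outer(gy, y, function(g, yi) dnorm(g, mean = yi, sd = sd[2]))
  kx %*% t(ky) / length(x)  # sum over data points, averaged
}

# Use h / 4 as the standard deviations, mirroring the rescaling inside kde2d.
z_manual <- manual_kde(kk$x, kk$y, x, y, sd = h / 4)
all.equal(kk$z, z_manual)
```

If the comparison holds, the `h` passed to kde2d is behaving as four times the Gaussian standard deviation rather than as the standard deviation itself.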

So my question is: why does kde2d divide the bandwidth by four? (Or why doesn't ksdensity2d?)

2015-06-03 mkr

In the mirrored GitHub source, lines 31-35:

    if (any(h <= 0))
        stop("bandwidths must be strictly positive")
    h <- h/4                            # for S's bandwidth scale
    ax <- outer(gx, x, "-")/h[1L]
    ay <- outer(gy, y, "-")/h[2L]

and the help file for kde2d() suggests looking at the help file for bandwidth. That says:

...which are all scaled to the width argument of density and so give answers four times as large.
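The factor of four in that help text can be checked directly. A small sketch, assuming arbitrary simulated data, comparing the rule-of-thumb selector in MASS (`bandwidth.nrd`, on the width scale) with its counterpart in stats (`bw.nrd`, on the standard-deviation scale):

```r
library(MASS)

# Illustrative data (an assumption); any numeric vector with at least 2 points works.
set.seed(1)
x <- rnorm(200)

width_scale <- bandwidth.nrd(x)  # MASS: scaled for the width argument of density()
sd_scale <- bw.nrd(x)            # stats: scaled for the bw argument of density()

# The two selectors should differ only by the factor of four.
all.equal(width_scale, 4 * sd_scale)
```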

But why?

The help for density() says that the width argument exists for compatibility with S (the precursor to R). The comments in the source for density() read:

    ## S has width equal to the length of the support of the kernel
    ## except for the gaussian where it is 4 * sd.
    ## R has bw a multiple of the sd.

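The default kernel is the Gaussian one. When the bw argument is unspecified and width is given, width is substituted in (divided by 4 for the Gaussian kernel), e.g.: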

    library(MASS)

    set.seed(1)
    x <- rnorm(1000, 10, 2)
    all.equal(density(x, bw = 1), density(x, width = 4)) # Only the call is different

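However, since kde2d() was apparently written to remain compatible with S (and I suppose it was originally written for S, given that it is in MASS), everything ends up divided by four. After flipping to the relevant section of the MASS book (around p. 126), it seems they may have picked four to strike a balance between smoothness and fidelity to the data.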

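In conclusion, my guess is that kde2d() divides by four to remain consistent with the rest of MASS (and other things originally written for S), and the way you are going about things looks fine.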
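For porting, one option is to hide the rescaling behind a small wrapper. The sketch below assumes that ksdensity2d interprets its bandwidth as the Gaussian standard deviation, which the matching totals in the question suggest but which is not verified here. The name `kde2d_sd` and the sample data are hypothetical.

```r
library(MASS)

# kde2d_sd is a hypothetical convenience wrapper, not part of MASS.
# It accepts bandwidths on the standard-deviation scale and multiplies
# them by 4 to cancel the h/4 rescaling that kde2d applies internally.
kde2d_sd <- function(x, y, bw, ...) {
  kde2d(x, y, h = 4 * rep(bw, length.out = 2L), ...)
}

# Illustrative data (an assumption).
set.seed(1)
x <- runif(50, 1000, 2000)
y <- runif(50, -12, 12)
lims <- c(1000, 2000, -12, 12)

a <- kde2d_sd(x, y, bw = c(30, 1.5), n = 20, lims = lims)
b <- kde2d(x, y, h = c(120, 6), n = 20, lims = lims)
identical(a$z, b$z)
```

Keeping the factor in one place avoids scattering `* 4` through the ported code and makes the bandwidth convention explicit at the call site.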

2015-06-04 03:24:09 alexforrence
