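Reducing the size of a glmer model
I'm new to R and am using glmer to fit several binomial models. I only need them so that I can call predict and use the resulting probabilities. However, my data set is very large, and even a single model object becomes very large: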
> library(pryr)
> object_size(mod)
701 MB
The model's coefficients are tiny by comparison:
> object_size(coef(mod))
1.16 MB
The fitted values are also small by comparison:
> object_size(fitted(mod))
25.6 MB
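First, I don't understand why the model object is so large. It appears to contain the original data frame used to fit the model, but even that does not account for the size. Why is it so huge?
Second, is it possible to strip the model down to just the parts needed to call predict? If so, how would I go about it? I found a post that does this for glm at http://blog.yhathq.com/posts/reducing-your-r-memory-footprint-by-7000x.html, but glmer models seem to be accessed differently and have different components.
Any help would be greatly appreciated.
Edit:
Digging into the model's internals: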
> object_size(getME(mod, "X"))
205 MB
> object_size(getME(mod, "Z"))
36.9 MB
> object_size(getME(mod, "Zt"))
38.4 MB
> object_size(getME(mod, "Ztlist"))
41.6 MB
> object_size(getME(mod, "mmList"))
38.4 MB
> object_size(getME(mod, "y"))
3.2 MB
> object_size(getME(mod, "mu"))
3.2 MB
> object_size(getME(mod, "u"))
18.4 kB
> object_size(getME(mod, "b"))
19.5 kB
> object_size(getME(mod, "Gp"))
56 B
> object_size(getME(mod, "Tp"))
472 B
> object_size(getME(mod, "L"))
15.5 MB
> object_size(getME(mod, "Lambda"))
38.1 kB
> object_size(getME(mod, "Lambdat"))
38.1 kB
> object_size(getME(mod, "Lind"))
9.22 kB
> object_size(getME(mod, "Tlist"))
936 B
> object_size(getME(mod, "A"))
38.4 MB
> object_size(getME(mod, "RX"))
30.3 kB
> object_size(getME(mod, "RZX"))
1.05 MB
> object_size(getME(mod, "sigma"))
48 B
> object_size(getME(mod, "flist"))
4.89 MB
> object_size(getME(mod, "fixef"))
4.5 kB
> object_size(getME(mod, "beta"))
496 B
> object_size(getME(mod, "theta"))
472 B
> object_size(getME(mod, "ST"))
936 B
> object_size(getME(mod, "REML"))
48 B
> object_size(getME(mod, "is_REML"))
48 B
> object_size(getME(mod, "n_rtrms"))
48 B
> object_size(getME(mod, "n_rfacs"))
48 B
> object_size(getME(mod, "N"))
256 B
> object_size(getME(mod, "n"))
256 B
> object_size(getME(mod, "p"))
256 B
> object_size(getME(mod, "q"))
256 B
> object_size(getME(mod, "p_i"))
408 B
> object_size(getME(mod, "l_i"))
408 B
> object_size(getME(mod, "q_i"))
408 B
> object_size(getME(mod, "mod"))
48 B
> object_size(getME(mod, "m_i"))
424 B
> object_size(getME(mod, "m"))
48 B
> object_size(getME(mod, "cnms"))
624 B
> object_size(getME(mod, "devcomp"))
2.21 kB
> object_size(getME(mod, "offset"))
3.2 MB
> get_obj_size(mod@resp, "RC")
[,1]
family 673355488
initialize 673355488
initialize#lmResp 673355488
ptr 673355488
resDev 673355488
updateMu 673355488
updateWts 673355488
wrss 673355488
eta 3196024
mu 3196024
n 3196024
offset 3196024
sqrtrwt 3196024
sqrtXwt 3196024
weights 3196024
wtres 3196024
y 3196024
Ptr 40
> get_obj_size(mod@pp, "RC")
[,1]
beta 449419408
initialize 449419408
initializePtr 449419408
ldL2 449419408
ldRX2 449419408
linPred 449419408
ptr 449419408
setTheta 449419408
sqrL 449419408
u 449419408
X 204549128
V 182171288
Ut 38448168
Zt 38448168
LamtUt 38353248
Xwts 3196024
RZX 1047176
Lambdat 38136
VtV 26192
delu 18408
u0 18408
Utr 18408
Lind 9224
beta0 496
delb 496
Vtr 496
theta 72
Ptr 40
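The per-component listing above can also be produced in one pass rather than one call at a time. The following is a minimal sketch, assuming the lme4 package is installed. It fits a small stand-in model on the cbpp data that ships with lme4 (not the model from this question), uses base R's `object.size` in place of pryr's `object_size`, and inspects a hand-picked subset of `getME` component names stored in the illustrative variable `comps`.

```r
library(lme4)

# Small stand-in model (assumption: NOT the 701 MB model from the question).
mod <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

# Hand-picked subset of getME() component names to inspect.
comps <- c("X", "Z", "Zt", "L", "Lambda", "flist", "theta", "beta")

# Size in bytes of each component, sorted largest first.
sizes <- sapply(comps, function(nm) as.numeric(object.size(getME(mod, nm))))
sizes <- sort(sizes, decreasing = TRUE)
print(sizes)
```

Sorting the result makes it easier to see which pieces dominate; in the listing above that would be `X`.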
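This is a good question, but could we please have a small reproducible example, or at least some information about the dimensions of the model (dimensions of the model frame, number of fixed-effect coefficients, etc.)?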
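Hey Ben, as I mentioned, I'm new to R, so I hope this is the information you meant: the data frame used to fit the model has 399,498 rows × 14 columns. There are (I believe) 57 fixed-effect coefficients, although I'm not entirely sure how to get that number. When I call `dim` on fixef(mod) it returns NULL, but counting them by hand gives 57. I can dig further and get whatever information is needed, but I may need some pointers on how to access it. Does the model need the original data frame, or can it be stripped out as with glm?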
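For the counts asked about here, a minimal sketch on a stand-in model may help. It assumes lme4 is installed and uses the cbpp data rather than the data from this question; the variable names `n_fixef` and `frame_dims` are only illustrative. `fixef()` returns a named numeric vector, which is why `dim` gives NULL, so `length` is the call to use.

```r
library(lme4)

# Stand-in model on the cbpp data (assumption: not the model from the question).
mod <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

# fixef() is a named vector, so dim() is NULL; length() gives the count.
n_fixef <- length(fixef(mod))

# Rows and columns of the model frame stored in the fitted object.
frame_dims <- dim(model.frame(mod))

print(n_fixef)
print(frame_dims)
```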
Also, I would be interested in a reproducible example, but I imagine my data is too large for that purpose. Could it be done on the cbpp data?
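As a starting point on the cbpp data, here is one possible way to keep only what prediction needs. This is a sketch under stated assumptions, not a confirmed answer: it assumes lme4 is installed, it only covers a model with a single random intercept, and `slim` and `predict_slim` are hypothetical names introduced for illustration. New data must carry the same factor levels as the training data, and herds not seen during fitting would yield NA.

```r
library(lme4)

mod <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

# Keep only the fixed effects and the per-herd random intercepts.
re_df <- ranef(mod)$herd
slim <- list(
  beta = fixef(mod),
  re   = setNames(re_df[["(Intercept)"]], rownames(re_df))
)

# Hypothetical helper: predicted probabilities from the slim list alone.
predict_slim <- function(slim, newdata) {
  # The formula here must match the fixed-effect part of the fitted model.
  X   <- model.matrix(~ period, data = newdata)
  eta <- drop(X %*% slim$beta) + slim$re[as.character(newdata$herd)]
  unname(plogis(eta))  # inverse of the logit link
}

p_slim <- predict_slim(slim, cbpp)
p_full <- unname(predict(mod, type = "response"))

print(all.equal(p_slim, p_full))
print(object.size(mod), units = "Kb")
print(object.size(slim), units = "Kb")
```

The comparison against `predict(mod, type = "response")` is there as a sanity check before discarding the full object; a model with random slopes or several grouping factors would need the corresponding terms added to `eta`.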