Splitting up a linear model - creating an lm with a single-level factor

This question is a more specific and simplified version of this one.

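The dataset I am working with is too large for a single lm or speedlm computation.
I want to split my dataset into smaller parts, but when I do, one (or more) of the columns ends up containing only a single factor level.
The code below is a minimal example that reproduces my problem. At the bottom of the question I have put my test script for those who are interested.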

library(speedglm) 

iris$Species <- factor(iris$Species) 
i <- iris[1:20,] 
summary(i) 
speedlm(Sepal.Length ~ Sepal.Width + Species , i)

This gives me the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
    contrasts can be applied only to factors with 2 or more levels
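
To see where the message comes from, here is a minimal sketch using only base R's `lm` (assumed here to behave like `speedlm` on this point, which matches the error text above). The first 20 rows of `iris` are all setosa, so once unused levels are dropped `Species` has a single level and no contrast can be built for it.

```r
# First 20 rows of iris: all of them are setosa
chunk <- iris[1:20, ]

# Number of levels that actually occur in this chunk
observed_levels <- nlevels(droplevels(chunk$Species))
print(observed_levels)

# Base lm() fails on this chunk too; capture the message instead of stopping
msg <- tryCatch(
  {
    lm(Sepal.Length ~ Sepal.Width + Species, data = chunk)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Calling `factor()` on the column again does not help, because the problem is the number of levels present in the chunk, not the column type.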

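I tried calling factor() on iris$Species, but without success. I really don't know how to solve this at the moment.

How can I include Species in the model? (without increasing the sample size)

Edit:
I know that I only have one level, "setosa", but I still need it included in the linear model, because I will eventually update the model with more factor levels, as in the example script below.

For those who are interested, here is an example script of what I will use for my actual dataset: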

library(speedglm) 

testfunction <- function(start.i, end.i) { 
    return(iris[start.i:end.i,]) 
} 

lengthdata <- nrow(iris)
stepsize <- 20

## attempt to factor
iris$Species <- factor(iris$Species)

## Creates the iris dataset in split parts
start.i <- seq(0, lengthdata, stepsize)
end.i <- pmin(start.i + stepsize, lengthdata)

dat <- Map(testfunction, start.i + 1, end.i)

## Loops through the split iris data
for (i in dat) {
    if (!exists("lmfit")) {
        lmfit <- speedlm(Sepal.Length ~ Sepal.Width + Species, i)
    } else if (!exists("lmfit2")) {
        lmfit2 <- updateWithMoreData(lmfit, i)
    } else {
        lmfit2 <- updateWithMoreData(lmfit2, i)
    }
}
print(summary(lmfit2))
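
A quick way to see which chunks are affected, as a sketch in base R that reuses the chunk boundaries from the script above (the helper name `count_levels` is made up for this illustration). Because `iris` is sorted by species, only the chunk covering rows 41 to 60 spans two species; every other chunk, including the first one passed to `speedlm`, holds a single species.

```r
lengthdata <- nrow(iris)
stepsize <- 20

# Same chunk boundaries as in the script
start.i <- seq(0, lengthdata, stepsize)
end.i <- pmin(start.i + stepsize, lengthdata)

# Hypothetical helper: number of distinct species in rows (s + 1)..e
count_levels <- function(s, e) {
  length(unique(as.character(iris$Species[(s + 1):e])))
}

levels_per_chunk <- mapply(count_levels, start.i, end.i)
print(levels_per_chunk)
```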

Source

2015-10-15 Bas

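There may be a better way, but if you reorder your rows, each chunk will contain more than one level and will therefore not cause the error. I created a random order, but you may want to use a more systematic approach.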

library(speedglm) 

testfunction <- function(start.i, end.i) { 
    return(iris.r[start.i:end.i,]) 
} 

lengthdata <- nrow(iris) 
stepsize <- 20 

## attempt to factor 
iris$Species <- factor(iris$Species) 

## Random order
set.seed(1) 
iris.r <- iris[sample(nrow(iris)),] 

## Creates the iris dataset in split parts 
start.i <- seq(0, lengthdata, stepsize) 
end.i <- pmin(start.i + stepsize, lengthdata) 

dat <- Map(testfunction, start.i + 1, end.i) 

## Loops through the split iris data
for (i in dat) {
    if (!exists("lmfit")) {
        lmfit <- speedlm(Sepal.Length ~ Sepal.Width + Species, i)
    } else if (!exists("lmfit2")) {
        lmfit2 <- updateWithMoreData(lmfit, i)
    } else {
        lmfit2 <- updateWithMoreData(lmfit2, i)
    }
}
print(summary(lmfit2))

Edit: Instead of a random order, you can use modular arithmetic to generate an index vector that spreads the rows out systematically:

spred.i <- seq(1, by = 7, length.out = 150) %% 150 + 1 
iris.r <- iris[spred.i,]
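
As a sanity check on this index vector, here is a sketch in base R, assuming the same chunk size of 20 as in the script. Because 7 and 150 share no common factor, the indices visit every row exactly once, and each block of 20 consecutive indices reaches into at least two species.

```r
spred.i <- seq(1, by = 7, length.out = 150) %% 150 + 1

# Every row index 1..150 should appear exactly once
is_permutation <- all(sort(spred.i) == 1:150)

# Assign consecutive positions to chunks of 20 and count species per chunk
chunk_id <- ceiling(seq_along(spred.i) / 20)
species <- as.character(iris$Species[spred.i])
levels_per_chunk <- sapply(split(species, chunk_id),
                           function(s) length(unique(s)))

print(is_permutation)
print(levels_per_chunk)
```

With a different row count or step, the same check shows whether the chosen step still spreads the levels across every chunk.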

Source

2015-10-15 10:27:50 JohannesNE
