2015-05-20

Decision tree formula in R

I want to analyze marathon data. I built a simple model and fitted a decision tree:

library(rpart)  # provides rpart()
fit <- rpart(timeCategory ~ country + age.group + participated.times, data=data) 

My goal is to create a general formula that predicts the result, like in this article (page 4).

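How can I do this in R, and what technique should I use? In the end I would like a formula, as a string, that takes the attributes as its inputs.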

Data: some of the real data I use can be downloaded here. Read the data as follows:

library(dplyr)  # provides ntile()
data <- read.table("data/processedData.txt", header=TRUE) 
# split finishing times into 10 roughly equal-sized buckets
data$timeCategory <- ntile(data$time, 10) 

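Look at `predict.rpart`. The syntax appears to be `predict(fit, newdata)`. If you want the function shown in the image, you will have to extract the coefficients from `fit` and do some string manipulation. – Frank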


@Frank Can you provide more details or a dummy example? I don't understand. – Bob


@Bob: It is your responsibility to provide the example. –

Answer


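These are the regression coefficients obtained by treating time as a continuous value, which is the type of prediction offered in that example. They can be used to build the kind of formula you asked for.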

> lmfit <- lm(time ~ country + age.group + particip.time, data=data) 
> lmfit 

Call: 
lm(formula = time ~ country + age.group + particip.time, data = data) 

Coefficients: 
      (Intercept)      countryJõgeva  countryLääne-Viru        countryLäti
         9526.702            345.930            122.513            -73.239
     countryLeedu       countryPärnu       countryRapla    countrySaaremaa
          120.592            -78.086           -208.882            114.292
   countryTallinn       countryTartu    countryViljandi       age.groupM20
          -37.536             55.771            -70.417           -142.600
     age.groupM21       age.groupM35       age.groupM40       age.groupM45
         -218.225           -218.067            -20.108           -196.331
     age.groupM50      particip.time
           88.342             -2.487

If you want them all lined up in a single column:

> as.matrix(coef(lmfit)) 
                         [,1]
(Intercept)       9526.702146
countryJõgeva      345.930334
countryLääne-Viru  122.513294
countryLäti        -73.239333
countryLeedu       120.591585
countryPärnu       -78.086107
countryRapla      -208.882244
countrySaaremaa    114.291592
countryTallinn     -37.535659
countryTartu        55.771326
countryViljandi    -70.416659
age.groupM20      -142.599598
age.groupM21      -218.224754
age.groupM35      -218.066655
age.groupM40       -20.108242
age.groupM45      -196.331263
age.groupM50        88.341978
particip.time       -2.486818
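To see why these coefficients already amount to a formula, it may help to check that a prediction is just the sum of the coefficients weighted by 0/1 indicators for the factor levels (plus the numeric predictors). The marathon file is not reproduced here, so this sketch uses R's built-in `ToothGrowth` data purely as a stand-in; the model `len ~ supp + dose` is an illustrative assumption, not the model from the question.

```r
# Stand-in data: ToothGrowth ships with R (supp is a factor, dose is numeric).
fit <- lm(len ~ supp + dose, data = ToothGrowth)

# model.matrix() expands each factor into 0/1 indicator columns, one per
# non-reference level, next to a column of ones for the intercept.
X <- model.matrix(fit)

# A prediction is each column multiplied by its coefficient, summed per row.
manual <- as.vector(X %*% coef(fit))

# Compare with R's own predictions on the training data.
all.equal(manual, unname(predict(fit)))
```

The column names of `X` are the same strings as the coefficient names, which is what makes the string manipulation on `rownames(form)` workable.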

Further processing of the names as text:

> form <- as.matrix(coef(lmfit)) 
> rownames(form) <- gsub("try", "try == ", rownames(form)) 
> rownames(form) <- gsub("oup", "oup == ", rownames(form)) 
> form 
                             [,1]
(Intercept)           9526.702146
country == Jõgeva      345.930334
country == Lääne-Viru  122.513294
country == Läti        -73.239333
country == Leedu       120.591585
country == Pärnu       -78.086107
country == Rapla      -208.882244
country == Saaremaa    114.291592
country == Tallinn     -37.535659
country == Tartu        55.771326
country == Viljandi    -70.416659
age.group == M20      -142.599598
age.group == M21      -218.224754
age.group == M35      -218.066655
age.group == M40       -20.108242
age.group == M45      -196.331263
age.group == M50        88.341978
particip.time           -2.486818

Almost there:

cat(paste(form, paste0("(", rownames(form), ")"), sep="*", collapse="+\n")) 

9526.70214596473*((Intercept))+ 
345.93033373724*(country == Jõgeva)+ 
122.51329418344*(country == Lääne-Viru)+ 
-73.2393326763322*(country == Läti)+ 
120.591584530399*(country == Leedu)+ 
-78.0861070429056*(country == Pärnu)+ 
-208.882244416016*(country == Rapla)+ 
114.291592299937*(country == Saaremaa)+ 
-37.5356589458207*(country == Tallinn)+ 
55.771326363022*(country == Tartu)+ 
-70.4166587941724*(country == Viljandi)+ 
-142.599598141679*(age.group == M20)+ 
-218.224754448193*(age.group == M21)+ 
-218.066655292225*(age.group == M35)+ 
-20.1082422022072*(age.group == M40)+ 
-196.33126335145*(age.group == M45)+ 
88.3419781798024*(age.group == M50)+ 
-2.48681789339678*(particip.time)
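The string above is readable but not yet valid R: the intercept is multiplied by `((Intercept))`, and the level names are unquoted, so a level such as `Lääne-Viru` would parse as a subtraction. One possible way to finish the job is to build the terms from the factor levels stored in the fit rather than by `gsub` on name fragments. This is only a sketch under some assumptions: it uses the built-in `ToothGrowth` data as a stand-in for the marathon data, it assumes R's default treatment contrasts and no interaction terms, and `num`, `pieces` and `formula_string` are names made up for the illustration.

```r
# Stand-in model (not the marathon model): one factor and one numeric predictor.
fit <- lm(len ~ supp + dose, data = ToothGrowth)
b <- coef(fit)

# Print coefficients with enough digits that the string reproduces predict().
num <- function(x) format(x, digits = 15)

# The intercept is a plain constant and needs no indicator.
pieces <- num(b[["(Intercept)"]])

# Factors: one quoted comparison per non-reference level.
# fit$xlevels holds the levels of every factor used in the model.
for (v in names(fit$xlevels)) {
  for (lev in fit$xlevels[[v]][-1]) {
    pieces <- c(pieces,
                sprintf('%s * (%s == "%s")', num(b[[paste0(v, lev)]]), v, lev))
  }
}

# Numeric predictors: the coefficient name is the column name itself.
for (v in intersect(names(b), names(ToothGrowth))) {
  pieces <- c(pieces, sprintf("%s * %s", num(b[[v]]), v))
}

formula_string <- paste(pieces, collapse = " +\n")
cat(formula_string, "\n")

# Because the string is valid R, it can be evaluated against the data
# and compared with the model's own predictions.
manual <- eval(parse(text = formula_string), envir = ToothGrowth)
all.equal(manual, unname(predict(fit)))
```

Applied to the marathon model, the same loop would presumably produce terms such as `345.930334 * (country == "Jõgeva")`, with `particip.time` entering as a plain product.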