2012-10-23

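Error when running an R file from the command line

I have an R file that imports a file, performs some data manipulation, fits a logistic regression model, and then saves the results to a txt file. However, when I run the file from the command line, I get the following error message and I don't know what is going on.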

[email protected]:~/Downloads$ R --no-save < Auto_Model.r > out.txt 
Warning message: 
NAs introduced by coercion 
Error in if (x == "\\N") NA else if (x > 1 & x < 6999) "1:6999" else if (x > : 
    missing value where TRUE/FALSE needed 
Calls: bin.value -> do.call -> mapply -> .Call -> <Anonymous> 
Execution halted 
[email protected]:~/Downloads$ R --no-save < Auto_Model.r 

The part of the R script that leads to the error is below:

> ## IMPORT DATA: 
> #setwd("~/Desktop") 
> library(foreign) 
> dat = read.csv("dat.csv", stringsAsFactors=FALSE) 
> 
> ## zipcode = 
> dat$zipcode = as.character(dat$zipcode) 
> 
> bin.value = Vectorize(function(x) { 
+ if (x == "\\N") NA 
+ else if (x > 1 & x < 6999) "1:6999" 
+ else if (x > 7000 & x < 9999) "7000:9999" 
+ else if (x > 10000 & x < 14849) "10000:14849" 
+ else if (x > 14850 & x < 19699) "14850:19699" 
+ else if (x > 19700 & x < 29999) "19700:29999" 
+ else if (x > 30000 & x < 31999) "30000:31999" 
+ else if (x > 32000 & x < 34999) "32000:34999" 
+ else if (x > 35000 & x < 42999) "35000:42999" 
+ else if (x > 43000 & x < 49999) "43000:49999" 
+ else if (x > 50000 & x < 59999) "50000:59999" 
+ else if (x > 60000 & x < 69999) "60000:69999" 
+ else if (x > 70000 & x < 79999) "70000:79999" 
+ else if (x > 80000 & x < 89999) "80000:89999" 
+ else if (x > 90000 & x < 96999) "90000:96999" 
+ else if (x > 97000 & x < 99820) "97000:99820" 
+ else NA 
+ }) 
> 
> dat$zipcode2 = as.character(bin.value(as.integer(dat$zipcode))) 
Error in if (x == "\\N") NA else if (x > 1 & x < 6999) "1:6999" else if (x > : 
    missing value where TRUE/FALSE needed 
Calls: bin.value -> do.call -> mapply -> .Call -> <Anonymous> 
Execution halted 

I think something is wrong with how I am trying to manipulate the zipcode variable, but nothing I have tried seems to solve the problem.

> str(dat$zipcode) 
int [1:12635] 76148 33825 61832 11368 98290 92078 44104 62052 55106 20861 ... 
> 

Answer

It seems to me that what you are trying to do is already handled by the function cut:

bin.value <- function(x) {
    # cut() bins the whole vector at once; NA inputs simply stay NA
    cut(as.integer(x),
        breaks = c(1, 6999, 9999, 14849, 19699, 29999, 31999, 34999, 42999,
                   49999, 59999, 69999, 79999, 89999, 96999, 99820),
        labels = c("1:6999", "7000:9999", "10000:14849", "14850:19699", "19700:29999",
                   "30000:31999", "32000:34999", "35000:42999", "43000:49999", "50000:59999",
                   "60000:69999", "70000:79999", "80000:89999", "90000:96999", "97000:99820"))
}
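To see how cut treats the interval ends and missing values, here is a small sketch. The function bin.small and the vector zips are hypothetical names used only for illustration, and the breaks are just the first four from the definition above. With the default right = TRUE, each bin is open on the left and closed on the right, so 6999 falls in the first bin and 7000 in the second, while the lowest break itself (1) and anything outside the breaks become NA.

```r
# Hypothetical three-bin version of the breaks above, for illustration only.
bin.small <- function(x) {
  cut(as.integer(x),
      breaks = c(1, 6999, 9999, 14849),
      labels = c("1:6999", "7000:9999", "10000:14849"))
}

# Boundary values, an out-of-range value, and a missing value.
zips <- c(1, 6999, 7000, 14849, 20000, NA)
binned <- as.character(bin.small(zips))
print(binned)
```

If the lowest break should be included in the first bin, cut also accepts include.lowest = TRUE.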

As for your specific problem, it is caused by as.integer:

a <- c("\\N",sample(seq(0,100000,by=1),10)) 
a 
[1] "\\N" "38987" "50403" "75683" "66706" "27924" "17216" "77539" "80658" "2335" "53010" 
as.integer(a) 
[1] NA 38987 50403 75683 66706 27924 17216 77539 80658 2335 53010 

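"\\N" is therefore coerced directly to NA before your function ever sees it. Your function only deals with NA at the very end (the final else), whereas every if statement before it tries to compare a missing value with something. Such a comparison returns NA, and if() needs a single TRUE or FALSE, hence the "missing value where TRUE/FALSE needed" error.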

as.integer(a)[1]=="\\N" 
[1] NA # Instead of TRUE or FALSE 
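If you would rather keep the if/else chain, one possible fix is to test for NA before any comparison. The sketch below is a shortened, hypothetical version (bin.value.safe, raw and ints are names made up for illustration) with only two bins. It also assumes the bin boundaries are meant to be inclusive, so it uses >= and <=, whereas the original strict > and < would send values such as 6999 or 7000 to NA.

```r
# Hypothetical two-bin version of the original if/else chain.
bin.value.safe <- Vectorize(function(x) {
  if (is.na(x)) NA_character_              # test for NA before any comparison
  else if (x >= 1 && x <= 6999) "1:6999"   # assumption: inclusive boundaries
  else if (x >= 7000 && x <= 9999) "7000:9999"
  else NA_character_
})

raw <- c("\\N", "2335", "7000", "38987")
ints <- suppressWarnings(as.integer(raw))  # "\\N" is coerced to NA here
result <- unname(bin.value.safe(ints))
print(result)
```

Because is.na(x) is checked first, the comparisons never see a missing value and if() always gets a proper TRUE or FALSE.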

Thanks, this is a much more elegant way to bin the variable. – ATMathew