2014-03-19 38 views

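Replace '0' (a single zero) in a column without replacing the zeros in larger numbers (e.g. 10, 20, 30, etc.)

I want every 0 value in my data frame to become a small positive number so that my model will work.

However, when I try to replace all the zero values, I also replace the zeros that are part of larger numbers, such as 10, 20, 30, 40 ... 100, 1000, etc.

How do I specify that I only want to replace values that are actually zero, rather than any string that merely contains the digit zero?

Thanks!

The code is below: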

total<- read.csv("total.csv")  
total.rm <- na.omit(total) 

#removing NAs/NAN 
total.rm$mediansp[which(is.nan(total.rm$mediansp))] = NA 
total.rm$mediansp[which(total.rm$mediansp==Inf)] = NA 
total.rm$connections[which(is.nan(total.rm$connections))] = NA 
total.rm$connections[which(total.rm$connections==Inf)] = NA 

#make all 0 values positive 
total.rm$mediansp <- gsub("0", "0.00001", total.rm$mediansp) 
total.rm$connections <- gsub("0", "0.00001", total.rm$connections) 

#remove zeros variables 
total.rm$mediansp <- gsub("NA", "0", total.rm$mediansp) 
total.rm$connections <- gsub("NA", "0", total.rm$connections) 
total.rm$mediansp <- gsub("0", "0.01", total.rm$mediansp) 
total.rm$connections <- gsub("0", "0.01", total.rm$connections) 

#convert character variables to numeric variables 
total.rm$mediansp <- as.numeric(total.rm$mediansp) 
total.rm$connections <- as.numeric(total.rm$connections) 

#plot maps with fitted values and with residuals 
sc.lm <- lm (log(mediansp) ~ log(connections), total.rm, na.action="na.exclude") 
total.rm$fitted.s <- predict(sc.lm, total.rm) - mean(predict(sc.lm, total.rm)) 
total.rm$residuals <- residuals(sc.lm) 

Here is the structure:

'data.frame': 133537 obs. of 19 variables: 
$ pcd   : Factor w/ 1736958 levels "AB101AA","AB101AB",..: 
$ pcdstatus  : Factor w/ 5 levels "Insufficient Data",..: 5 5 5 5 5 5 5 5 5 5 ... 
$ mbps2   : num 0 0 0 0 1 0 1 1 0 0 ... 
$ averagesp  : chr "16" "19.3" "14.1" "14.9" ... 
$ mediansp  : chr "16.2" "20" "18.7" "16.8" ... 
$ maxsp   : chr "23.8" "24" "20.2" "19.7" ... 
$ nga   : num 0 0 0 1 0 1 1 1 1 1 ... 
$ connections : chr "54" "14" "98" "43" ... 
$ oslaua  : Factor w/ 407 levels "","95A","95B",..: 326 326 326 326 326 326 326 
$ x    : int 540194 540194 540300 539958 540311 539894 540311 540379 540310 
$ y    : int 169201 169201 169607 169584 168997 169713 168997 168749 168879 
$ ctry   : Factor w/ 4 levels "E92000001","N92000002",..: 1 1 1 1 1 1 1 1 1 1 
$ hro2   : Factor w/ 13 levels "","E12000001",..: 8 8 8 8 8 8 8 8 8 8 ... 
$ soa2   : Factor w/ 7197 levels "","E02000001",..: 145 145 135 135 145 135 145 
$ urindew  : int 5 5 5 5 5 5 5 5 5 5 ... 
$ averagesp.lt : num 2.77 2.96 2.65 2.7 2.05 ... 
$ mediansp.lt : num 2.79 3 2.93 2.82 2.09 ... 
$ maxsp.lt  : num 3.17 3.18 3.01 2.98 2.68 ... 
$ connections.lt: num 3.99 2.64 4.58 3.76 3.22 ... 

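Why are you treating numbers as strings? In fact, why is 'mediansp', for example, a character vector to begin with rather than numeric? Also, changing only the zero values looks like a very bad idea. If the goal is to take logs, one would usually add a constant to all values. – Roland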
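To illustrate the comment above, here is a minimal sketch of doing the work on numbers instead of strings. The vector mediansp below is hypothetical sample data that only imitates the character column from the question, and the constant 1 added before the log is an arbitrary choice for illustration, not a recommendation from the thread.

```r
# Hypothetical character column resembling `mediansp` in the question
mediansp <- c("16.2", "0", "20", "100", "0")

# Convert once, then work with numbers
# (entries that are not valid numbers would become NA with a warning)
mediansp_num <- as.numeric(mediansp)

# Option A: replace only the values that are numerically zero
replaced <- mediansp_num
replaced[which(replaced == 0)] <- 0.00001

# Option B: shift every value by the same constant before taking logs
# (the constant 1 is an assumption made for this sketch)
shifted_log <- log(mediansp_num + 1)

print(replaced)
print(shifted_log)
```

With a numeric comparison, 10, 20 and 100 are never touched because they are simply not equal to zero. Option B treats every observation the same way, which is what the comment suggests when the goal is a log transform.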

Answers


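gsub in your code performs a regular-expression replacement, so the pattern "0" matches every zero character, including the ones inside 10 or 100. To replace only strings that are exactly "0", anchor the pattern: set the pattern argument of gsub to pattern = "^0$". That should solve your problem.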
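A small sketch of the difference between the two patterns, using a made-up character vector x (hypothetical values, not taken from the question's data):

```r
# Hypothetical sample values, stored as character like the columns in the question
x <- c("0", "10", "20.5", "100", "0")

# Unanchored pattern: every "0" character is replaced,
# including those inside "10", "20.5" and "100"
unanchored <- gsub("0", "0.00001", x)

# Anchored pattern: only strings that are exactly "0" are replaced
anchored <- gsub("^0$", "0.00001", x)

print(unanchored)
print(anchored)

# Converting to numeric afterwards gives usable values
anchored_num <- as.numeric(anchored)
print(anchored_num)
```

Note that "^0$" matches only the exact string "0". Values written as "0.0" or with surrounding spaces would not be matched, so it is worth checking how zeros actually appear in the column.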

As a side note, it is almost certainly bad form to replace 0 with a very small number just to make your model work. Choose a better model.


Thanks gwatson, this works. Thank you also for the additional comments on handling zero values and on model choice. –