2015-03-13 465 views
-1

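I am completely new to R and really don't know what I'm doing. But on someone's advice I need to run bivariate/multivariate regressions on this data, and I'm stuck. Any help is greatly appreciated. The error I get is: Error in cor(data[, -1], use = "complete.obs") : 'x' must be numeric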

rm(list=ls()) 
setwd("C:/Users/Bogi/Documents/School/Honors Thesis/Voting and Economic Data") 
data<-read.csv("BOGDAN_DATA1.csv") 
head(data) 

round(cor(data[,-1],use="complete.obs"),1) 
Error in cor(data[, -1], use = "complete.obs") : 'x' must be numeric 

dput output:

structure(list(REGION = structure(1:6, .Label = c("Altai Republic", 
"Altai Territory", "Amur Region", "Arkhangelsk Region", "Astrakhan region", 
"Belgorod region"), class = "factor"), PCT_CHANGE_VOTE = structure(c(2L, 
3L, 5L, 4L, 6L, 1L), .Label = c("-13%", "-16%", "-17%", "-25%", 
"-26%", "2%"), class = "factor"), PCT_CHANGE_GRP = structure(c(2L, 
1L, 4L, 3L, 3L, 4L), .Label = c("10%", "17%", "19%", "27%"), class = "factor"), 
PCT_CHANGE_INFLATION = structure(c(1L, 2L, 1L, 3L, 3L, 2L 
), .Label = c("-2%", "-3%", "-4%"), class = "factor"), PCT_CHANGE_UNEMP = structure(c(5L, 
4L, 1L, 2L, 6L, 3L), .Label = c("-13%", "-14%", "-17%", "-3%", 
"5%", "7%"), class = "factor"), POVERTY = c(18.6, 22.6, 20.4, 
14.4, 14.2, 8.6), POP_AGE1 = c(25.8, 16.9, 18.5, 17.1, 17.8, 
15.2), POP_AGE2 = c(58.8, 59.6, 61.3, 60.4, 60.8, 60.3), 
POP_AGE3 = c(15.4, 23.5, 20.2, 22.5, 21.4, 24.5), POP_URBAN = c(28.7, 
55.2, 67, 76.2, 66.7, 66.4), POP_RURAL = c(71.3, 44.8, 33, 
23.8, 33.3, 33.6), COMPUTER = c(46.4, 54.5, 66.1, 74, 65.1, 
55.2), INTERNET = c(32.1, 41, 50.7, 66.5, 60, 50.7)), .Names = c("REGION", 
"PCT_CHANGE_VOTE", "PCT_CHANGE_GRP", "PCT_CHANGE_INFLATION", 
"PCT_CHANGE_UNEMP", "POVERTY", "POP_AGE1", "POP_AGE2", "POP_AGE3", 
"POP_URBAN", "POP_RURAL", "COMPUTER", "INTERNET"), row.names = c(NA, 
6L), class = "data.frame") 
+1

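Check `str(data)` to find the class of each column. If the columns in `data[-1]` are non-numeric, you may get this error. Please show the first few rows using `dput` output (`dput(droplevels(head(data)))`) – akrun 2015-03-13 06:45:36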

+1

Columns 2 to 5 are factors. – 2015-03-13 07:03:01

+0

Sorry, what does that mean? What do I need to fix? – 2015-03-13 07:18:08

Answer

1

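You can loop over columns 2:5 (lapply(data[2:5], ..)), remove the % from those columns (gsub('[%]', ..)), and convert them to numeric. The output of gsub is of class character, which as.numeric then converts to numeric.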

data[2:5] <- lapply(data[2:5], function(x) 
        as.numeric(gsub('[%]', '', x))) 

Cor1 <- round(cor(data[-1],use="complete.obs"),1) 
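To see why the conversion is needed and to check that it worked, here is a minimal sketch on a made-up three-row data frame. The name `toy`, its values, and `pct_cols` are illustrative and are not the asker's data. It assumes the percent columns hold factors or strings such as "-16%".

```r
# Hypothetical toy data mimicking the structure in the question
toy <- data.frame(
  REGION = c("A", "B", "C"),
  PCT_CHANGE_VOTE = c("-16%", "-17%", "2%"),
  PCT_CHANGE_GRP = c("17%", "10%", "27%"),
  POVERTY = c(18.6, 22.6, 20.4),
  stringsAsFactors = TRUE
)

# Which columns are not numeric? These are the ones cor() rejects.
non_numeric <- !sapply(toy, is.numeric)
names(toy)[non_numeric]

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the percentages, so the "%" must be stripped from the labels first.
as.numeric(toy$PCT_CHANGE_VOTE)

# Strip "%" and convert; gsub() coerces a factor to character first
pct_cols <- c("PCT_CHANGE_VOTE", "PCT_CHANGE_GRP")
toy[pct_cols] <- lapply(toy[pct_cols], function(x) {
  as.numeric(gsub("%", "", x, fixed = TRUE))
})

str(toy)                            # the percent columns are now num
cor(toy[-1], use = "complete.obs")  # runs without the 'x' must be numeric error
```

In R 4.0.0 and later, read.csv no longer creates factors by default, so the same columns would arrive as character; the gsub/as.numeric step works the same way in both cases.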

Or you can remove the % from those columns in the shell using awk (assuming , as the delimiter)

awk 'BEGIN {OFS=FS=","} function SUB(F) {sub(/\%/,"", $F)}{SUB(2);SUB(3);SUB(4);SUB(5)}1' Bogdan.csv > Bogdan2.csv 

Read the file with read.csv and run cor

dat1 <- read.csv('Bogdan2.csv') 
Cor2 <- round(cor(dat1[-1], use='complete.obs'), 1) 
identical(Cor1, Cor2) 
#[1] TRUE 
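
If awk is not available (for example on Windows, which the path in the question suggests), the same cleanup can be sketched in R alone. The inline CSV text and the helper name `strip_percent` are made up for illustration, and the sketch assumes every non-missing value in a percent column ends in `%`.

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- "REGION,PCT_CHANGE_VOTE,POVERTY
Altai Republic,-16%,18.6
Altai Territory,-17%,22.6
Amur Region,-26%,20.4"

# Read text columns as plain character rather than factor
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Illustrative helper: convert a column only if all of its
# non-missing values end in "%"; leave every other column untouched
strip_percent <- function(x) {
  if (is.character(x) && all(grepl("%$", x[!is.na(x)]))) {
    as.numeric(sub("%$", "", x))
  } else {
    x
  }
}

clean <- raw
clean[] <- lapply(raw, strip_percent)  # clean[] keeps the data.frame structure
str(clean)
```

Detecting the percent columns by pattern avoids hard-coding positions such as 2:5, at the cost of silently skipping a column in which even one value lacks the `%` sign.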