2015-09-28

How to read integers with "," as the thousands separator from a CSV file

I have a CSV file like this:

Year,All,Northeast,Midwest,South,West,  CPI 
1987,"85,600","133,300","66,000","80,400","113,200",113.6 
1988,"89,300","143,000","68,400","82,200","124,900",118.3 
1989,"89,500","127,700","71,800","84,400","127,100",124 
1990,"92,000","126,400","75,300","85,100","129,600",130.7 
1991,"97,100","129,100","79,500","88,500","135,300",136.2 
1992,"99,700","128,900","83,000","91,500","131,500",140.3 
1993,"103,100","129,100","86,000","94,300","132,500",144.5 

The code looks like this:

> fn <- paste(data.path, p2, "tmp.csv", sep="//") 
> d <- read.csv(fn) 
> str(d) 
'data.frame': 7 obs. of 7 variables: 
$ Year  : int 1987 1988 1989 1990 1991 1992 1993 
$ All  : Factor w/ 7 levels "103,100","85,600",..: 2 3 4 5 6 7 1 
$ Northeast: Factor w/ 6 levels "126,400","127,700",..: 5 6 2 1 4 3 4 
$ Midwest : Factor w/ 7 levels "66,000","68,400",..: 1 2 3 4 5 6 7 
$ South : Factor w/ 7 levels "80,400","82,200",..: 1 2 3 4 5 6 7 
$ West  : Factor w/ 7 levels "113,200","124,900",..: 1 2 3 4 7 5 6 
$ CPI  : num 114 118 124 131 136 ... 
> d 
    Year  All Northeast Midwest South West CPI 
1 1987 85,600 133,300 66,000 80,400 113,200 113.6 
2 1988 89,300 143,000 68,400 82,200 124,900 118.3 
3 1989 89,500 127,700 71,800 84,400 127,100 124.0 
4 1990 92,000 126,400 75,300 85,100 129,600 130.7 
5 1991 97,100 129,100 79,500 88,500 135,300 136.2 
6 1992 99,700 128,900 83,000 91,500 131,500 140.3 
7 1993 103,100 129,100 86,000 94,300 132,500 144.5 

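When I use the read.csv function, it reads the All, Northeast, Midwest, South and West columns as strings (factors, as str() shows) instead of numbers. How can I fix this in a simple way?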

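Additional information: this CSV file was generated by Excel. I found that because Excel uses the comma as the field delimiter in CSV files, it wraps a number in quotes whenever that number uses a comma as the thousands separator. Excel handles this format fine, but it causes some confusion in R.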

Thanks.


Can you post the exact code you use to read the CSV? – Heroka


I can't reproduce your problem. 'read.csv' reads your data just fine. – Roland


The default separator of read.csv is the comma: read.csv(file, header = TRUE, sep = ",", ...) – djhurio
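A minimal sketch of what happens on read, using a hypothetical two-row excerpt in the same Excel-style format as the question's file: read.csv honours the double quotes, so "85,600" stays a single field, but the embedded comma stops it from being parsed as a number, so the column comes back as text. stringsAsFactors = FALSE is set here so the column is character rather than factor (FALSE has been the default since R 4.0.0).

```r
# Hypothetical two-row excerpt in the same Excel-style format as the question
txt <- 'Year,All,CPI
1987,"85,600",113.6
1988,"89,300",118.3'

# The quotes keep "85,600" in one field; stringsAsFactors = FALSE returns text as character
d <- read.csv(text = txt, stringsAsFactors = FALSE)

str(d)
# All is character because "85,600" is not a valid number;
# Year and CPI, which contain no commas, are converted to numbers
```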

Answer

DF <- read.csv(text = 'Year,All,Northeast,Midwest,South,West,  CPI 
1987,"85,600","133,300","66,000","80,400","113,200",113.6 
1988,"89,300","143,000","68,400","82,200","124,900",118.3 
1989,"89,500","127,700","71,800","84,400","127,100",124 
1990,"92,000","126,400","75,300","85,100","129,600",130.7 
1991,"97,100","129,100","79,500","88,500","135,300",136.2 
1992,"99,700","128,900","83,000","91,500","131,500",140.3 
1993,"103,100","129,100","86,000","94,300","132,500",144.5') 

#remove the "," thousands separators, then convert each column to a number
DF[, 2:6] <- lapply(DF[, 2:6], function(x) type.convert(gsub(",", "", x, fixed = TRUE), as.is = TRUE)) 

str(DF) 
# 'data.frame': 7 obs. of 7 variables: 
# $ Year  : int 1987 1988 1989 1990 1991 1992 1993 
# $ All  : int 85600 89300 89500 92000 97100 99700 103100 
# $ Northeast: int 133300 143000 127700 126400 129100 128900 129100 
# $ Midwest : int 66000 68400 71800 75300 79500 83000 86000 
# $ South : int 80400 82200 84400 85100 88500 91500 94300 
# $ West  : int 113200 124900 127100 129600 135300 131500 132500 
# $ CPI  : num 114 118 124 131 136 ...
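As an alternative sketch (not part of the original answer), the conversion can be done during the read itself by registering a coercion with the methods package and naming it in colClasses. The class name num.with.commas is an arbitrary, hypothetical label: read.csv reads a column with a non-built-in colClasses entry as character and then calls as() on it, which picks up the setAs() method defined here.

```r
library(methods)

# Hypothetical class name, used only as a colClasses tag for comma-formatted numbers
setClass("num.with.commas")
# Coercion read.csv will apply: strip the thousands separators, then parse as numeric
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

txt <- 'Year,All,Northeast,Midwest,South,West,CPI
1987,"85,600","133,300","66,000","80,400","113,200",113.6
1988,"89,300","143,000","68,400","82,200","124,900",118.3
1989,"89,500","127,700","71,800","84,400","127,100",124'

# Year and CPI use built-in classes; the five regional columns use the custom coercion
DF2 <- read.csv(text = txt,
                colClasses = c("integer", rep("num.with.commas", 5), "numeric"))
str(DF2)
```

Compared with the gsub/lapply approach above, this keeps the conversion inside a single read.csv call, at the cost of a global class definition. The regional columns come back as double rather than integer because the coercion uses as.numeric.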