2011-07-19 41 views
4

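Import CSV data containing commas, thousands separators, and trailing minus signs

Using R 2.13.1 on Mac OS X, I am trying to import a data file that uses a period as the thousands separator, a comma as the decimal point, and a trailing minus sign for negative values.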

Basically, I am trying to convert from:

"A|324,80|1.324,80|35,80-"

to:

  V1     V2     V3     V4
1  A 324.80 1324.8 -35.80

Now, interactively, both of the following work:

gsub("\\.","","1.324,80") 
[1] "1324,80" 

gsub("(.+)-$","-\\1", "35,80-") 
[1] "-35,80" 

and so does combining them:

gsub("\\.", "", gsub("(.+)-$","-\\1","1.324,80-")) 
[1] "-1324,80" 

However, I am not able to remove the thousands separator from within read.table:

setClass("num.with.commas") 

setAs("character", "num.with.commas", function(from) as.numeric(gsub("\\.", "", sub("(.+)-$","-\\1",from)))) 
mydata <- "A|324,80|1.324,80|35,80-" 

mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas")) 

Warning messages: 
1: In asMethod(object) : NAs introduced by coercion 
2: In asMethod(object) : NAs introduced by coercion 
3: In asMethod(object) : NAs introduced by coercion 

mytable 
    V1 V2 V3 V4 
1 A NA NA NA 

Note that if I change "\\." to "," in the function, things look a bit different:

setAs("character", "num.with.commas", function(from) as.numeric(gsub(",", "", sub("(.+)-$","-\\1",from)))) 

mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas")) 

mytable 
    V1 V2  V3 V4 
1 A 32480 1.3248 -3580 

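I think the problem is that read.table with dec="," converts the incoming "," to "." before calling as(from, "num.with.commas"), so that the input string can be, for example, "1.324.80".

I want as("1.123,80-", "num.with.commas") to return -1123.80 and as("1.100.123,80", "num.with.commas") to return 1100123.80.

How can I make my num.with.commas replace all but the last decimal point in the input string?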

Update: First, I added a negative lookahead to the regex and got as() working in the console:

setAs("character", "num.with.commas", function(from) as.numeric(gsub("(?!\\.\\d\\d$)\\.", "", gsub("(.+)-$","-\\1",from), perl=TRUE))) 
as("1.210.123.80-","num.with.commas") 
[1] -1210124 
as("10.123.80-","num.with.commas") 
[1] -10123.8 
as("10.123.80","num.with.commas") 
[1] 10123.8 
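
The first result above can look as if the decimal part was lost. That is most likely only a display effect, since R prints seven significant digits by default. A minimal sketch, assuming the converter really returned -1210123.8 for "1.210.123.80-" (the variable names here are made up for illustration):

```r
# Assumed value: what the converter above would return for "1.210.123.80-"
x <- -1210123.8

print(x)                # default of 7 significant digits, so it is shown rounded
print(x, digits = 10)   # asking for more digits shows the stored decimal

# Fixed one-decimal formatting, independent of the digits option
shown <- sprintf("%.1f", x)
shown
```

The stored number is unchanged in either case; only the printed representation differs.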

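However, read.table still had the same problem. Adding some print() calls to my function showed that num.with.commas actually receives the comma, not the period.

So my current solution is to also replace "," with "." inside num.with.commas.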

setAs("character", "num.with.commas", function(from) as.numeric(gsub(",","\\.",gsub("(?!\\.\\d\\d$)\\.", "", gsub("(.+)-$","-\\1",from), perl=TRUE)))) 
mytable <- read.table(textConnection(mydata), header=FALSE, quote="", comment.char="", sep="|", dec=",", skip=0, fill=FALSE,strip.white=TRUE, colClasses=c("character","num.with.commas", "num.with.commas", "num.with.commas")) 
mytable 
    V1 V2  V3 V4 
1 A 324.8 1101325 -35.8 
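
The observation that the converter receives the comma rather than a period can be checked without print() calls. The sketch below is only an illustration: the class name echo.input and the environment seen are hypothetical, and the converter does nothing except record its input.

```r
library(methods)

# Hypothetical pass-through class, used only to record the converter's input
setClass("echo.input")
seen <- new.env()

setAs("character", "echo.input", function(from) {
  assign("value", from, envir = seen)   # remember the raw field text
  from                                  # hand the text back unchanged
})

con <- textConnection("A|1.324,80-")
tab <- read.table(con, header = FALSE, quote = "", comment.char = "",
                  sep = "|", dec = ",",
                  colClasses = c("character", "echo.input"))
close(con)

# The field text as it was handed to the converter
get("value", envir = seen)
```

If the recorded value still contains the comma, dec="," has no effect on columns that go through a custom colClasses converter, so the converter has to deal with both separators itself.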

Answers

4

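You should remove all the periods first and then change the commas to decimal points before coercing with as.numeric(). You can later control how decimal points are printed with options(OutDec=","). I do not think R uses commas as decimal separators internally, even in locales where they are conventional.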

> tst <- c("A","324,80","1.324,80","35,80-") 
> 
> as.numeric(sub("\\,", ".", sub("(.+)-$","-\\1", gsub("\\.", "", tst)))) 
[1]  NA 324.8 1324.8 -35.8 
Warning message: 
NAs introduced by coercion 
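
The same order of operations can be moved into the setAs() converter from the question, so that read.table does the conversion while importing. This is a minimal sketch rather than a tested recipe for every input: the class name is borrowed from the question, and dec="," is left out on the assumption that the converter sees the raw field text.

```r
library(methods)

# Class name borrowed from the question; the converter body follows the
# order suggested in this answer
setClass("num.with.commas")

setAs("character", "num.with.commas", function(from) {
  x <- gsub(".", "", from, fixed = TRUE)   # drop thousands separators
  x <- sub("^(.+)-$", "-\\1", x)           # move a trailing minus to the front
  x <- sub(",", ".", x, fixed = TRUE)      # decimal comma -> decimal point
  as.numeric(x)
})

mydata <- "A|324,80|1.324,80|35,80-"
con <- textConnection(mydata)
mytable <- read.table(con, header = FALSE, quote = "", comment.char = "",
                      sep = "|", strip.white = TRUE,
                      colClasses = c("character", rep("num.with.commas", 3)))
close(con)

str(mytable)

# Print with a decimal comma, then restore the previous setting
old <- options(OutDec = ",")
print(mytable)
options(old)
```

Removing the periods before touching the comma means the converter never has to guess which period is the decimal point, which is what made the lookahead version in the question fragile.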
+0

Yes, this works better as a solution. Thanks!

1

Here it is with regular expressions and substitution:

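mydata <- "A|324,80|1.324,80|35,80-"
# Split the record on the pipe delimiter
mydata2 <- strsplit(mydata,"|",fixed=TRUE)[[1]]
# Remove the thousands separators (periods), then turn the decimal comma into a point
mydata3 <- sub(",",".",gsub(".","",mydata2,fixed=TRUE),fixed=TRUE)
# Move trailing minus signs to the front of the string
mydata4 <- gsub("^(.+)-$","-\\1",mydata3)
# Convert the numeric fields; a list keeps the label as text and the values as numbers
# (c() would coerce everything back to character)
mydata.cleaned <- list(mydata4[1],as.numeric(mydata4[2:4]))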
+0

Thanks, gsk3. This is basically the same answer as DWin's, which I read first.

+0

No worries. Glad you have some answers that solve your problem.
