2017-03-24

How to remove certain characters from every row of a column

I have a data table like this:

Year GDP 
1998–99 <U+20B9>1,668,739 
1999–00 <U+20B9>1,858,205 
2000–01 <U+20B9>2,000,743 
2001–02 <U+20B9>2,175,260 
2002–03 <U+20B9>2,343,864 
2003–04 <U+20B9>2,625,819 
2004–05 <U+20B9>2,971,464 
2005–06 <U+20B9>3,390,503 
2006–07 <U+20B9>3,953,276 
2007–08 <U+20B9>4,582,086 
2008–09 <U+20B9>5,303,567 
2009–10 <U+20B9>6,108,903 
2010–11 <U+20B9>7,248,860 
2011–12 <U+20B9>8,391,691 
2012–13 <U+20B9>9,388,876 

What I want to do is remove "<U+20B9>" from all rows. How can I do that?

I tried grepl and grep, but neither worked for me:

df[!grepl("<U+20B9>", df$GDP),] 

df[ grep("REVERSE", df$Name, invert = TRUE) , ] 

These don't work for me...
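For reference: 'grepl' and 'grep' only test whether a pattern matches, so subsetting with them keeps or drops whole rows instead of editing the text. Also, in a regular expression '+' means "one or more", so the pattern '<U+20B9>' does not match the literal text '<U+20B9>'. A minimal sketch that edits the strings instead, assuming GDP is a character column. It covers two possible cases, which is an assumption about the data: the cell literally contains the text <U+20B9>, or it contains the rupee sign itself, which R typically prints as <U+20B9> in a non-UTF-8 locale:

```r
gdp <- c("<U+20B9>1,668,739", "\u20B91,858,205")

# fixed = TRUE treats the pattern as plain text, so '+' is not a quantifier
gdp <- sub("<U+20B9>", "", gdp, fixed = TRUE)

# "\u20B9" is the rupee sign itself, for data that holds the real character
gdp <- sub("\u20B9", "", gdp, fixed = TRUE)

print(gdp)
```

Both values end up as "1,668,739" and "1,858,205"; in a real data frame the same calls would be applied to 'df$GDP'.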

I want something like this:

Year GDP 
1998–99 1,668,739 
1999–00 1,858,205 
2000–01 2,000,743 
2001–02 2,175,260 
2002–03 2,343,864 
2003–04 2,625,819 
2004–05 2,971,464 
2005–06 3,390,503 
2006–07 3,953,276 
2007–08 4,582,086 
2008–09 5,303,567 
2009–10 6,108,903 
2010–11 7,248,860 
2011–12 8,391,691 
2012–13 9,388,876 

I also tried the solution below, from How to identify/delete non-UTF-8 characters in R, but it didn't work for me either...

x <- "<U+20B9>" 
Encoding(x) <- "UTF-8" 
iconv(x, "UTF-8", "UTF-8",sub='') 

It returns "<U+20B9>" unchanged...
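A likely reason this attempt changes nothing: 'x <- "<U+20B9>"' creates the eight ASCII characters '<', 'U', '+', '2', ... rather than the rupee sign, and 'iconv(..., sub = "")' only replaces bytes that cannot be converted, so a pure-ASCII string passes through untouched. A sketch, assuming the column really holds the character U+20B9: converting to ASCII with 'sub = ""' deletes every non-ASCII character, so it would also remove any other non-ASCII text in the column.

```r
# The rupee sign itself, written as a Unicode escape (not the text "<U+20B9>")
x <- "\u20B91,668,739"
print(nchar(x))  # the sign counts as one character, plus the 9 of "1,668,739"

# Converting to ASCII with sub = "" drops the bytes ASCII cannot represent
clean <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
print(clean)

# The literal text "<U+20B9>" is already ASCII, so iconv leaves it alone
literal <- iconv("<U+20B9>", from = "UTF-8", to = "ASCII", sub = "")
print(literal)
```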

Possible duplicate of [How to identify/delete non-UTF-8 characters in R](http://stackoverflow.com/questions/17291287/how-to-identify-delete-non-utf-8-characters-in-r) – r2evans


I found a solution: 'df$GDP <- substring(df$GDP, 2)'

Answers


Try data.table, with some sample data:

library(data.table)

data <- setDT(data.frame(
Year=c('1998–99',
    '1999–00',
    '2000–01',
    '2001–02',
    '2002–03',
    '2003–04',
    '2004–05',
    '2005–06',
    '2006–07',
    '2007–08'),
GDP=c('<U+20B9>1,668,739',
    '<U+20B9>1,858,205',
    '<U+20B9>2,000,743',
    '<U+20B9>2,175,260',
    '<U+20B9>2,343,864',
    '<U+20B9>2,625,819',
    '<U+20B9>2,971,464',
    '<U+20B9>3,390,503',
    '<U+20B9>3,953,276',
    '<U+20B9>4,582,086'),
stringsAsFactors = FALSE))

# remove a leading "<U+...>" token and any whitespace around it, in place
data[, GDP := sub("^\\s*<U\\+\\w+>\\s*", '', GDP)]

This regular expression pattern can be read as follows:

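  1. 'U\\+' matches the literal sequence 'U+' (the '+' is escaped because on its own it means "one or more").

  2. '\\w+' matches one or more word characters (letters or digits), not just one.

  3. That part is wrapped in '<' and '>', and the '\\s*' on either side removes any surrounding whitespace; the leading '^' anchors the match to the start of the string.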
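The same pattern also works with base R's 'sub' on a plain character vector, without data.table. A small check of the parts described above, using made-up strings with and without surrounding spaces:

```r
pattern <- "^\\s*<U\\+\\w+>\\s*"

gdp <- c("<U+20B9>1,668,739", "  <U+20B9> 1,858,205", "2,000,743")

# Only a leading <U+...> token (plus whitespace around it) is removed;
# strings that do not start with it are returned unchanged
cleaned <- sub(pattern, "", gdp)
print(cleaned)
```

This assumes the cells contain the literal text "<U+20B9>"; if they hold the rupee sign itself, the pattern will not match and the values are left as they are.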


I also found a solution: 'df$GDP <- substring(df$GDP, 2)'


But I like your approach and the explanation. Thanks, jg_r


Great! That seems to do the trick. :)


The minimal version of the solution above is:

df$GDP <- substring(df$GDP, 2)  # keep everything from the 2nd character onward
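'substring(df$GDP, 2)' drops the first character of every value unconditionally, so it only works if each value starts with exactly one rupee sign stored as a single character (U+20B9). Run it twice, or on a value without the sign, and it removes a digit instead. A guarded sketch, assuming GDP is a character column (convert factors with 'as.character' first); 'drop_rupee' is a hypothetical helper name:

```r
# Hypothetical helper: remove the first character only where it is the rupee sign
drop_rupee <- function(x) {
  has_sign <- startsWith(x, "\u20B9")
  x[has_sign] <- substring(x[has_sign], 2)
  x
}

gdp <- c("\u20B91,668,739", "2,000,743")
gdp <- drop_rupee(gdp)
print(gdp)
gdp <- drop_rupee(gdp)  # calling it again leaves the values unchanged
```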