R string cleaning

I am working with some strings that are very messy, as shown below.

Value 
------------------- 
25 
32.12 . (05- 
33.90 , 
46.70 , 
() 26.60 
27.2 
23.24 . (12- 
36.52 , 
27.1814404432133 [ 
29.73 . (22- 
31.8058003525076 [ 
35.40 , 
38.44 . 
46.14 , 
29.26 [ 
25.44 .

I am not sure how to clean them efficiently so that they look like this.

Value 
------------------- 
25 
32.12 
33.90 
46.70 
26.60 
27.2 
23.24 
36.52 
27.1814404432133 
29.73 
31.8058003525076 
35.40 
38.44 
46.14 
29.26 
25.44

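I tried the sub function, sub(" .*", '', Value), to match from the space onward, but it did not work for everything, so I am looking for suggestions or tips on how to clean up these strings.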

Value <- c(" 25 \n", " 32.12 . (05-", "33.90 ,\n", "46.70 ,\n", "() 26.60 ", 
      " 27.2 ", " 23.24 . (12-", "36.52 ,\n", " 27.1814404432133\n\n[", 
      " 29.73 . (22-", " 31.8058003525076\n\n[", "35.40 ,\n", " 38.44 .\n", 
      "46.14 ,\n", " 29.26\n\n[", " 25.44 .\n") 
df <- data.frame(Value)


2017-08-16 Jill Sellum

You can extract the first number using

Value <- c(" 25 \n", " 32.12 . (05-", "33.90 ,\n", "46.70 ,\n", "() 26.60 ", 
      " 27.2 ", " 23.24 . (12-", "36.52 ,\n", " 27.1814404432133\n\n[", 
      " 29.73 . (22-", " 31.8058003525076\n\n[", "35.40 ,\n", " 38.44 .\n", 
      "46.14 ,\n", " 29.26\n\n[", " 25.44 .\n") 
df <- data.frame(Value) 
df$Value <- sub(".*?(\\d[0-9.]*).*", "\\1", df$Value)

See the R demo online.

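Details

.*? - any 0+ characters, as few as possible
(\\d[0-9.]*) - Group 1, capturing a digit (\\d) followed by 0+ digits or . symbols
.* - any 0+ characters up to the end of the string.

The sub function performs a single replacement, and the \1 backreference holds the value captured in Group 1.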
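To make the capture-group mechanics concrete, here is a minimal sketch on three of the sample strings. It is a variant rather than the exact call above: it assumes perl = TRUE with the (?s) flag so that . also matches the embedded newlines under PCRE. The names x, cleaned and nums are illustrative only.

```r
# Three of the sample values, including one with embedded newlines
x <- c(" 32.12 . (05-", "() 26.60 ", " 27.1814404432133\n\n[")

# (?s) lets . match newlines under PCRE (assumption: perl = TRUE variant)
# .*?            skips whatever precedes the first digit
# (\\d[0-9.]*)   captures the number as group 1
# .*             consumes the rest, so the whole string is replaced by \\1
cleaned <- sub("(?s).*?(\\d[0-9.]*).*", "\\1", x, perl = TRUE)

# Convert the cleaned strings to numbers
nums <- as.numeric(cleaned)
print(nums)
```

Because the pattern matches the entire string, replacing it with \\1 leaves only the captured number.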

If you want to make sure you only extract digits, optionally followed by a dot and more digits, you can use

df$Value <- sub(".*?(\\d+(?:\\.\\d+)?).*", "\\1", df$Value)

See this R demo.
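The difference between the two patterns only shows up on inputs that contain extra dots. A small sketch on hypothetical inputs (not taken from the question's data), using regmatches/regexpr with perl = TRUE as an assumption so that the non-capturing group is handled by PCRE:

```r
# Hypothetical inputs, not from the question's data
x <- c("v 1.2.3 x", "total 5. done")

# Permissive pattern: a digit followed by any run of digits or dots
loose <- regmatches(x, regexpr("\\d[0-9.]*", x, perl = TRUE))

# Stricter pattern: digits, optionally followed by one dot and more digits
strict <- regmatches(x, regexpr("\\d+(?:\\.\\d+)?", x, perl = TRUE))

print(loose)   # keeps trailing and repeated dots
print(strict)  # stops after the first well-formed decimal
```

The stricter form is likely the safer choice when the result is passed to as.numeric, since a value such as "1.2.3" does not convert to a number.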


2017-08-16 05:06:51

Ah, I had tried the sub("*?(\\[0-9]*).*", "\\1", df$Value) option, but I was missing some pieces, and now I know what was missing. Thanks.

You can try this:

library("stringr") 

str_extract(df$Value, "(\\d|\\.)+")
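str_extract returns the first match of the pattern, and (\\d|\\.)+ can also begin on a period. For the data in the question this should not matter, since no value has a stray period before its number, but a sketch on one hypothetical input shows the difference. Base R's regexpr is used here as a stand-in for str_extract (an assumption, to avoid a package dependency), and the names x, first_any and first_digit are illustrative.

```r
# First value is hypothetical; second is from the question's data
x <- c(". 12.5", " 32.12 . (05-")

# Same pattern as above: the first run of digits or dots
first_any <- regmatches(x, regexpr("(\\d|\\.)+", x, perl = TRUE))

# Variant that requires the match to start with a digit
first_digit <- regmatches(x, regexpr("\\d(\\d|\\.)*", x, perl = TRUE))

print(first_any)
print(first_digit)
```

If stray periods can precede the number, anchoring the match on a leading digit is one possible way to avoid picking them up.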


2017-08-16 05:09:35

Thanks Josh, this also solved the problem.

We can use regmatches/regexpr from base R

as.numeric(regmatches(df$Value, regexpr("[0-9][0-9.]*", df$Value)))
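One thing to be aware of: regmatches returns only the elements that matched, so the result can be shorter than the input if some value contains no digits. A possible way to keep positions aligned is sketched below, with a hypothetical "n/a" entry added to a few of the sample values; the names m, hit and out are illustrative.

```r
# "n/a" is a hypothetical entry with no number in it
Value <- c(" 25 \n", "n/a", "() 26.60 ", " 23.24 . (12-")

# regexpr returns -1 for elements without a match
m <- regexpr("[0-9][0-9.]*", Value)
hit <- as.vector(m) != -1

# Start from all NA and fill in only the positions that matched
out <- rep(NA_real_, length(Value))
out[hit] <- as.numeric(regmatches(Value, m))
print(out)
```

With the question's data every element contains a number, so the one-liner above already returns a vector of the same length as df$Value.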


2017-08-16 06:13:12 akrun
