2015-12-01
8

Adding leading zeros to strings

I have a series of column names that I am trying to standardize.

names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10", "apple11", "banana2", "banana12") 

I want any name that ends in a single digit to be padded with a zero, so the result is

apple 
banana 
orange 
apple01 
apple02 
apple10 
apple11 
banana02 
... 

I have been trying to use stringr

strdouble <- str_detect(names, "[0-9]{2}") 
strsingle <- str_detect(names, "[0-9]") 

str_detect(names[strsingle & !strdouble]) 

but I have not been able to figure out how to selectively replace/add the zero...

+3

Would sub("([a-z])([0-9])$", "\\10\\2", names) help you? – etienne

+0

@etienne Yes! Could you explain the "\\10\\2" structure of the replacement? – ano

+0

I added an answer with an explanation. – etienne

Answers

8

You can use sub("([a-z])([0-9])$","\\10\\2",names):

[1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" 
[9] "banana12" 

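It only changes the names in which a single digit follows a letter at the end of the string (the $ marks the end of the string).

\\1 refers to the first block in (): the letter. The replacement then adds a leading 0, followed by \\2, the second block in (): the digit.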
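To make the two blocks visible, here is a small sketch in base R. The vector x and the variable names pattern, groups and padded are illustrative and not part of the original answer; regexec() is used only to show what each parenthesised block captures.

```r
x <- c("apple1", "apple10", "orange")
pattern <- "([a-z])([0-9])$"

# regexec() reports the full match and each parenthesised block separately;
# names that do not match yield an empty character vector
groups <- regmatches(x, regexec(pattern, x))
print(groups[[1]])  # full match, block 1 (the letter), block 2 (the digit)

# Rebuild the match as: block 1, a literal "0", then block 2
padded <- sub(pattern, "\\10\\2", x)
print(padded)
```

Only "apple1" matches, because in "apple10" the character before the final digit is itself a digit.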

6

Here is an option that uses negative lookbehind and negative lookahead assertions to identify single digits.

gsub('(?<!\\d)(\\d)(?!\\d)', '0\\1', names, perl=TRUE) 
# [1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12" 
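
The lookaround version and the end-anchored version agree on the names in the question, but they can differ when a digit is not at the end of the string. The sketch below uses made-up inputs (x, isolated and at_end are illustrative names, not from the thread) to show the difference.

```r
# Hypothetical inputs (not from the question) with digits away from the end
x <- c("a1b", "a12b", "a1b2")

# Lookarounds: pad every digit that has no digit neighbour on either side
isolated <- gsub("(?<!\\d)(\\d)(?!\\d)", "0\\1", x, perl = TRUE)

# End-anchored: pad only a lone digit that closes the string after a letter
at_end <- sub("([a-z])([0-9])$", "\\10\\2", x)

print(isolated)
print(at_end)
```

Which behaviour is preferable depends on whether digits can appear in the middle of the column names.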
1

With str_pad from stringr:

library(stringr) 

pad_if = function(x, cond, n, fill = "0") str_pad(x, n*cond, pad = fill) 

s = str_split_fixed(names,"(?=\\d)",2) 
#  [,1]  [,2] 
# [1,] "apple" "" 
# [2,] "banana" "" 
# [3,] "orange" "" 
# [4,] "apple" "1" 
# [5,] "apple" "2" 
# [6,] "apple" "10" 
# [7,] "apple" "11" 
# [8,] "banana" "2" 
# [9,] "banana" "12" 

paste0(s[,1], pad_if(s[,2], cond = nchar(s[,2]) > 0, n = max(nchar(s[,2])))) 
# [1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12" 

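This also extends to cases like going from c("a","a2","a20","a202") to c("a","a002","a020","a202"), which the other approaches do not cover.

The stringr package is built on stringi, which I would guess has all of the same functionality used here.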
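
As a rough illustration of that claim, the following base R sketch compares a fixed one-zero replacement with padding to the widest numeric suffix. It assumes the digits only appear as a trailing run; the names stem, digits, width and has_num are hypothetical, and the splitting is done with sub() and substring() rather than the stringr calls above.

```r
x <- c("a", "a2", "a20", "a202")

# A fixed one-zero replacement leaves the suffixes at different widths
fixed_two <- sub("([a-z])([0-9])$", "\\10\\2", x)
print(fixed_two)

# Split each name into a stem and its trailing run of digits
stem   <- sub("[0-9]*$", "", x)
digits <- substring(x, nchar(stem) + 1)

# Pad only the names that have a numeric suffix, up to the widest suffix
width   <- max(nchar(digits))
has_num <- nchar(digits) > 0
digits[has_num] <- sprintf(paste0("%0", width, "d"), as.integer(digits[has_num]))

result <- paste0(stem, digits)
print(result)
```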


With sprintf from base R, a similar approach works:

pad_if2 = function(x, cond, n, fill = "0") 
    replace(x, cond, sprintf(paste0("%",fill,n,"d"), as.numeric(x)[cond])) 

s0 = strsplit(names,"(?<=\\D)(?=\\d)|$",perl=TRUE) 

s1 = sapply(s0,`[`,1) 
s2 = sapply(sapply(s0,`[`,-1), paste0, "") 

paste0(s1, pad_if2(s2, cond = nchar(s2) > 0, n = max(nchar(s2)))) 

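pad_if2 is less general-purpose than pad_if, because it requires that x be coercible to numeric. Almost every step here is clunkier than the corresponding code using the packages mentioned above.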

+0

If unexplained downvotes show up, I will go ahead and delete this... – Frank

0

The key is to identify a single digit that sits between a letter and the end of the string ($). You can try:

gsub('([^0-9])([0-9])$','\\10\\2',names) 
[1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12" 

Or use a lookbehind:

gsub('(?<=[a-z])(\\d)$','0\\1',names,perl=TRUE) 

+0

The same as Matthew's, but with '$' instead of '(?!\\d)'? Hmm, I guess it is more like a combination of Matthew's and etienne's... – Frank