gsub: extracting a certain number of digits

Apologies if the answer to my question is already somewhere on this site; unfortunately I could not find it.

I have strings of the form "ANNNNNNN.tif", where A is a single letter and each N is a digit. There are seven digits in total.

new <- c("A2000001.tif" ,"A2000002.tif", "A2000003.tif", "A2000004.tif", "A2000005.tif", "A2000006.tif")

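I want to get the year and month values. The first 4 digits represent the year and the last 2 the month. For example, I wrote this to get the year: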

year1 <- gsub("([0-9]){3,4}?.*$", "", new) 
year <- as.numeric(gsub("A", "", year1))

But I suspect it could be written more concisely, and I am still struggling to get the month value.

UPD: I wrote this to get the month.

month1 <- gsub("^*.([0-9]){6,7}?", "\\1", new) 
month <- as.numeric(gsub(".tif", "", month1))

Still, for learning purposes, I would like to know how to do this in a better way.

2017-04-16 Valerija

Are you trying to create the strings or to extract the digits? Please clarify. –

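I have files named "A2000001.tif", "A2000002.tif", "A2000003.tif", "A2000004.tif", "A2000005.tif", "A2000006.tif" and so on. I want to pull the year and month of each image out of its name. The first 4 digits represent the year and the last 2 the month. – Valerija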

`read.fwf(textConnection(new), widths = c(1, 4, 3), col.names = c('letter', 'year', 'month'))` Not sure what the extra 0 is; if you need to keep it out of the month, you can put it in a separate column: `read.fwf(textConnection(new), widths = c(1, 4, 1, 2), col.names = c('l', 'y', 'x', 'm'))` – rawr
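The fixed-width suggestion in the comment above can be run as a small sketch. The variable name `fw` is only illustrative; the widths and column names are the ones from the comment. Characters beyond the declared widths (here ".tif") are simply not read.

```r
new <- c("A2000001.tif", "A2000002.tif", "A2000003.tif",
         "A2000004.tif", "A2000005.tif", "A2000006.tif")

# Widths: 1 letter, 4 year digits, 1 extra digit, 2 month digits.
# The remaining characters of each string are ignored.
fw <- read.fwf(textConnection(new), widths = c(1, 4, 1, 2),
               col.names = c("l", "y", "x", "m"))

fw$y  # year column, read as integers
fw$m  # month column, read as integers (leading zero dropped)
```

This only works while every name has exactly the same layout, which is the same assumption the `substr` approach makes.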

Here are some basic options:

new <- c("A2000001.tif" ,"A2000002.tif", "A2000003.tif", 
     "A2000004.tif", "A2000005.tif", "A2000006.tif")

Assuming the fields are always at the same position in the string:

as.integer(substr(new, 2, 5)) 
# [1] 2000 2000 2000 2000 2000 2000 
as.integer(substr(new, 7, 8)) 
# [1] 1 2 3 4 5 6

Slightly more adaptable, assuming the year always follows a single non-digit and the month always comes right before the dot:

as.integer(sub("^[^0-9]([0-9]{4}).*", "\\1", new)) 
# [1] 2000 2000 2000 2000 2000 2000 
as.integer(sub(".*([0-9]{2})\\..*", "\\1", new)) 
# [1] 1 2 3 4 5 6
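As a variation on the two `sub` calls above, both fields can be captured in one pass with `regexec` and `regmatches`. This is a sketch, not part of the original answer; the combined pattern and the names `pattern`, `m` and `parts` are assumptions made for illustration.

```r
new <- c("A2000001.tif", "A2000002.tif", "A2000003.tif",
         "A2000004.tif", "A2000005.tif", "A2000006.tif")

# One non-digit, four digits (year), any further digits,
# then the two digits just before the dot (month).
pattern <- "^[^0-9]([0-9]{4})[0-9]*([0-9]{2})\\."

# regexec() records the capture groups; regmatches() returns, per string,
# the full match followed by each captured group.
m <- regmatches(new, regexec(pattern, new))
parts <- do.call(rbind, m)  # character matrix: full match, year, month

year  <- as.integer(parts[, 2])
month <- as.integer(parts[, 3])
```

Note that a string that does not match the pattern yields an empty element in `m`, so `rbind` would silently drop it; check `lengths(m)` first if the names are not guaranteed to match.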

Extract all the digits and do some fancy math on them:

x <- as.integer(gsub("[^0-9]", "", new)) 
x %/% 1000 
# [1] 2000 2000 2000 2000 2000 2000 
x %% 100 
# [1] 1 2 3 4 5 6
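The arithmetic above relies on every name containing exactly seven digits: integer division by 1000 drops the last three digits, and the remainder modulo 100 keeps the last two. A hedged sketch that makes this assumption explicit before doing the math (the `digits` variable and the length check are additions, not part of the original answer):

```r
new <- c("A2000001.tif", "A2000002.tif", "A2000003.tif",
         "A2000004.tif", "A2000005.tif", "A2000006.tif")

digits <- gsub("[^0-9]", "", new)   # keep only the digit characters

# Assumption: exactly 7 digits per name (4 year + 1 extra + 2 month).
stopifnot(all(nchar(digits) == 7))

x <- as.integer(digits)
year  <- x %/% 1000L   # 2000001 %/% 1000 is 2000
month <- x %% 100L     # 2000001 %% 100 is 1
```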

Uber-powerful regex extraction (https://xkcd.com/1171/):

lapply(
    regmatches(new, 
      gregexpr("(?<![0-9])[0-9]{4}|[0-9]{2}(?![0-9])", new, perl = TRUE)), 
    as.integer 
) 
# [[1]] 
# [1] 2000 1 
# [[2]] 
# [1] 2000 2 
# [[3]] 
# [1] 2000 3 
# [[4]] 
# [1] 2000 4 
# [[5]] 
# [1] 2000 5 
# [[6]] 
# [1] 2000 6

(Note that this last one returns a list of vectors, a slightly different format for you to consume.)
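If a flat table is easier to work with than a list of vectors, the list can be bound row-wise. This is a sketch under the assumption that every element has exactly two values (year, month); the names `res`, `mat` and `out` are illustrative.

```r
new <- c("A2000001.tif", "A2000002.tif", "A2000003.tif",
         "A2000004.tif", "A2000005.tif", "A2000006.tif")

res <- lapply(
  regmatches(new,
    gregexpr("(?<![0-9])[0-9]{4}|[0-9]{2}(?![0-9])", new, perl = TRUE)),
  as.integer
)

# Assumption: each element of res is c(year, month).
stopifnot(all(lengths(res) == 2))

mat <- do.call(rbind, res)            # 6 x 2 integer matrix
colnames(mat) <- c("year", "month")
out <- as.data.frame(mat)
```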

2017-04-16 21:11:18 r2evans

Thank you, r2evans! That is a really detailed response! – Valerija

tidyr has the very powerful separate, which works well on data frames and data tables:

new <- c("A2000001.tif" ,"A2000002.tif", "A2000003.tif", "A2000004.tif", "A2000005.tif", "A2000006.tif") 

library(tidyr)
library(dplyr)  # select() comes from dplyr, not tidyr

df <- as.data.frame(new) %>%
    separate(new, into = c("letter", "year", "extra", "month", "extension"),
             sep = c(1, 5, 6, 8), remove = FALSE) %>%
    select(-extra, -extension)

df

#            new letter year month
# 1 A2000001.tif      A 2000    01
# 2 A2000002.tif      A 2000    02
# 3 A2000003.tif      A 2000    03
# 4 A2000004.tif      A 2000    04
# 5 A2000005.tif      A 2000    05
# 6 A2000006.tif      A 2000    06
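In the output above, year and month are still character strings ("01", "02", ...). A hedged sketch of one way to get numbers directly is the `convert` argument of `separate`, which runs type conversion on the new columns. The column name `fname` and the base-R column selection are choices made for this illustration, and the tidyr package is assumed to be installed.

```r
library(tidyr)

new <- c("A2000001.tif", "A2000002.tif", "A2000003.tif",
         "A2000004.tif", "A2000005.tif", "A2000006.tif")

# fname is an illustrative column name, not from the original answer.
d <- data.frame(fname = new, stringsAsFactors = FALSE)

parts <- separate(d, fname,
                  into = c("letter", "year", "extra", "month", "extension"),
                  sep = c(1, 5, 6, 8), remove = FALSE,
                  convert = TRUE)   # type-convert the new columns

# Drop the helper columns with plain base-R indexing.
parts <- parts[, c("fname", "letter", "year", "month")]
```

With `convert = TRUE` the year and month columns come back as integers, so no separate `as.integer` step is needed.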

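Here is a typical approach in base R with gsub. In each case, match as much of the leading part of the string as needed, match the interesting part inside capturing parentheses, and then match the rest. The replacement "\\1" stands for the captured value.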

new <- c("A2000001.tif" ,"A2000002.tif", "A2000003.tif", "A2000004.tif", "A2000005.tif", "A2000006.tif") 
letter <- gsub("(.).*", "\\1", new) 
year <- as.numeric(gsub(".(\\d{4}).*", "\\1", new)) 
month <- as.numeric(gsub(".\\d{4}.(\\d{2}).+", "\\1", new))
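One caveat with this pattern: when a string does not match, `gsub` returns it unchanged, so `as.numeric` then produces `NA` with a warning. A minimal sketch of a guard, using a hypothetical helper `extract_field` and a made-up non-matching name `notes.txt` (neither is from the original answer):

```r
# Hypothetical helper: return the first capture group, or NA when
# the pattern does not match the string at all.
extract_field <- function(x, pattern) {
  out <- rep(NA_character_, length(x))
  ok <- grepl(pattern, x)
  out[ok] <- sub(pattern, "\\1", x[ok])
  out
}

# "notes.txt" is an invented example of a name that does not fit the format.
files <- c("A2000001.tif", "notes.txt")

year  <- as.integer(extract_field(files, "^.(\\d{4}).*$"))
month <- as.integer(extract_field(files, "^.\\d{4}.(\\d{2})\\..*$"))
```

Here the second element of both `year` and `month` is `NA`, and no coercion warning is raised because the unmatched string never reaches `as.integer`.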

2017-04-16 20:20:40 epi99

Thanks, epi99! Looks very handy! – Valerija
