2016-12-25
4

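I need to split a long string. The places where it should be split have nothing in common except that they are dates and times, so I need to split the string wherever a specific pattern occurs, namely dd/mm/yyyy, hh:mm (the sample below uses a two-digit year). I know about strsplit and the related string manipulation functions, but they don't seem to help. A sample of the data is below. How can I split a string based on the general format of the split unit?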

25/06/15, 21:37 - kjadshjabsdjab 
25/06/15, 21:39 - bsadhi2342/342jbjsd 
25/06/15, 21:40 -hkgsad/213/1sadjaa 
25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj 
25/06/15, 21:42 - jkadbsh2:/\sdsadjv 
25/06/15, 21:42 - 

Answers

3

We can split using regex lookarounds (str1 holds the input string). This splits right after each hh:mm time:

strsplit(str1, "(?<=[0-9]{2}:[0-9]{2})", perl = TRUE) 

If the lookbehind needs to include the date as well:

strsplit(str1, "(?<=[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2})", perl = TRUE) 

If we don't want the date-time in the result, split on the whole stamp and drop the empty strings:

setdiff(strsplit(str1, "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}\\s*-\\s*")[[1]], "") 
#[1] "kjadshjabsdjab"               "bsadhi2342/342jbjsd"
#[3] "hkgsad/213/1sadjaa"           "hsdjhakhjbk12/21s/sda:sdfjbj"
#[5] "jkadbsh2:/\\sdsadjv"
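The answer above never shows how str1 is built, so here is a minimal, self-contained sketch of the same idea. It assumes str1 is the sample messages joined by newlines (an assumption, not something the answer states), and the names stamp, sep_pat, messages and stamps are made up for illustration. It also avoids setdiff, which removes duplicate messages as a side effect because it returns unique values.

```r
# Assumed input: a few of the sample messages joined into one string.
str1 <- paste(
  "25/06/15, 21:37 - kjadshjabsdjab",
  "25/06/15, 21:39 - bsadhi2342/342jbjsd",
  "25/06/15, 21:40 -hkgsad/213/1sadjaa",
  sep = "\n"
)

# One "dd/mm/yy, hh:mm" stamp, and the same stamp plus the dash after it.
stamp   <- "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}"
sep_pat <- paste0(stamp, "\\s*-\\s*")

# Split on the stamp, drop empty pieces, trim the leftover newlines.
pieces   <- strsplit(str1, sep_pat)[[1]]
messages <- trimws(pieces[nzchar(pieces)])

# Extract the stamps themselves so they can be paired with the messages.
stamps <- regmatches(str1, gregexpr(stamp, str1))[[1]]

print(messages)
print(stamps)
```

With this input, messages and stamps both have three elements in matching order. A message that is empty at the very end of the string (like the last sample line) produces a stamp without a message, so check the lengths before pairing them.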
1

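You can split at " - " and then drop the last 15 characters of each piece (the length of one "dd/mm/yy, hh:mm" stamp). sapply can be used to apply substr to each item in the list. Note that the input below ends in " -" with no trailing space, so the last piece is not split off and a stray "25" from its timestamp is left in the final element: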

> ss = "25/06/15, 21:37 - kjadshjabsdjab25/06/15, 21:39 - bsadhi2342/342jbjsd25/06/15, 21:40 - hkgsad/213/1sadjaa25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj25/06/15, 21:42 - jkadbsh2:sdsadjv25/06/15, 21:42 -" 
> 
> sapply(strsplit(ss, " - "), function(x) substr(x, 1, nchar(x)-15)) 
     [,1]
[1,] ""
[2,] "kjadshjabsdjab"
[3,] "bsadhi2342/342jbjsd"
[4,] "hkgsad/213/1sadjaa"
[5,] "hsdjhakhjbk12/21s/sda:sdfjbj"
[6,] "jkadbsh2:sdsadjv25"
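A possible variation on this approach, sketched under the same assumption that the messages run together with no separator: instead of cutting a fixed 15 characters, remove a trailing stamp by pattern, so a final " -" without a trailing space does not leave debris. The names parts, trailing and msgs are illustrative, and the input is a shortened version of ss.

```r
# Shortened version of the input above; it ends in " -" with no trailing space.
ss <- "25/06/15, 21:37 - kjadshjabsdjab25/06/15, 21:39 - bsadhi2342/342jbjsd25/06/15, 21:42 -"

# Split at the literal " - " separator.
parts <- strsplit(ss, " - ", fixed = TRUE)[[1]]

# Strip a stamp at the end of each piece, plus an optional dangling " -".
trailing <- "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}( -)?$"
msgs <- sub(trailing, "", parts)

# The first piece is only a stamp, so it becomes "" and is dropped here.
msgs <- msgs[nzchar(msgs)]
print(msgs)
```

This still breaks if a message itself contains " - ", so splitting on the stamp pattern directly is usually the safer choice.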
2

You can modify the regex, or use mutate + sub to strip the leading "-", if it is not wanted:

library(stringi) 
library(purrr) 

lines <- readLines(textConnection('25/06/15, 21:37 - kjadshjabsdjab\n25/06/15, 21:39 - bsadhi2342/342jbjsd\n25/06/15, 21:40 -hkgsad/213/1sadjaa\n25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj\n25/06/15, 21:42 - jkadbsh2:/\\sdsadjv\n25/06/15, 21:42 -')) 

stri_match_all_regex(lines, "([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{2}, [[:digit:]]{2}:[[:digit:]]{2})(.*)") %>% 
    map_df(~setNames(as.list(.[,2:3]), c("ts", "string"))) 
## # A tibble: 6 × 2
##                ts                         string
##             <chr>                          <chr>
## 1 25/06/15, 21:37               - kjadshjabsdjab
## 2 25/06/15, 21:39          - bsadhi2342/342jbjsd
## 3 25/06/15, 21:40            -hkgsad/213/1sadjaa
## 4 25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj
## 5 25/06/15, 21:42          - jkadbsh2:/\\sdsadjv
## 6 25/06/15, 21:42                              -
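One way to act on the "modify the regex" suggestion is to keep the dash outside the capture groups, so the message column comes back already stripped. This is a base R sketch rather than the stringi/purrr pipeline above; pat, m and parsed are hypothetical names, and the input is a subset of the sample lines.

```r
# A subset of the sample lines, one message per element.
lines <- c(
  "25/06/15, 21:37 - kjadshjabsdjab",
  "25/06/15, 21:40 -hkgsad/213/1sadjaa",
  "25/06/15, 21:42 -"
)

# Group 1 is the stamp, group 2 the message; the dash and the
# whitespace around it sit between the groups and are discarded.
pat <- "^([0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2})\\s*-\\s*(.*)$"
m <- regmatches(lines, regexec(pat, lines))

# Each element of m is c(full match, group 1, group 2).
parsed <- data.frame(
  ts     = vapply(m, `[`, character(1), 2),
  string = vapply(m, `[`, character(1), 3),
  stringsAsFactors = FALSE
)
print(parsed)
```

A line with no text after the dash yields an empty string rather than being dropped, so the rows stay aligned with the input lines.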