2017-02-20 59 views
3

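Extract the string between the third set of parentheses

I would like to extract the part of a string that lies between the third set of parentheses, preferably using base R. Here is an example data set: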

my.data <- read.table(text = ' 
    my.num        my.string         my.cov 
     1 Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)q(~-1+re:se) 10 
     2 Abc(~1)Fgf(~-1+e1:e2:fi)Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)q(~1)  20 
     3 Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)       15 
', header = TRUE, stringsAsFactors = FALSE) 
my.data 

Either of these two results would be helpful:

desired.result1 <- read.table(text = ' 
    my.num      my.string   my.cov 
     1 Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)  10 
     2 Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)  20 
     3 Bca(~-1+g1+g2+g2:cp)      15 
', header = TRUE, stringsAsFactors = FALSE) 
desired.result1 

desired.result2 <- read.table(text = ' 
    my.num      my.string  my.cov 
     1 ~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp  10 
     2 ~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi  20 
     3 ~-1+g1+g2+g2:cp      15 
', header = TRUE, stringsAsFactors = FALSE) 
desired.result2 

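My regex is very rusty, so I do not even know where to start, and I could not find a similar question on the internet. Thank you for any advice or assistance.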

+1

Split on "(", take the third one, tidy up. – zx8754

+0

Can there be nested parentheses? –

+0

[This might help](http://stackoverflow.com/questions/8613237/extract-info-inside-all-parenthesis-in-r) – Sotos

Answers

3

First expression:

sub(".*?\\(.*?\\).*?\\(.*?\\)(.*?\\(.*?\\)).*", "\\1", my.data$my.string) 
[1] "Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)" "Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)" 
[3] "Bca(~-1+g1+g2+g2:cp)" 

Second expression:

sub(".*?\\(.*?\\).*?\\(.*?\\).*?\\((.*?)\\).*", "\\1", my.data$my.string) 
[1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi" "~-1+g1+g2+g2:cp" 
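Both patterns above skip two "(...)" groups by hand and then capture the third. As a rough sketch of the same idea for an arbitrary group number, the base R functions `gregexpr` and `regmatches` can collect every group first. The helper name `nth_paren` and the vector `x` are hypothetical, introduced only for illustration, and the sketch assumes the parentheses are never nested:

```r
# Hypothetical helper (not from the answer above): return the contents of the
# n-th "(...)" group of each string. Assumes parentheses are not nested.
nth_paren <- function(x, n) {
  # gregexpr locates every "(...)" span; regmatches extracts the matched text
  groups <- regmatches(x, gregexpr("\\([^()]*\\)", x))
  vapply(groups, function(g) {
    # fewer than n groups in this string: return NA instead of failing
    if (length(g) < n) return(NA_character_)
    # drop the leading "(" and the trailing ")"
    substr(g[n], 2, nchar(g[n]) - 1)
  }, character(1))
}

x <- c("Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)q(~-1+re:se)",
       "Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)")
nth_paren(x, 3)
# [1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2+g2:cp"
```

With the question's data this would be called as `nth_paren(my.data$my.string, 3)`.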
4

Using strsplit:

sapply(strsplit(my.data$my.string, split = "(", fixed = TRUE), function(i){ 
    strsplit(i[4], split = ")", fixed = TRUE)[[1]][1]}) 

# [1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi" "~-1+g1+g2+g2:cp" 
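It may not be obvious why the code takes element 4 rather than 3. Splitting on "(" leaves the text before the first parenthesis as piece 1, so the contents of the third group start piece 4. A small walk-through on a single string, where the variable names `s`, `pieces` and `third` are only illustrative:

```r
# One string from the example data, split step by step
s <- "Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)"

# Step 1: split on the literal "(" (fixed = TRUE disables regex interpretation)
pieces <- strsplit(s, split = "(", fixed = TRUE)[[1]]
pieces
# [1] "Abc"               "~1)Fgf"            "~-1+e2:cp)Bca"
# [4] "~-1+g1+g2+g2:cp)q" "~-1+re:se)"

# Step 2: piece 4 begins with the third group's contents; cut it at ")"
third <- strsplit(pieces[4], split = ")", fixed = TRUE)[[1]][1]
third
# [1] "~-1+g1+g2+g2:cp"
```

Like the regex answers, this relies on the parentheses not being nested.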
+2

Or this, `paste0(sapply(strsplit(my.data$my.string, '\\)'), '[', 3), ')')`, to get the first result – Sotos
