2013-08-21 48 views

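Using a variable in a regular expression in R

OK, maybe this is a better example. I am looking for guidance or a reference on how to refer to a variable inside a regular expression, not on how to build a regex for this particular data.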

How can I use the value stored in one variable to locate the next variable?

library(plyr)  
library(tm) 
library(stringr) 
library(gsubfn) 

Dataset

# d1 must exist before a column can be assigned, so create it as a data frame
d1 <- data.frame(
  sub = c("LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:",
          "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 60-70% (0-49)LESS THAN 50% COMMON:",
          "LEFT CAROTID STENOSIS: (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES (40-50)LESS THAN 50% COMMON:"),
  stringsAsFactors = FALSE
)

d1$sub 
[1] "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:"       
[2] "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 60-70% (0-49)LESS THAN 50% COMMON:"       
[3] "LEFT CAROTID STENOSIS: (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES (40-50)LESS THAN 50% COMMON:"

Extract sub1

d1$sub1 <- as.character(lapply((strapply(d1$sub,"((?<=LEFT CAROTID STENOSIS:).{5,}?(?=(\\(|COMMON)))", perl=TRUE)), unique)) 
d1$sub1 
[1] " (50-69)APPROXIMATELY 50-55% "      
[2] " (50-69)APPROXIMATELY 60-70% "      
[3] " (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES " 

Now reference sub1 in the data

I want sub2 to return "(0-49)LESS THAN 50%", "(0-49)LESS THAN 50%" and "(40-50)LESS THAN 50%".

d1$sub2 <- as.character(lapply((strapply(d1$sub,"((?<=\\d1$sub1).*?(?=COMMON))", perl=TRUE)), unique)) 
d1$sub2 
[1] "NULL" "NULL" "NULL" 

**Original post below**

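I am extracting medical information from text reports, and I am trying to use one variable ($sub1) as part of a regular expression to find the next variable ($sub2).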

How can I use the value stored in one variable to locate the next variable?

library(plyr) 
library(tm) 
library(stringr) 
library(gsubfn) 

#Dataset of velocities 
d1 <- c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec") 
d1 
[1] "CCA: 135 cm/sec ICA: 50 cm/sec" "CCA: 150 cm/sec ICA: 75 cm/sec" 

#Lookahead to get sub1 
d1$sub1 <- as.character(lapply((strapply(d1,"(.*?(?=ICA:))", perl=TRUE)), unique)) 
Warning message: 
In d1$sub1 <- as.character(lapply((strapply(d1, "(.*?(?=ICA:))", : 
Coercing LHS to a list 
d1 
[[1]] 
[1] "CCA: 135 cm/sec ICA: 50 cm/sec" 

[[2]] 
[1] "CCA: 150 cm/sec ICA: 75 cm/sec" 

$sub1 
[1] "CCA: 135 cm/sec " "CCA: 150 cm/sec " 

#Now reference sub1 to get sub2 - does not work? 
#Want to return "ICA:50 cm/sec" and "ICA:75 cm/sec" 
#Used paste(d1$sub1) to try getting the $sub1 variable into the regex, but doesn't work) 
d1$sub2 <- as.character(lapply((strapply(d1,"((?<=paste(d1$sub1)).*?)", perl=TRUE)), unique)) 
d1$sub2 
[1] "NULL" "NULL" "NULL" 

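The text has some structure, but it varies a great deal in length, content, and so on. Defining the first variable ($sub1) is easy, and using it to delimit the second variable would be the most precise approach.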

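Perhaps I should stress that the text is highly variable, so a simple regex based on text patterns will not work. I need to use the first variable to locate the second variable in the text. This is medical information, so I cannot post the actual data.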


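Not sure I fully understand your question. The desired final output is 2 variables ($sub1 = CCA: 135 cm/sec, $sub2 = ICA: 50 cm/sec). I can generate these variables, but I am struggling with how to reference the first one in order to locate the second. – user2547308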

Answers

0

You will need to escape various characters to use a variable in a regular expression, but why not do something simpler?

sub('(.*)ICA.*', '\\1', d1) 
#[1] "CCA: 135 cm/sec " "CCA: 150 cm/sec " 
sub('.*(ICA.*)', '\\1', d1) 
#[1] "ICA: 50 cm/sec" "ICA: 75 cm/sec" 
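
If the first value really does have to be embedded in the pattern, the escaping mentioned above could look roughly like the sketch below. `escape_regex` is a hypothetical helper written for this illustration (it is not part of base R or gsubfn), and the example assumes that the text wanted for sub2 always sits between sub1 and the word COMMON, as in the sample data from the question.

```r
# Hypothetical helper: put a backslash in front of every regex metacharacter
# so the string is matched literally when pasted into a pattern.
escape_regex <- function(x) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x)
}

# One row of the sample data from the question.
sub_text <- "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:"
sub1     <- " (50-69)APPROXIMATELY 50-55% "

# Build the pattern from the variable: literal sub1, then capture lazily up to COMMON.
pattern <- paste0(escape_regex(sub1), "(.*?)COMMON")

# regexec() returns the full match first, then the capture groups.
m    <- regmatches(sub_text, regexec(pattern, sub_text))
sub2 <- m[[1]][2]
sub2
```

Without the escaping step, the parentheses in sub1 would be read as a capture group instead of literal characters and the pattern would not match the text. When only a literal lookup is needed, passing `fixed = TRUE` to functions such as `sub()` or `regexpr()` avoids escaping altogether.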
2

Try this:

> d1 <- c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec") 
> t(strapplyc(d1, "\\w+: \\S+ \\S+", simplify = TRUE)) 
     [,1]              [,2]
[1,] "CCA: 135 cm/sec" "ICA: 50 cm/sec"
[2,] "CCA: 150 cm/sec" "ICA: 75 cm/sec"
0

Try the paste0() function. It concatenates the variables you want to use and any regex fragments into a single pattern.

grep(paste0("^.*", variable, ".*$"), d1) 

You can also add the argument collapse = "" to paste0() if your variable may have more than one element.
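
When every row has its own sub1, one pattern per row is needed instead of a single collapsed pattern. The following is a minimal sketch of that idea, assuming the data are held in a data frame with hypothetical columns `sub` (the text) and `sub1` (the part already extracted), and assuming sub1 contains no regex metacharacters; if it might, it would need to be escaped first.

```r
# Sample velocities from the question, one report per row.
d1 <- data.frame(
  sub  = c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec"),
  sub1 = c("CCA: 135 cm/sec ", "CCA: 150 cm/sec "),
  stringsAsFactors = FALSE
)

# paste0() is vectorised, so this builds one pattern per row.
patterns <- paste0("^.*", d1$sub1, "(.*)$")

# sub() is not vectorised over its pattern argument, so mapply() pairs
# each pattern with its own text and keeps what follows sub1.
d1$sub2 <- mapply(function(p, x) sub(p, "\\1", x),
                  patterns, d1$sub, USE.NAMES = FALSE)
d1$sub2
```

Pairing patterns and texts with mapply() keeps each sub1 tied to its own row, which a single pattern built with collapse cannot do.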
