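R: using a variable in a regular expression
OK - maybe this is a better example. I am looking for guidance or a reference on how to reference a variable inside a regular expression, not on how to build a regular expression for this data.
How can I use the value in one variable to locate the next variable?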
library(plyr)
library(tm)
library(stringr)
library(gsubfn)
Dataset
# d1 must exist before a column can be assigned to it, so create it as a data frame
d1 <- data.frame(sub = c("LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:", "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 60-70% (0-49)LESS THAN 50% COMMON:", "LEFT CAROTID STENOSIS: (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES (40-50)LESS THAN 50% COMMON:"), stringsAsFactors = FALSE)
d1$sub
[1] "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:"
[2] "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 60-70% (0-49)LESS THAN 50% COMMON:"
[3] "LEFT CAROTID STENOSIS: (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES (40-50)LESS THAN 50% COMMON:"
Extract sub1
d1$sub1 <- as.character(lapply((strapply(d1$sub,"((?<=LEFT CAROTID STENOSIS:).{5,}?(?=(\\(|COMMON)))", perl=TRUE)), unique))
d1$sub1
[1] " (50-69)APPROXIMATELY 50-55% "
[2] " (50-69)APPROXIMATELY 60-70% "
[3] " (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES "
Now reference sub1 to get sub2 from the data
Want to return sub2 as "(0-49)LESS THAN 50% ", "(0-49)LESS THAN 50% " and "(40-50)LESS THAN 50% "
d1$sub2 <- as.character(lapply((strapply(d1$sub,"((?<=\\d1$sub1).*?(?=COMMON))", perl=TRUE)), unique))
d1$sub2
[1] "NULL" "NULL" "NULL"
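One possible reading of why the attempt above returns NULL: inside the quotes, \\d1$sub1 is ordinary pattern text (a digit class, a literal 1, an end anchor, then the letters "sub1"), not the contents of the column. Below is a minimal sketch in base R, assuming d1 is a data frame. It pastes each row's sub1 value into the pattern and wraps it in \Q...\E (written \\Q...\\E in an R string) so its parentheses and hyphens are matched literally. The names patterns and extract_one are hypothetical, introduced only for illustration.

```r
# Assumed reconstruction of the example data (d1 as a data frame).
d1 <- data.frame(
  sub = c(
    "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 50-55% (0-49)LESS THAN 50% COMMON:",
    "LEFT CAROTID STENOSIS: (50-69)APPROXIMATELY 60-70% (0-49)LESS THAN 50% COMMON:",
    "LEFT CAROTID STENOSIS: (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES (40-50)LESS THAN 50% COMMON:"
  ),
  stringsAsFactors = FALSE
)
# sub1 as shown in the question's output.
d1$sub1 <- c(
  " (50-69)APPROXIMATELY 50-55% ",
  " (50-69)APPROXIMATELY 60-70% ",
  " (40-60)APPROXIMATELY 40% INCOMPLETE SCAN SEE NOTES "
)

# One pattern per row: the value of sub1 is pasted in, quoted with \Q...\E
# so that "(", ")", "-" and "%" are treated as plain characters.
patterns <- paste0("(?<=\\Q", d1$sub1, "\\E).*?(?=COMMON)")

# regexpr() accepts a single pattern, so each pattern is paired with its own row.
extract_one <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  if (m == -1) NA_character_ else regmatches(text, m)
}

d1$sub2 <- unname(mapply(extract_one, patterns, d1$sub))
d1$sub2
```

The key step is paste0(): the variable is concatenated into the pattern string outside the quotes instead of being named inside them. The quoting would break if sub1 itself contained the two characters \E, which is assumed not to happen here.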
**Original post below**
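I am extracting medical information from text reports, and I am trying to use one variable ($sub1) as part of a regular expression to find the next variable ($sub2).
How can I use the value in one variable to locate the next variable?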
library(plyr)
library(tm)
library(stringr)
library(gsubfn)
#Dataset of velocities
d1 <- c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec")
d1
[1] "CCA: 135 cm/sec ICA: 50 cm/sec" "CCA: 150 cm/sec ICA: 75 cm/sec"
#Lookahead to get sub1
d1$sub1 <- as.character(lapply((strapply(d1,"(.*?(?=ICA:))", perl=TRUE)), unique))
Warning message:
In d1$sub1 <- as.character(lapply((strapply(d1, "(.*?(?=ICA:))", :
Coercing LHS to a list
d1
[[1]]
[1] "CCA: 135 cm/sec ICA: 50 cm/sec"
[[2]]
[1] "CCA: 150 cm/sec ICA: 75 cm/sec"
$sub1
[1] "CCA: 135 cm/sec " "CCA: 150 cm/sec "
#Now reference sub1 to get sub2 - does not work?
#Want to return "ICA:50 cm/sec" and "ICA:75 cm/sec"
#Used paste(d1$sub1) to try getting the $sub1 variable into the regex, but it doesn't work
d1$sub2 <- as.character(lapply((strapply(d1,"((?<=paste(d1$sub1)).*?)", perl=TRUE)), unique))
d1$sub2
[1] "NULL" "NULL" "NULL"
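In the attempt above, paste(d1$sub1) sits inside the quoted pattern, so it is read as pattern text and is never evaluated as R code. Here is a hedged sketch of one alternative that avoids regex escaping altogether: find sub1 in the text as a fixed string and keep whatever follows it. It assumes the velocities stay in a plain character vector (called vel here) rather than being coerced to a list by the $ assignment. The names vel, after_literal and res are hypothetical.

```r
# Velocities as a plain character vector (assumed; avoids the
# "Coercing LHS to a list" warning from assigning d1$sub1 on a vector).
vel <- c("CCA: 135 cm/sec ICA: 50 cm/sec", "CCA: 150 cm/sec ICA: 75 cm/sec")

# First variable: everything before "ICA:".
sub1 <- sub("ICA:.*$", "", vel)

# Second variable: find sub1 as a literal (fixed = TRUE, no regex syntax),
# then return the text that comes after it.
after_literal <- function(prefix, text) {
  start <- regexpr(prefix, text, fixed = TRUE)
  if (start == -1) return(NA_character_)
  end_of_prefix <- as.integer(start) + attr(start, "match.length")
  substring(text, end_of_prefix)
}

sub2 <- unname(mapply(after_literal, sub1, vel))

# Collect both variables side by side.
res <- data.frame(sub1 = sub1, sub2 = sub2, stringsAsFactors = FALSE)
res
```

Because the match uses fixed = TRUE, any regex metacharacters in sub1 are harmless, which may matter when the report text is as variable as described.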
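The text has a structure, but it is highly variable in length, content, and so on. Defining the first variable ($sub1) is easy, and using it to delimit the second variable would be the most precise approach.
Perhaps I should emphasize that the text is highly variable, so a simple regular expression based on text patterns will not work. I need to use the first variable to locate the second variable in the text. This is medical information, so I cannot post the actual data.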
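Not sure I fully understand your question - the desired final output is 2 variables ($sub1 = CCA: 135 cm/sec, $sub2 = ICA: 50 cm/sec). I can generate these variables, but I am struggling with how to reference the first one to locate the second. – user2547308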