2017-02-09 26 views
0

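Regular expression: how to extract the text in the last parentheses of a string

What is the correct regular expression to extract the string "procedure" from the parentheses in general text like the examples below?

Example input string: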

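Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)

Another example:

Urinary tract infection prophylaxis (procedure)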

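Possible approaches are:

  • Go to the end of the text, search backwards for the first opening parenthesis, and take the substring from that position to the end

  • Scan from the start of the text to the end, identify the last "(" character, and take the substring from that position to the end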
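
The second approach can be sketched in base R without any regular expression. This is only an illustration under stated assumptions: the helper name last_paren_text is hypothetical (it does not come from the question), and returning NA for strings without a usable pair of parentheses is a design choice made here.

```r
# Hypothetical helper: return the text inside the last "(...)" of each string.
last_paren_text <- function(x) {
  vapply(x, function(s) {
    # Positions of every literal "(" in the string (-1 if there is none)
    opens <- gregexpr("(", s, fixed = TRUE)[[1]]
    if (opens[1] == -1) return(NA_character_)
    start <- opens[length(opens)]            # position of the last "("
    rest <- substring(s, start + 1)          # everything after the last "("
    # Position of the first ")" after the last "("
    close <- as.integer(regexpr(")", rest, fixed = TRUE))
    if (close == -1) return(NA_character_)
    substr(rest, 1, close - 1)
  }, character(1), USE.NAMES = FALSE)
}

last_paren_text(c(
  "Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)",
  "Urinary tract infection prophylaxis (procedure)"
))
# [1] "procedure" "procedure"
```

Using fixed = TRUE treats "(" and ")" as plain characters, so no escaping is needed in this version.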

Other strings may carry different "tags" to be extracted:

[1] "Xanthoma of eyelid (disorder)"     "Ventricular tachyarrhythmia (disorder)"   
[3] "Abnormal urine odor (finding)"     "Coloboma of iris (disorder)"      
[5] "Macroencephaly (disorder)"      "Right main coronary artery thrombosis (disorder)" 

(A general regular expression is what I am after; a solution in R would be even better.)

Answers

2

sub can do this with the right regular expression

Text <- c("Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)", 
    "Urinary tract infection prophylaxis (procedure)", 
    "Xanthoma of eyelid (disorder)", 
    "Ventricular tachyarrhythmia (disorder)", 
    "Abnormal urine odor (finding)", 
    "Coloboma of iris (disorder)", 
    "Macroencephaly (disorder)", 
    "Right main coronary artery thrombosis (disorder)") 
# Replace each whole string with the contents of its last pair of parentheses
sub(".*\\((.*)\\).*", "\\1", Text) 
[1] "procedure" "procedure" "disorder" "disorder" "finding" "disorder" 
[7] "disorder" "disorder" 

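Addendum: detailed explanation of the regular expression

The question asks for the contents of the last set of parentheses in the string. This expression is a bit confusing because it uses parentheses in two different ways: one represents literal parentheses in the string being processed, and the other sets up a "capture group", which is how we specify the part of the match that the expression should return. The expression is made up of five basic units: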

1. Initial .* - matches everything up to the final open parenthesis. 
    Note that this relies on "greedy matching": .* consumes as much as 
    it can, so the \( that follows it ends up on the last open parenthesis. 
2. \\( ... \\) - matches the final set of parentheses. 
    Because ( by itself means something else in a regular expression, we 
    need to "escape" the parentheses by preceding them with \. That is, we 
    want the regular expression to say \( ... \). However, R also uses \ as 
    the escape character inside string literals, so if we just typed \( and 
    \), R would try to read them as string escapes before the regular 
    expression engine ever saw them (current versions of R reject \( as an 
    unrecognized escape). So we escape the backslash. 
    R will interpret \\( ... \\) as \( ... \), meaning the literal 
    characters ( and ). 
3. (...) - inside the pair in part 2. 
    This makes use of the special meaning of parentheses. When we 
    enclose an expression in parentheses, whatever it matches 
    is stored in a variable for later use. That variable is called 
    \1, which is what was used in the substitution pattern. Again, if 
    we just wrote \1, R would read it as a string escape (an octal 
    character code). Writing \\1 is interpreted as the character \ 
    followed by 1, i.e. \1. 
4. Central .* - inside the pair in part 3. 
    This is what we are looking for: all characters inside the parentheses. 
5. Final .* 
    This is in the expression to match any characters that may follow the 
    final set of parentheses. 

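The sub function uses this to replace the matched pattern (in this case, every character in the string) with the replacement pattern \1, i.e. the contents of the variable holding the first (and, in our example, only) capture group: the text inside the final parentheses.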
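
To see why the greedy leading .* matters, here is a small sketch that runs the same string through a greedy and a lazy variant of the pattern. The lazy variant and the variable names x, greedy and lazy are illustrative assumptions, not part of the original answer; perl = TRUE is used so the lazy quantifier .*? is handled by the PCRE engine.

```r
x <- "Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)"

# Greedy leading .* runs as far right as it can, so the capture group
# lands on the final pair of parentheses.
greedy <- sub(".*\\((.*)\\).*", "\\1", x)

# Lazy quantifiers (.*?) stop as early as they can, so the capture group
# lands on the first pair of parentheses instead.
lazy <- sub(".*?\\((.*?)\\).*", "\\1", x, perl = TRUE)

print(greedy)  # [1] "procedure"
print(lazy)    # [1] "18F"

# The R string literal "\\(" holds two characters: a backslash and "(".
print(nchar("\\("))  # [1] 2
```

The last line shows the double escaping at work: what the regular expression engine receives is the two-character sequence \(.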

+0

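Could you comment on the solution? I assume \\1 refers to some defined element of the regular expression. It works, but it would be better to understand how it works – userJT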

+0

@userJT - Added to the answer – G5W

1

If it is the last part of the string, then this expression will do it:

/\(([^()]*)\)$/ 

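Explanation: look for an opening (, match everything after it that is neither ( nor ), and require a closing ) at the very end of the string.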

https://regex101.com/r/cEsQtf/1
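
In R, the same pattern might be applied as sketched below. The surrounding / characters above are delimiters from the regex101 notation and are not part of the pattern, and the backslashes have to be doubled inside an R string. The sample vector x, including its third element without a label, is an assumption added for illustration. Since sub returns its input unchanged when nothing matches, this sketch uses regexec and regmatches so that non-matching strings can be reported as NA.

```r
x <- c("Urinary tract infection prophylaxis (procedure)",
       "Abnormal urine odor (finding)",
       "No label here")

# A parenthesized group with no nested parentheses, anchored at the end
pattern <- "\\(([^()]*)\\)$"

# regexec() records the whole match and each capture group;
# regmatches() turns them into text. Non-matching strings give character(0).
m <- regmatches(x, regexec(pattern, x))

# Keep capture group 1 where there was a match, NA otherwise.
labels <- vapply(m, function(g) if (length(g) >= 2) g[2] else NA_character_,
                 character(1))
print(labels)
# [1] "procedure" "finding"   NA
```

Unlike the greedy pattern in the first answer, this one only matches when the parenthesized label is the very last thing in the string.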