2014-06-18 82 views
1

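I need help building a regular expression that matches genomic positions given as strings of the following types. I'm using Python, but a generic regex will do: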

chr1:82137-81236 
X 2Mb 6Mb 
chr4:87K 1000K 

I started one, but I can't get it to do everything I need:

(CHR)*\s*([0-9]{1,2}|X|Y|MT)\s*(-|:)?\s*(\d+)\s*(MB|M|K)?\s*(-|:)?\s*(\d+)\s*(MB|M|K)? 

It matches cases that I don't want it to match, for example:

CHR33 -  12 3 

matches, but not in the way I want:

Group 1. CHR 
Group 2. 33 
Group 3. - 
Group 4. 12 
Group 5.  
Group 6.  
Group 7. 3 
Group 8. 
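
The grouping above can be reproduced with a short Python sketch. The `re.IGNORECASE` flag is an assumption (the sample inputs mix "chr" and "CHR"), and note that Python reports optional groups that did not take part in the match as `None` rather than as empty strings.

```python
import re

# The pattern from the question. re.IGNORECASE is an assumption added here,
# since the sample inputs use both "chr" and "CHR".
pattern = re.compile(
    r"(CHR)*\s*([0-9]{1,2}|X|Y|MT)\s*(-|:)?\s*(\d+)\s*(MB|M|K)?"
    r"\s*(-|:)?\s*(\d+)\s*(MB|M|K)?",
    re.IGNORECASE,
)

match = pattern.match("CHR33 -  12 3")

# Print each capture group; unmatched optional groups show up as None.
for number, value in enumerate(match.groups(), start=1):
    print(f"Group {number}. {value!r}")
```

Because `[0-9]{1,2}` accepts any one- or two-digit number, "33" is taken as the chromosome, and the two remaining numbers fill groups 4 and 7.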

I would like the following groups to be returned:

Group 1: CHR or nothing 

Group 2: The chromosome value (1-20,X,Y,MT) 

Group 3: The separator between chromosome and first position 

Group 4: The numeric portion of the first position 

Group 5: The numeric quantifier (M,Mb,K) or nothing if none 

Group 6: The separator between position1 and position2 

Group 7: The numeric portion of the second position 

Group 8: The numeric quantifier (M,Mb,K) or nothing if none 

The pseudo-regex should look something like this:

(CHR)(1-20|MT|X|Y)(delimiter \s*|-|:)(pos1 numeric)(pos1 quantifier)(delimiter \s*|-|:)(pos2 numeric)(pos2 quantifier) 

Answers

2

Change the regex to allow empty matches (whitespace added for readability):

(CHR|)*\s*              # CHR or nothing
([0-9]{1,2}|X|Y|MT)\s*  # Chromosome value
(-|:)?\s*               # Separator
(\d+)\s*                # Numeric portion of 1st position
(MB|M|K|)?\s*           # Numeric quantifier or nothing
(-|:|)?\s*              # Separator b/w position 1 and position 2, or nothing
(\d+|)\s*               # Numeric portion of the 2nd position, or nothing
(MB|M|K|)?              # Numeric quantifier or nothing

Regex101 Demo

+1

Very neat fix, +1 :) @mattjvincent you can keep this tidy layout in Python by adding the '(?x)' flag at the very start of your pattern or by passing 're.VERBOSE'. – zx81
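
As a minimal sketch of that suggestion, the answer's pattern can be compiled with `re.VERBOSE` so the comments stay inside the regex. The `re.IGNORECASE` flag and the `parse_position` helper are assumptions added for illustration; they are not part of the original answer.

```python
import re

# The answer's pattern in verbose layout. re.IGNORECASE is an assumption
# added here because the samples use lowercase "chr" and mixed-case "Mb".
GENOMIC_POSITION = re.compile(
    r"""
    (CHR|)*\s*               # CHR or nothing
    ([0-9]{1,2}|X|Y|MT)\s*   # Chromosome value
    (-|:)?\s*                # Separator
    (\d+)\s*                 # Numeric portion of 1st position
    (MB|M|K|)?\s*            # Numeric quantifier or nothing
    (-|:|)?\s*               # Separator b/w position 1 and position 2, or nothing
    (\d+|)\s*                # Numeric portion of the 2nd position, or nothing
    (MB|M|K|)?               # Numeric quantifier or nothing
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_position(text):
    """Return (chromosome, pos1, quantifier1, pos2, quantifier2) or None.

    Hypothetical helper for illustration only. Empty and unmatched groups
    are both normalized to None.
    """
    match = GENOMIC_POSITION.match(text.strip())
    if match is None:
        return None
    picked = (match.group(2), match.group(4), match.group(5),
              match.group(7), match.group(8))
    return tuple(value or None for value in picked)


for sample in ("chr1:82137-81236", "X 2Mb 6Mb", "chr4:87K 1000K"):
    print(sample, "->", parse_position(sample))
```

Writing `(?x)` at the very start of the pattern string should have the same effect as passing `re.VERBOSE`. In verbose mode, unescaped whitespace and everything after a `#` are ignored, so a literal space or `#` in the pattern would need to be escaped.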