2011-06-19 83 views

I'm having a hard time figuring this out: what is this confusing regular expression doing?

($dwg, $rev, $rest) = ($file =~ /^(\d{3}[_-][\w\d]{3}[_-]\d{3,4}(?:[_-]\d{3,4})?)(?:[_ -]\w)?[_ ]{0,5}[rR](?:[eE][vV])?(?:\.)? ?([\w\d-]?) *(.*)/); 

Please use the {} button in the editor to format your code. –


Have you tried putting it into a regex tool like Expresso (http://www.ultrapico.com/Expresso.htm)? – Rohith


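This line of code is called a regular expression (or, more commonly and perhaps more accurately, a regex). This should help you, though I'm sure you'll find plenty of help here. –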

Answers


It's just a complicated regular expression that pulls three capture groups out of $file and assigns them to $dwg, $rev and $rest.
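
A minimal sketch of that list assignment, using the question's regex unchanged. The sample file names and the split_name helper are hypothetical, made up for illustration:

```perl
use strict;
use warnings;

# The regex from the question, unchanged.
my $re = qr/^(\d{3}[_-][\w\d]{3}[_-]\d{3,4}(?:[_-]\d{3,4})?)(?:[_ -]\w)?[_ ]{0,5}[rR](?:[eE][vV])?(?:\.)? ?([\w\d-]?) *(.*)/;

# Hypothetical helper: in list context a successful match returns the
# three capture groups; a failed match returns an empty list.
sub split_name {
    my ($file) = @_;
    return $file =~ $re;
}

for my $file ('123-ABC-4567 Rev. B notes.pdf', '123-ABC-4567.pdf') {
    my ($dwg, $rev, $rest) = split_name($file);
    if (defined $dwg) {
        print "$file => dwg='$dwg' rev='$rev' rest='$rest'\n";
    }
    else {
        print "$file => no match; dwg, rev and rest are all undef\n";
    }
}
```

The second name fails because nothing after the number matches [rR], so the whole match fails and all three variables are left undef.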

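Although the regex is complicated, it doesn't use any really complicated rules, except perhaps (?:something), which is a non-capturing group. For an introduction to Perl regexes, see, for example, this.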
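
To see what "non-capturing" changes, here is a cut-down sketch that uses only the revision part of the pattern on a hypothetical string 'Rev. B', once with plain groups and once with (?:...):

```perl
use strict;
use warnings;

my $s = 'Rev. B';   # hypothetical revision string

# Plain (...) groups capture, so "ev" and "." come back along with the letter.
my @with_capture = $s =~ /^[rR]([eE][vV])?(\.)? ?(\w)/;

# (?:...) groups the same tokens for the ? quantifier but captures nothing,
# so only the revision letter is returned.
my @without_capture = $s =~ /^[rR](?:[eE][vV])?(?:\.)? ?(\w)/;

print scalar(@with_capture), " captures: @with_capture\n";       # 3 captures: ev . B
print scalar(@without_capture), " capture: @without_capture\n";  # 1 capture: B
```

This is why the question's regex can use many groups for optional pieces but still hand back exactly three values.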


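It looks like it's pulling an identifier $dwg, a revision $rev and a suffix $rest out of a file name. In general, the identifier has three or four parts separated by underscores or hyphens; the revision is at most one word character or hyphen, introduced by r or rev (in either case); and the suffix is everything after the revision and any spaces that follow it. It's pretty messy, and it looks like it's trying to handle a lot of subtly different cases at once.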

^                # After the start of the string,
(                # $dwg gets
    \d{3}        # three digits,
    [_-]         # a separator,
    [\w\d]{3}    # three word characters,
    [_-]         # another separator,
    \d{3,4}      # three or four digits,
    (?:          # and
        [_-]     # a separator and
        \d{3,4}  # three or four more digits,
    )?           # which are optional.
)
(?:              # Next,
    [_ -]        # another separator,
    \w           # followed by a word character,
)?               # also optional;
[_ ]{0,5}        # zero to five underscores or spaces,
[rR]             # then "R" or "r",
(?:
    [eE]         # or "rev" in any mix of case,
    [vV]
)?               # optionally;
(?:
    \.           # a dot,
)?               # which is also optional;
\x20?            # and an optional space.
(                # $rev gets
    [\w\d-]?     # an optional word character or dash.
)
\x20*            # After any number of spaces,
(.*)             # $rest gets the rest.
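
A sketch that runs the question's regex over a few hypothetical file names, each chosen to exercise a different optional piece (the names and the parse_name helper are made up for illustration):

```perl
use strict;
use warnings;

my $re = qr/^(\d{3}[_-][\w\d]{3}[_-]\d{3,4}(?:[_-]\d{3,4})?)(?:[_ -]\w)?[_ ]{0,5}[rR](?:[eE][vV])?(?:\.)? ?([\w\d-]?) *(.*)/;

# Hypothetical helper: returns (dwg, rev, rest), or an empty list on no match.
sub parse_name {
    my ($file) = @_;
    return $file =~ $re;
}

# Hypothetical file names, each exercising different optional parts.
my @samples = (
    '123-ABC-4567 Rev. B notes.pdf',   # "Rev." with a dot, then a suffix
    '123_A1B_456_7890 r3',             # optional fourth part, bare "r"
    '123-ABC-456-X REV 2 final',       # optional "-X" after the number
);

for my $file (@samples) {
    if (my ($dwg, $rev, $rest) = parse_name($file)) {
        print "$file\n  dwg=$dwg rev=$rev rest='$rest'\n";
    }
}
```

In the third name the optional (?:[_ -]\w)? group absorbs the -X. In the first two it briefly grabs the space and the R/r, then gives them back when [rR] fails to match the next character.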

Ha. I was too lazy to go through all of it :) –


Yes, we came up with the same explanation. Just one point: '{3}' means 'exactly 3 occurrences', not 'at most three'. – Toto
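
A quick sketch of the distinction Toto points out, on hypothetical inputs (the quantifier_flags helper is made up for illustration):

```perl
use strict;
use warnings;

# Returns two flags for a string: does the whole string match exactly three
# digits ({3}), and does it match at most three digits ({0,3})?
sub quantifier_flags {
    my ($s) = @_;
    my $exact   = $s =~ /^\d{3}\z/   ? 1 : 0;
    my $at_most = $s =~ /^\d{0,3}\z/ ? 1 : 0;
    return ($exact, $at_most);
}

for my $s ('12', '123', '1234') {
    my ($exact, $at_most) = quantifier_flags($s);
    print "$s: {3}=$exact {0,3}=$at_most\n";
}
```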


@M42: You're right. I blame the time of day. Fixed. –


Here's an explanation:

^                 : beginning of string
(                 : start group 1; it populates $dwg
    \d{3}         : 3 digits
    [_-]          : _ or - character
    [\w\d]{3}     : 3 word chars (alphanumeric or _), could be abbreviated as \w{3}
    [_-]          : _ or - character
    \d{3,4}       : 3 or 4 digits
    (?:           : start NON-capture group
        [_-]      : _ or - character
        \d{3,4}   : 3 or 4 digits
    )?            : end of non-capture group, optional
)                 : end of group 1
(?:               : start NON-capture group
    [_ -]         : _ or space or - character
    \w            : 1 word char
)?                : end of non-capture group, optional
[_ ]{0,5}         : 0 to 5 _ or space chars
[rR]              : r or R
(?:               : start NON-capture group
    [eE]          : e or E
    [vV]          : v or V
)?                : end of non-capture group, optional
(?:\.)?           : an optional dot, not captured
\x20?             : optional space
([\w\d-]?)        : group 2, 0 or 1 word char or -, could be [\w-]; populates $rev
\x20*             : 0 or more spaces
(.*)              : any number of any char except newline; populates $rest
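
The two "could be" shortcuts above rely on \d being a subset of \w. A quick sketch that checks this over the ASCII range only (it makes no attempt to cover all of Unicode):

```perl
use strict;
use warnings;

# Collect the ASCII code points where the long and short forms disagree.
my @diff;
for my $ord (0 .. 127) {
    my $c = chr $ord;
    push @diff, $ord if ($c =~ /\A[\w\d]\z/)  xor ($c =~ /\A\w\z/);
    push @diff, $ord if ($c =~ /\A[\w\d-]\z/) xor ($c =~ /\A[\w-]\z/);
}

if (@diff) {
    print "differences at: @diff\n";
}
else {
    print "no differences in ASCII\n";
}
```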

+1 Dang, I didn't see your answer before I edited mine. Oh well. We came at it from different angles. –


YAPE::Regex::Explain is a module that takes any regex as input and produces, as output, an explanation of what the regex does. Here's an example:

use Modern::Perl; 
use YAPE::Regex::Explain; 

my $re = qr/^(\d{3}[_-][\w\d]{3}[_-]\d{3,4}(?:[_-]\d{3,4})?)(?:[_ -]\w)?[_ ]{0,5}[rR](?:[eE][vV])?(?:\.)? ?([\w\d-]?) *(.*)/; 

say YAPE::Regex::Explain->new($re)->explain(); 

And here's the output:

The regular expression:

(?-imsx:^(\d{3}[_-][\w\d]{3}[_-]\d{3,4}(?:[_-]\d{3,4})?)(?:[_ -]\w)?[_ ]{0,5}[rR](?:[eE][vV])?(?:\.)? ?([\w\d-]?) *(.*))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                      the beginning of the string
----------------------------------------------------------------------
  (                      group and capture to \1:
----------------------------------------------------------------------
    \d{3}                digits (0-9) (3 times)
----------------------------------------------------------------------
    [_-]                 any character of: '_', '-'
----------------------------------------------------------------------
    [\w\d]{3}            any character of: word characters (a-z,
                         A-Z, 0-9, _), digits (0-9) (3 times)
----------------------------------------------------------------------
    [_-]                 any character of: '_', '-'
----------------------------------------------------------------------
    \d{3,4}              digits (0-9) (between 3 and 4 times
                         (matching the most amount possible))
----------------------------------------------------------------------
    (?:                  group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
      [_-]               any character of: '_', '-'
----------------------------------------------------------------------
      \d{3,4}            digits (0-9) (between 3 and 4 times
                         (matching the most amount possible))
----------------------------------------------------------------------
    )?                   end of grouping
----------------------------------------------------------------------
  )                      end of \1
----------------------------------------------------------------------
  (?:                    group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
    [_ -]                any character of: '_', ' ', '-'
----------------------------------------------------------------------
    \w                   word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  )?                     end of grouping
----------------------------------------------------------------------
  [_ ]{0,5}              any character of: '_', ' ' (between 0 and
                         5 times (matching the most amount
                         possible))
----------------------------------------------------------------------
  [rR]                   any character of: 'r', 'R'
----------------------------------------------------------------------
  (?:                    group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
    [eE]                 any character of: 'e', 'E'
----------------------------------------------------------------------
    [vV]                 any character of: 'v', 'V'
----------------------------------------------------------------------
  )?                     end of grouping
----------------------------------------------------------------------
  (?:                    group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
    \.                   '.'
----------------------------------------------------------------------
  )?                     end of grouping
----------------------------------------------------------------------
   ?                     ' ' (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
  (                      group and capture to \2:
----------------------------------------------------------------------
    [\w\d-]?             any character of: word characters (a-z,
                         A-Z, 0-9, _), digits (0-9), '-'
                         (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
  )                      end of \2
----------------------------------------------------------------------
   *                     ' ' (0 or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
  (                      group and capture to \3:
----------------------------------------------------------------------
    .*                   any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
  )                      end of \3
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

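One thing that often makes a regex easier to decipher, without resorting to external tools, is to put an /x modifier at the end of it, which allows mostly free-form whitespace inside the regex. The /x modifier lets you start inserting whitespace, including newlines and tabs, into the regex without changing what the expression does, and it also lets # start a comment. This helps you group related parts of the regex together. Of course, this won't work as-is if the RE has a lot of significant whitespace embedded in it; in that unusual case you would end up changing the meaning of the expression. But for any normal regex, the /x modifier is the first step in breaking it down into meaningful clusters.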

For example, I might start on the regex myself like this:

m/^ 
    (
     \d{3} [_-] [\w\d]{3} [_-] \d{3,4} 
     (?: 
      [_-] \d{3,4} 
     )? 
    ) 
    # ......and so on. 
/x 

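For me, doing this helps me picture what's going on. You can read these PODs on regular expressions: perlrequick (a quick-start guide), perlretut (a more in-depth tutorial), perlre (the authoritative reference) and perlop. But nothing beats Jeffrey Friedl's masterwork, "Mastering Regular Expressions" (O'Reilly, now in its third edition).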
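
A small sketch of what /x does and does not ignore, on a hypothetical string 'ab c' that contains one literal space:

```perl
use strict;
use warnings;

my $s = 'ab c';   # hypothetical input containing one literal space

# Under /x, an unescaped space in the pattern is ignored,
# so this pattern is effectively ^abc\z and does not match.
my $bare_space = $s =~ /^ab c\z/x ? 1 : 0;

# Escaping the space, writing it as \x20, or putting it in a bracketed
# character class keeps it literal even under /x.
my $escaped  = $s =~ /^ab\ c\z/x      ? 1 : 0;
my $hex      = $s =~ /^ab \x20 c \z/x ? 1 : 0;
my $in_class = $s =~ /^ab[ ]c\z/x     ? 1 : 0;

# "#" starts a comment under /x, so the pattern can be annotated in place.
my $commented = $s =~ /
    ^ ab     # two letters
    \x20     # one literal space
    c \z     # a final letter, then end of string
/x ? 1 : 0;

print "bare=$bare_space escaped=$escaped hex=$hex class=$in_class commented=$commented\n";
# bare=0 escaped=1 hex=1 class=1 commented=1
```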

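Note: I noticed this RE seems to have embedded spaces near the end (the ` ?` and ` *` parts). They would be more obvious written as \x20, and changing them that way makes it safe to use the /x modifier. The spaces inside the character classes [_ -] and [_ ] can stay as they are, because /x does not affect whitespace inside bracketed character classes.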
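
Here is a sketch of that rewrite; the layout and comments are my own, not the original author's, and the file names are hypothetical. It checks that the /x version captures the same values as the original. The comments deliberately avoid $ and @ signs, because Perl interpolates variables even inside /x comments.

```perl
use strict;
use warnings;

# The question's regex, unchanged.
my $orig = qr/^(\d{3}[_-][\w\d]{3}[_-]\d{3,4}(?:[_-]\d{3,4})?)(?:[_ -]\w)?[_ ]{0,5}[rR](?:[eE][vV])?(?:\.)? ?([\w\d-]?) *(.*)/;

# The same pattern under /x; the two bare spaces are now \x20.
# Spaces inside [_ -] and [_ ] stay literal, since /x leaves bracketed classes alone.
my $spaced = qr/
    ^
    ( \d{3} [_-] [\w\d]{3} [_-] \d{3,4} (?: [_-] \d{3,4} )? )  # group 1: dwg
    (?: [_ -] \w )?                  # optional separator plus one character
    [_ ]{0,5}                        # zero to five underscores or spaces
    [rR] (?: [eE] [vV] )? (?: \. )?  # "r" or "rev", then an optional dot
    \x20?                            # optional space
    ( [\w\d-]? )                     # group 2: rev
    \x20*                            # any number of spaces
    (.*)                             # group 3: rest
/x;

# Hypothetical file names; the last one should not match either pattern.
my @samples = (
    '123-ABC-4567 Rev. B notes.pdf',
    '123_A1B_456_7890 r3',
    '123-ABC-456-X REV 2 final',
    '123-ABC-4567.pdf',
);

for my $file (@samples) {
    my $from_orig   = join '|', $file =~ $orig;
    my $from_spaced = join '|', $file =~ $spaced;
    my $verdict = $from_orig eq $from_spaced ? 'same' : 'DIFFERENT';
    print "$verdict: $file\n";
}
```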


+1 for YAPE::Regex::Explain –