2010-03-19 14 views
0

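Regular expression to parse an iCalendar file in ActionScript

I am using a library to parse iCalendar files, but I don't understand the regular expression it uses to split a property.
iCalendar properties come in 3 different styles: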

BEGIN:VEVENT 
DTSTART;VALUE=DATE:20080402 
RRULE:FREQ=YEARLY;WKST=MO 

The library uses this expression, which I would like to understand:

var matches:Array = data.match(/(.+?)(;(.*?)=(.*?)((,(.*?)=(.*?))*?))?:(.*)$/); 
p.name = matches[1]; 
p.value = matches[9];     
p.paramString = matches[2]; 

Thanks.

+2

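This regex is so repulsive that I guess nobody wants to explain it :P. It is very basic stuff: search for "capturing groups" and "non-greedy" and you'll find it. – Kobi 2010-03-19 22:20:02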

Answers

5

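That is a horrible regex! .* and .*? mean "match any amount of anything", taking as much as possible (greedy) or as little as possible (lazy). They should only be used as a last resort. Used carelessly, they lead to catastrophic backtracking when the regex cannot match the input text. All you need to understand about this regex is that you don't want to write regexes like it.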
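To make the greedy versus lazy distinction concrete, here is a minimal sketch. It is written in TypeScript rather than ActionScript, on the assumption that the regex literal syntax behaves the same way in both for these patterns; the input string `a:b:c` is a made-up example, not taken from the question.

```typescript
// Hypothetical input, chosen only because it contains two colons.
const sample = "a:b:c";

// Greedy: the first group takes as much as it can,
// so it runs up to the LAST colon.
const greedy = sample.match(/(.*):(.*)/);

// Lazy: the first group takes as little as it can,
// so it stops at the FIRST colon.
const lazy = sample.match(/(.*?):(.*)/);

// match() returns null when nothing matches, so guard before indexing.
const greedyGroups = greedy ? [greedy[1], greedy[2]] : [];
const lazyGroups = lazy ? [lazy[1], lazy[2]] : [];

console.log(greedyGroups); // [ 'a:b', 'c' ]
console.log(lazyGroups);   // [ 'a', 'b:c' ]
```

Both patterns match the same text, but they split it differently. That is why a pattern built from several `.*?` groups is hard to reason about: where each group ends depends on what the rest of the pattern can still match.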

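Let me show how I would approach this problem. Apparently the iCalendar File Format is line-based. Each line holds a property and a value, separated by a colon. The property can have parameters, which are separated from it by a semicolon. This means that the property cannot contain line breaks, semicolons or colons, the optional parameters cannot contain line breaks or colons, and the value cannot contain line breaks. This knowledge lets us write an efficient regex that uses negated character classes: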

([^\r\n;:]+)(;[^\r\n:]+)?:(.+) 

Or in ActionScript:

var matches:Array = data.match(/([^\r\n;:]+)(;[^\r\n:]+)?:(.+)/); 
p.name = matches[1]; 
p.value = matches[3]; 
p.paramString = matches[2]; 
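As a rough check of what this expression captures, the sketch below runs it over the three sample lines from the question. It is TypeScript rather than ActionScript, on the assumption that the regex literal behaves identically in both. The `ICalProperty` interface and the `parseProperty` helper are hypothetical names introduced for illustration and are not part of the library. The `null` check is also an addition, since `match` returns `null` for a line that has no colon.

```typescript
// Hypothetical result shape, mirroring the p.name / p.value / p.paramString
// fields used in the snippet above.
interface ICalProperty {
  name: string;
  paramString: string | undefined; // undefined when the line has no parameters
  value: string;
}

// The negated-character-class expression from the answer.
const PROPERTY_LINE = /([^\r\n;:]+)(;[^\r\n:]+)?:(.+)/;

// Hypothetical helper: splits one content line into name, parameters and value.
function parseProperty(line: string): ICalProperty | null {
  const matches = line.match(PROPERTY_LINE);
  if (matches === null) {
    return null; // not a "name[;params]:value" line
  }
  return {
    name: matches[1],
    paramString: matches[2], // includes the leading ";"
    value: matches[3],
  };
}

const lines = [
  "BEGIN:VEVENT",
  "DTSTART;VALUE=DATE:20080402",
  "RRULE:FREQ=YEARLY;WKST=MO",
];

for (const line of lines) {
  console.log(parseProperty(line));
}
// BEGIN   -> paramString undefined,     value "VEVENT"
// DTSTART -> paramString ";VALUE=DATE", value "20080402"
// RRULE   -> paramString undefined,     value "FREQ=YEARLY;WKST=MO"
```

Note that in the `RRULE` line the semicolon comes after the first colon, so it ends up inside the value rather than in the parameter string.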

As explained by RegexBuddy:

Match the regular expression below and capture its match into backreference number 1 «([^\r\n;:]+)» 
    Match a single character NOT present in the list below «[^\r\n;:]+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
     A carriage return character «\r» 
     A line feed character «\n» 
     One of the characters “;:” «;:»
Match the regular expression below and capture its match into backreference number 2 «(;[^\r\n:]+)?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
    Match the character “;” literally «;»
    Match a single character NOT present in the list below «[^\r\n:]+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
     A carriage return character «\r» 
     A line feed character «\n» 
     The character “:” «:»
Match the character “:” literally «:»
Match the regular expression below and capture its match into backreference number 3 «(.+)» 
    Match any single character that is not a line break character «.+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
+0

+1 Nice explanation! – 2010-05-02 12:15:43