2014-09-19 25 views
1

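Regex to match numbers written as words, digits, or Roman numerals

I am trying to match numbers written as words, digits, or Roman numerals. Here are a few samples: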

CHAPTER 1 
CHAPTER 2 
CHAPTER THREE 
CHAPTER IV 
CHAPTER TWENTY TWO 

I'm terrible at regex; here is what I have so far.

(CHAPTER (([0-9]+)|(/* words - see below */)|(/* roman - see below */))) 

// words 
(TWENTY|THIRTY|etc)?(|-)?(ONE|TWO|THREE|FOUR|FIVE|etc)? 

// roman 
(I|II|III|IV|V|etc)+ 

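This expression captures CHAPTER 1, CHAPTER 2 and CHAPTER THREE, but it tries to match IV as a word (I guess it is somehow matching FIVE?). TWENTY TWO does not match at all.

Can anyone help? Here is the full regex: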

(CHAPTER (
([0-9]+)| 
((TWENTY|THIRTY)?(|-)?(ONE|TWO|THREE|FOUR|FIVE)?)| 
((I|II|III|IV|V)+) 
)) 

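Note:

The point of all this is to convert text in these formats into actual integers. I already have a method for doing that in each case, so I need to know which kind of case I am dealing with.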

+0

You have 'CHAPTER IV' written here as 'CHAPTER !V', with an exclamation mark instead of 'I'! If that is what's in your file, it won't match. – 2014-09-19 21:54:58

+3

Unless you really need to spell these out explicitly, it would be easier to use 'CHAPTER (\w+(?: \w+)?)' – hwnd 2014-09-19 21:59:44

+0

@BobKaufman That was a typo; it isn't in the original. – roryok 2014-09-20 07:11:42

Answers

1

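Since you have already got parsers, if they can fail gracefully when given input that superficially looks like valid Roman/text input but isn't, you could simply call them all and see which one passes.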
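As a rough illustration of "call them all and see which passes", here is a minimal sketch of a Roman parser that returns false instead of throwing. The names `ChapterNumber`, `TryParseRoman` and `TryParseChapterNumber` are hypothetical (they are not the asker's actual methods), and the word parser is left out for brevity. The round-trip check through `ToRoman` is just one possible way to reject malformed numerals.

```csharp
using System;
using System.Text;

public static class ChapterNumber
{
    // Symbols in descending order, including the subtractive pairs.
    private static readonly (int Value, string Symbol)[] RomanTable =
    {
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")
    };

    // Builds the canonical Roman spelling of a positive integer.
    private static string ToRoman(int n)
    {
        var sb = new StringBuilder();
        foreach (var (value, symbol) in RomanTable)
        {
            while (n >= value)
            {
                sb.Append(symbol);
                n -= value;
            }
        }
        return sb.ToString();
    }

    // Hypothetical Roman parser: returns false on bad input rather than throwing.
    public static bool TryParseRoman(string s, out int result)
    {
        result = 0;
        if (string.IsNullOrEmpty(s)) return false;

        int total = 0, pos = 0;
        foreach (var (value, symbol) in RomanTable)
        {
            // Consume this symbol as long as it appears at the current position.
            while (pos + symbol.Length <= s.Length &&
                   string.CompareOrdinal(s, pos, symbol, 0, symbol.Length) == 0)
            {
                total += value;
                pos += symbol.Length;
            }
        }

        // Reject leftovers (e.g. "CIVIL") and non-canonical forms (e.g. "IIII").
        if (pos != s.Length || ToRoman(total) != s) return false;

        result = total;
        return true;
    }

    // Try the Arabic parser first, then the Roman one; a word parser could be chained the same way.
    public static bool TryParseChapterNumber(string s, out int result) =>
        int.TryParse(s, out result) || TryParseRoman(s, out result);
}
```

With this sketch, `ChapterNumber.TryParseChapterNumber("IV", out int n)` succeeds with `n == 4`, while `"CIVIL"` and `"IIII"` are rejected.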

If you would rather not just call them all, this regex should determine which parser to pass each input to.

var re = new Regex(
    @"CHAPTER (?:(?<arabic>\d+)|(?<roman>[IVXLCDM]+)|(?<text>[A-Z ]+))"); 

It can be used like this, for example:

var input = @"CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO";

foreach (Match match in re.Matches(input))
{
    if (match.Groups["arabic"].Success)
    {
        Console.WriteLine("Pass {0} to Arabic parser", match.Groups["arabic"].Value);
    }
    else if (match.Groups["roman"].Success)
    {
        Console.WriteLine("Pass {0} to Roman parser", match.Groups["roman"].Value);
    }
    else if (match.Groups["text"].Success)
    {
        Console.WriteLine("Pass {0} to Text parser", match.Groups["text"].Value);
    }
}

which results in:

Pass 1 to Arabic parser 
Pass 2 to Arabic parser 
Pass THREE to Text parser 
Pass IV to Roman parser 
Pass TWENTY TWO to Text parser 
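
For the text branch, a word parser could look something like the sketch below. It is only an illustration under stated assumptions: the class name `NumberWords` and its `TryParse` method are hypothetical, it covers ONE through NINETY NINE only, and it expects upper-case input with a space or hyphen between the tens word and the units word.

```csharp
using System;
using System.Collections.Generic;

public static class NumberWords
{
    // Words for 1..19.
    private static readonly Dictionary<string, int> Small = new Dictionary<string, int>
    {
        ["ONE"] = 1, ["TWO"] = 2, ["THREE"] = 3, ["FOUR"] = 4, ["FIVE"] = 5,
        ["SIX"] = 6, ["SEVEN"] = 7, ["EIGHT"] = 8, ["NINE"] = 9, ["TEN"] = 10,
        ["ELEVEN"] = 11, ["TWELVE"] = 12, ["THIRTEEN"] = 13, ["FOURTEEN"] = 14,
        ["FIFTEEN"] = 15, ["SIXTEEN"] = 16, ["SEVENTEEN"] = 17,
        ["EIGHTEEN"] = 18, ["NINETEEN"] = 19
    };

    // Words for the multiples of ten from 20 to 90.
    private static readonly Dictionary<string, int> Tens = new Dictionary<string, int>
    {
        ["TWENTY"] = 20, ["THIRTY"] = 30, ["FORTY"] = 40, ["FIFTY"] = 50,
        ["SIXTY"] = 60, ["SEVENTY"] = 70, ["EIGHTY"] = 80, ["NINETY"] = 90
    };

    // Hypothetical word parser: returns false on anything it does not recognise.
    public static bool TryParse(string text, out int value)
    {
        value = 0;
        if (text == null) return false;

        string[] words = text.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);

        // A single word from ONE to NINETEEN.
        if (words.Length == 1 && Small.TryGetValue(words[0], out int small))
        {
            value = small;
            return true;
        }

        // A tens word, optionally followed by a word from ONE to NINE.
        if (words.Length >= 1 && words.Length <= 2 && Tens.TryGetValue(words[0], out int tens))
        {
            if (words.Length == 1)
            {
                value = tens;
                return true;
            }
            if (Small.TryGetValue(words[1], out int units) && units <= 9)
            {
                value = tens + units;
                return true;
            }
        }
        return false;
    }
}
```

For instance, `NumberWords.TryParse("TWENTY TWO", out int n)` yields 22, whereas `"TWENTY ELEVEN"` is rejected.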
+0

This worked for me, thanks! – roryok 2014-09-22 09:41:22

0
CHAPTER (?:\d+|(?:XVIII|XVII|XIII|VIII|XIV|XVI|XII|XIX|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I)|(?:(?P<d>TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY)?(?(d)(?: (?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))?|(?:SEVENTEEN|THIRTEEN|FOURTEEN|EIGHTEEN|NINETEEN|FIFTEEN|SIXTEEN|ELEVEN|TWELVE|THREE|SEVEN|EIGHT|FOUR|FIVE|NINE|ONE|TWO|SIX|TEN))))

Breakdown and explanation:

CHAPTER // match "CHAPTER " literally
    (?: // then either:
        \d+ // 1: digits
        |
        (?: // or 2: Roman numerals (1 to 20), longest first so that a shorter numeral cannot match a prefix of a longer one
            XVIII|XVII|XIII|VIII|XIV|XVI|XII|XIX|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I
        )
        | // or 3: words
        (?:
            (?P<d> // first, one of the literals "TWENTY", "THIRTY", etc...
                TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY
            )? // ...if possible
            (?(d) // then, if the previous group matched...
                (?: // ...a space...
                    (?: // ...and one of the numbers "ONE" to "NINE"
                        ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE
                    )
                )? // ...if possible.
                |
                (?: // otherwise, one of "ONE" to "NINETEEN", again longest first so that SIX cannot pre-empt SIXTEEN
                    SEVENTEEN|THIRTEEN|FOURTEEN|EIGHTEEN|NINETEEN|FIFTEEN|SIXTEEN|ELEVEN|TWELVE|THREE|SEVEN|EIGHT|FOUR|FIVE|NINE|ONE|TWO|SIX|TEN
                )
            )
        )
    )

Demo.
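
One caveat if this pattern is used from C#: the `(?P<d>...)` spelling of a named group is Python/PCRE syntax, and .NET's `Regex` expects `(?<d>...)` instead (the `(?(d)yes|no)` conditional is supported as written). The following is a hedged sketch of a .NET adaptation; the class name `ChapterPattern` is hypothetical, the numeral lists are assumed to cover 1 to 20 for Roman numerals and 1 to 99 for words, and the trailing `\b` is an extra guard so that a shorter alternative such as SIX cannot end the match inside SIXTEEN.

```csharp
using System.Text.RegularExpressions;

public static class ChapterPattern
{
    // Roman numerals 1..20, longest first.
    private const string Roman =
        "XVIII|XVII|XIII|VIII|XIV|XVI|XII|XIX|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I";
    private const string Tens =
        "TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY";
    private const string Units =
        "ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE";
    private const string Teens =
        "TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN";

    // Same structure as the pattern above, with the .NET named-group syntax
    // and a closing \b that forces the whole last word to be consumed.
    public static readonly Regex Chapter = new Regex(
        @"CHAPTER (?:\d+|(?:" + Roman + @")|(?<d>" + Tens + @")?" +
        @"(?(d)(?: (?:" + Units + @"))?|(?:" + Units + "|" + Teens + @")))\b");
}
```

For example, `ChapterPattern.Chapter.Match("CHAPTER SIXTEEN").Value` returns the whole string rather than stopping after SIX.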

1

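The regex for Roman numerals is: \b(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?!\w)
(Every part after the \b is optional, so the lookahead (?=[MDCLXVI]) and the closing (?!\w) are there to stop it from matching the empty string in front of a word such as THREE.)
The regex for digits is: \d+
The regex for literal words is: [A-Z ]+

Combining all of these:

CHAPTER (?:(?<digits>\d+)|(?<roman>\b(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?!\w))|(?<literal>[A-Z ]+))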
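
A Roman-numeral pattern of this shape is built entirely from optional parts, so it is worth checking how it behaves on an ordinary word before relying on it inside an alternation. The sketch below compares a version delimited only by `\b` with one that adds a leading lookahead and a trailing negative lookahead; the names `RomanGuardDemo`, `Unguarded`, `Guarded` and `Describe` are made up for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class RomanGuardDemo
{
    // Thousands, hundreds, tens and units; every part may match nothing.
    private const string Body =
        @"M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})";

    // Delimited by word boundaries only: can succeed with an empty match.
    public static readonly Regex Unguarded = new Regex(@"\b" + Body + @"\b");

    // Requires a Roman letter next and no word character afterwards,
    // which together rule out the empty match.
    public static readonly Regex Guarded =
        new Regex(@"\b(?=[MDCLXVI])" + Body + @"(?!\w)");

    // Small helper that reports what a pattern matched, if anything.
    public static string Describe(Regex re, string input)
    {
        Match m = re.Match(input);
        return m.Success ? "matched '" + m.Value + "'" : "no match";
    }
}
```

Here `Describe(Unguarded, "THREE")` reports an empty match, `Describe(Guarded, "THREE")` reports no match, and both patterns match all of `"XIV"`.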