2014-01-27 18 views
0

Regex parser step by step in Java

Here is my data and the pattern that matches it:

// _23.02_ANTALYA____________FRANKFURT___________DE_7461_18:20-21:00________________ 
public static final String FLIGHT_DEFAULT_PATTERN = "\\s+\\d{2}.\\d{2}\\s[A-Z]+\\s+[A-Z]+\\s+[A-Z\\s]{3}[\\d\\s]{5}\\d{2}:\\d{2}-\\d{2}:\\d{2}\\s+"; 

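The underscores stand for space characters. Now I need a class that pairs each element of the regex with the piece of data it matches. For example: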

\\s+ = " " 
\\d{2} = "23" 
. = "." 
\\d{2} = "02" 
\\s = " " 
[A-Z]+ = "ANTALYA" 

and so on. The results must come out in the same order as the pattern.

How can I do this, or is there a library for it?

+6

Read the documentation. You need capturing groups. – devnull

+0

Actually, I have an algorithm in mind, but it would be nice if a tool for this already existed. I don't want to waste my time. – kodmanyagha

Answers

0

I found a different way: I split the pattern into pieces by hand.

// _24.02_MAURITIUS_________HAMBURG________________via:FRA_DE/LH____08:30-20:05_____ 
public static final List<String> FLIGHT_VIA_PATTERN = Arrays.asList("\\s+", "\\d{2}", "\\.", "\\d{2}", "\\s+", "[A-Z]+", "\\s+", "[A-Z]+", "\\s+", "via:", "[A-Z\\s]{4}", "[A-Z]{2,3}", "/", 
     "[A-Z]{2,3}", "\\s+", "\\d{2}", ":", "\\d{2}", "\\-", "\\d{2}", ":", "\\d{2}", "\\s+"); 

After that, I looped over the pieces and everything works fine. This question can be closed.
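The loop itself is not shown in the post. Here is a minimal sketch of one way it could look, assuming each piece is matched at the current position with `Matcher.lookingAt()`. The class and method names (`PieceMatcher`, `split`) are hypothetical, and the sample line is the commented data with underscores replaced by spaces. It uses no Java 7 features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PieceMatcher {
    static final List<String> FLIGHT_VIA_PATTERN = Arrays.asList(
            "\\s+", "\\d{2}", "\\.", "\\d{2}", "\\s+", "[A-Z]+", "\\s+", "[A-Z]+", "\\s+",
            "via:", "[A-Z\\s]{4}", "[A-Z]{2,3}", "/", "[A-Z]{2,3}", "\\s+",
            "\\d{2}", ":", "\\d{2}", "\\-", "\\d{2}", ":", "\\d{2}", "\\s+");

    /**
     * Matches the pieces one after another, each starting where the previous
     * one ended. Returns the text consumed by each piece, in pattern order,
     * or null as soon as a piece fails to match.
     */
    static List<String> split(List<String> pieces, String text) {
        List<String> values = new ArrayList<String>();
        int pos = 0;
        for (String piece : pieces) {
            Matcher m = Pattern.compile(piece).matcher(text);
            m.region(pos, text.length());   // only look at the unconsumed tail
            if (!m.lookingAt()) {           // the piece must match exactly at pos
                return null;
            }
            values.add(m.group());
            pos = m.end();
        }
        return values;
    }

    public static void main(String[] args) {
        String line = " 24.02 MAURITIUS         HAMBURG                via:FRA DE/LH    08:30-20:05     ";
        List<String> values = split(FLIGHT_VIA_PATTERN, line);
        if (values == null) {
            System.out.println("no match");
            return;
        }
        for (int i = 0; i < values.size(); i++) {
            System.out.println(FLIGHT_VIA_PATTERN.get(i) + " = \"" + values.get(i) + "\"");
        }
    }
}
```

One caveat with this approach: each piece matches greedily and independently, so there is no backtracking between pieces. A piece that consumes too much can make a later piece fail even when the combined regex would have matched.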

2

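As @devnull mentioned, you should use capturing groups:

(\s+)(\d{2})(\.)(\d{2})(\s)([A-Z]+)(\s+)([A-Z]+)(\s+)([A-Z\s]{3})([\d\s]{5})(\d{2}:\d{2})(-)(\d{2}:\d{2})(\s+) 

See the full explanation of the regular expression on Regex101.

You can then match the text and extract the individual values with something like the following: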

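import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = " 23.02 ANTALYA   FRANKFURT   DE 7461 18:20-21:00     "; 
// The dot is escaped so it matches only a literal '.', not any character.
Pattern pattern = Pattern.compile("(\\s+)(\\d{2})(\\.)(\\d{2})(\\s)([A-Z]+)(\\s+)([A-Z]+)(\\s+)([A-Z\\s]{3})([\\d\\s]{5})(\\d{2}:\\d{2})(-)(\\d{2}:\\d{2})(\\s+)"); 
Matcher matcher = pattern.matcher(text); 
if (matcher.find()) { 
    // Groups are numbered 1..groupCount() inclusive; group 0 is the whole match.
    for (int i = 1; i <= matcher.groupCount(); i++) { 
        System.out.println(matcher.group(i)); 
    } 
} 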
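To get the "piece = value" listing the question asks for, the two ideas can be combined. The following is only a sketch under assumptions: keep the pieces in a list, wrap each one in a capturing group, and match once. The names `GroupedPieces` and `match` are hypothetical, and only the first few pieces of the pattern are used to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupedPieces {
    /**
     * Builds one regex in which every piece is its own capturing group, then
     * returns {piece, matchedText} pairs in pattern order. The list is empty
     * if the text does not match as a whole.
     */
    static List<String[]> match(List<String> pieces, String text) {
        StringBuilder regex = new StringBuilder();
        for (String piece : pieces) {
            regex.append('(').append(piece).append(')');
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(text);
        List<String[]> pairs = new ArrayList<String[]>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                // group i corresponds to piece i - 1
                pairs.add(new String[] { pieces.get(i - 1), m.group(i) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> pieces = Arrays.asList(
                "\\s+", "\\d{2}", "\\.", "\\d{2}", "\\s", "[A-Z]+", "\\s+");
        for (String[] pair : match(pieces, " 23.02 ANTALYA   ")) {
            System.out.println(pair[0] + " = \"" + pair[1] + "\"");
        }
    }
}
```

Because the pieces form a single pattern, the regex engine can still backtrack across them. The sketch assumes that no piece contains a capturing group of its own, since that would shift the group numbering.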

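To make it easier to extract specific fields, you can (in Java 7 and later) use named capturing groups:

(?<LeadSpace>\s+)(?<Day>\d{2})(\.)(?<Month>\d{2})... 

You can then retrieve each named group with something like the following: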

... 
if (matcher.find()) { 
    System.out.println(matcher.group("LeadSpace")); 
    System.out.println(matcher.group("Day")); 
    System.out.println(matcher.group("Month")); 
    ... 
} 
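
For reference, a self-contained version of the named-group idea, as a sketch only. It covers just the date part of the line, since the rest of the named pattern is elided above. The class and method names (`NamedGroupsDemo`, `dayAndMonth`) are hypothetical, and it requires Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // Shortened pattern: leading whitespace, day, a literal dot, month.
    static final Pattern DATE =
            Pattern.compile("(?<LeadSpace>\\s+)(?<Day>\\d{2})\\.(?<Month>\\d{2})");

    /** Returns {day, month} from the start of the line, or null if it does not match. */
    static String[] dayAndMonth(String line) {
        Matcher m = DATE.matcher(line);
        if (!m.lookingAt()) {   // the pattern must match at the beginning of the line
            return null;
        }
        return new String[] { m.group("Day"), m.group("Month") };
    }

    public static void main(String[] args) {
        String[] parts = dayAndMonth(" 23.02 ANTALYA   FRANKFURT   DE 7461 18:20-21:00     ");
        System.out.println("Day = " + parts[0] + ", Month = " + parts[1]);
    }
}
```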
+0

Thanks for your answer. I solved my problem a different way, but your answer contains more useful information. We use Java 6 at our company. – kodmanyagha