Regex in Java: matching BOL and EOL

I am trying to parse a Windows INI file with Java under Windows. Suppose the content is:

[section1] 
key1=value1 
key2=value2 
[section2] 
key1=value1 
key2=value2 
[section3] 
key1=value1 
key2=value2

I use the following code:

Pattern pattSections = Pattern.compile("^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", Pattern.DOTALL + Pattern.MULTILINE); 
Pattern pattPairs = Pattern.compile("^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*)$", Pattern.DOTALL + Pattern.MULTILINE); 
// parse sections 
Matcher matchSections = pattSections.matcher(content); 
while (matchSections.find()) { 
    String keySection = matchSections.group(1); 
    String valSection = matchSections.group(2); 
    // parse section content 
    Matcher matchPairs = pattPairs.matcher(valSection); 
    while (matchPairs.find()) { 
     String keyPair = matchPairs.group(1); 
     String valPair = matchPairs.group(2); 
    } 
}

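But it doesn't work properly:

1. section1 doesn't match. This is probably because it does not start right after an EOL. When I put an empty line before [section1], it matches.
2. valSection returns '\r\nkey1=value1\r\nkey2=value2\r\n' and keyPair returns 'key1', which looks fine. But valPair returns 'value1\r\nkey2=value2\r\n' instead of the desired 'value1'.

What is wrong here?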


2012-05-11 igortche

You are not excluding the newline from the value match.

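Ad 2. The pattern defined in 'pattPairs' is greedy, so it matches all the way to the end of the second key. You can read up on greedy and non-greedy matching, and how to compensate, here: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html – Pieter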
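To make the greedy versus reluctant difference concrete, here is a minimal sketch that runs both variants of the pair pattern against the section body reported in the question. The class and method names (`QuantifierDemo`, `firstValue`) are made up for illustration, and the DOTALL flag is left out because the pattern contains no dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns group 2 (the value) of the first match, or null if nothing matches.
    static String firstValue(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String section = "\r\nkey1=value1\r\nkey2=value2\r\n";

        // Greedy: [^$]* takes every character that is not a literal '$',
        // line breaks included, so it runs to the end of the input.
        String greedy = firstValue("^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*)$", section);

        // Reluctant: [^$]*? grows one character at a time and stops at the
        // first position where the trailing $ anchor can match.
        String reluctant = firstValue("^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*?)$", section);

        // Escape the line breaks so the difference is visible on one line.
        System.out.println(greedy.replace("\r", "\\r").replace("\n", "\\n"));
        System.out.println(reluctant);
    }
}
```

The first line printed is the over-long value from the question (`value1\r\nkey2=value2\r\n`), the second is just `value1`.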

Did you try replacing '\r\n' with '\n' first? – Thomas
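The normalization suggested in this comment could look roughly like the sketch below. The helper name `normalize` is hypothetical, and also converting a lone `\r` is an extra assumption that goes beyond what the comment asks for.

```java
class NormalizeDemo {
    // Converts Windows (\r\n) line endings to \n; a lone \r is converted too
    // (that second step is an assumption, not part of the comment).
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace("\r", "\n");
    }

    public static void main(String[] args) {
        String content = "[section1]\r\nkey1=value1\r\nkey2=value2\r\n";
        String normalized = normalize(content);
        // Show the result with the line breaks escaped.
        System.out.println(normalized.replace("\n", "\\n"));
    }
}
```

After this step the patterns only ever see `\n`, so it no longer matters how the regex engine treats `\r`.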

The first regex just works (could the problem be in how you read the file?), and in the second one I added a '?' to make the quantifier reluctant.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String content = "[section1]\r\n"
                + "key1=value1\r\n"
                + "key2=value2\r\n"
                + "[section2]\r\n"
                + "key1=value1\r\n"
                + "key2=value2\r\n"
                + "[section3]\r\n"
                + "key1=value1\r\n"
                + "key2=value2\r\n";

        Pattern pattSections = Pattern.compile(
                "^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)",
                Pattern.DOTALL | Pattern.MULTILINE);
        // The only change: [^$]*? is reluctant, so it stops at the first end of line.
        Pattern pattPairs = Pattern.compile(
                "^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*?)$",
                Pattern.DOTALL | Pattern.MULTILINE);

        // parse sections
        Matcher matchSections = pattSections.matcher(content);
        while (matchSections.find()) {
            String keySection = matchSections.group(1);
            String valSection = matchSections.group(2);
            // parse section content
            Matcher matchPairs = pattPairs.matcher(valSection);
            while (matchPairs.find()) {
                String keyPair = matchPairs.group(1);
                String valPair = matchPairs.group(2);
                // print each pair so the result can be checked
                System.out.println(keySection + ": " + keyPair + " = " + valPair);
            }
        }
    }
}
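As a usage illustration, the same two patterns can feed a nested map instead of throwaway local variables. This is only a sketch: the class name `IniParseSketch`, the `parse` method and the choice of `LinkedHashMap` are assumptions, and DOTALL is omitted because neither pattern contains a dot.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IniParseSketch {
    // Same patterns as in the answer above.
    static final Pattern SECTIONS = Pattern.compile(
            "^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", Pattern.MULTILINE);
    static final Pattern PAIRS = Pattern.compile(
            "^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*?)$", Pattern.MULTILINE);

    // Maps each section name to its key/value pairs, in file order.
    static Map<String, Map<String, String>> parse(String content) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        Matcher sections = SECTIONS.matcher(content);
        while (sections.find()) {
            Map<String, String> pairs = new LinkedHashMap<>();
            Matcher m = PAIRS.matcher(sections.group(2));
            while (m.find()) {
                pairs.put(m.group(1), m.group(2));
            }
            result.put(sections.group(1), pairs);
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "[section1]\r\nkey1=value1\r\nkey2=value2\r\n"
                + "[section2]\r\nkey1=value1\r\nkey2=value2\r\n";
        System.out.println(parse(content));
    }
}
```

Note that this inherits the limits of the patterns: a value containing `[` or `$` would not be handled correctly.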


2012-05-11 13:51:04

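Unfortunately, this doesn't work in my case. In the first pattern, '^' does not match BOL at BOF, and in the '[^$]' of the second pattern the $ sign is treated as a plain $ rather than as EOL. – igortche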
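The second observation can be checked directly: inside a character class `$` is an ordinary character, so `[^$]` means "anything except a dollar sign" and happily matches line breaks. The sketch below shows this, plus one possible alternative that excludes line break characters explicitly. The names `CharClassDollarDemo`, `PAIR` and `pairs` are invented for the example, and `[^\r\n]*` is a suggestion of this sketch, not something proposed in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDollarDemo {
    // Value part excludes \r and \n, so it cannot run past the end of the line.
    static final Pattern PAIR = Pattern.compile(
            "^([a-zA-Z_0-9]+)\\s*=\\s*([^\\r\\n]*)$", Pattern.MULTILINE);

    // True if [^$] rejects a literal '$' but accepts a newline.
    static boolean dollarIsLiteralInClass() {
        return !Pattern.matches("[^$]", "$") && Pattern.matches("[^$]", "\n");
    }

    // Collects "key=value" strings from a section body.
    static List<String> pairs(String sectionBody) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(sectionBody);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dollarIsLiteralInClass());
        System.out.println(pairs("\r\nkey1=value1\r\nkey2=value2\r\n"));
    }
}
```

With the explicit exclusion, the quantifier can stay greedy and still stop at the end of each line.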

Did you try running the code as suggested? Maybe you can provide a sample of your ini file...

You don't need the DOTALL flag, because the patterns don't use the dot at all.
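A quick way to convince yourself of this: DOTALL only changes what `.` matches, so a dot-free pattern behaves the same with or without it. The following is a small sketch, with the class name `DotallDemo` and the shortened sample input chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    static final String CONTENT =
            "[section1]\r\nkey1=value1\r\n[section2]\r\nkey1=value1\r\n";

    // Body (group 2) of the first section, using the question's section pattern.
    static String firstSectionBody(int flags) {
        Matcher m = Pattern.compile("^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", flags)
                .matcher(CONTENT);
        return m.find() ? m.group(2) : null;
    }

    // Whether "a.b" matches "a\nb" under the given flags.
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        String without = firstSectionBody(Pattern.MULTILINE);
        String with = firstSectionBody(Pattern.MULTILINE | Pattern.DOTALL);
        System.out.println(without.equals(with));             // same result either way
        System.out.println(dotMatchesNewline(0));              // dot skips line breaks by default
        System.out.println(dotMatchesNewline(Pattern.DOTALL)); // DOTALL lets it match them
    }
}
```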

I think Java treats \n by itself as the line terminator, so the \r will not be handled. Your pattern:

^\\[([a-zA-Z_0-9\\s]+)\\]$

will not match, but instead

^\\[([a-zA-Z_0-9\\s]+)\\]\r$

will.
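Whether the `\r` gets in the way depends on which line terminators the engine recognizes. According to the `java.util.regex.Pattern` documentation, `\r\n` counts as a line terminator by default, and only `\n` does once `UNIX_LINES` is set, so the behavior described in this answer corresponds to `UNIX_LINES` mode. A small check, as a sketch with a hypothetical class name `LineTerminatorDemo`:

```java
import java.util.regex.Pattern;

class LineTerminatorDemo {
    static final String INPUT = "[section1]\r\nkey1=value1\r\n";

    // True if the regex is found anywhere in INPUT under the given flags.
    static boolean finds(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        String plain = "^\\[([a-zA-Z_0-9\\s]+)\\]$";
        String withCr = "^\\[([a-zA-Z_0-9\\s]+)\\]\\r$";
        int unix = Pattern.MULTILINE | Pattern.UNIX_LINES;

        // Default line terminators: $ matches in front of \r\n.
        System.out.println(finds(plain, Pattern.MULTILINE));
        // UNIX_LINES: only \n ends a line, so the \r blocks the plain pattern...
        System.out.println(finds(plain, unix));
        // ...and the pattern has to consume \r explicitly.
        System.out.println(finds(withCr, unix));
    }
}
```

This prints `true`, `false`, `true`. If the header line fails to match in default mode, the cause is therefore more likely to be in the input itself than in the `\r`.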

I suggest you drop MULTILINE as well and use the following patterns as line separators:

(^|\r\n) 
($|\r\n)
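A sketch of how such explicit separators might be used for the section headers, without any flags. One deliberate deviation, which is an assumption of this sketch rather than part of the answer: the trailing separator is written as a lookahead `(?=\r\n|$)`, so the `\r\n` is not consumed and stays available as the leading separator of the next line. The names `ExplicitSeparatorDemo` and `sectionNames` are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExplicitSeparatorDemo {
    // No MULTILINE: ^ and $ refer to the input as a whole, and line
    // boundaries are spelled out as \r\n.
    static final Pattern HEADER = Pattern.compile(
            "(?:^|\\r\\n)\\[([a-zA-Z_0-9\\s]+)\\](?=\\r\\n|$)");

    // Returns the section names in the order they appear.
    static List<String> sectionNames(String content) {
        List<String> names = new ArrayList<>();
        Matcher m = HEADER.matcher(content);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String content = "[section1]\r\nkey1=value1\r\n[section2]\r\nkey1=value1\r\n";
        System.out.println(sectionNames(content));
    }
}
```

With plain consuming groups on both sides, two adjacent header lines would compete for the same `\r\n`, which is why the lookahead is used here.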


2012-08-09 21:50:23 vbence
