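Capturing multiple text blocks with a regular expression in Java

What regular expression should be used to extract multiple text blocks that are delimited by headers, where the headers themselves should also be parsed? For example: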

some text info before message sequence 
============ 
first message header that should be parsed (may contain = character) 
============ 
first multiline 
message body that 
should also be parsed 
(may contain = character) 
============ 
second message header that should be parsed 
============ 
second multiline 
message body that 
should also be parsed 
... and so on

I tried using:

String regex = "^=+$\n"+ 
     "^(.+)$\n"+ 
     "^=+$\n"+ 
     "((?s:(?!(^=.+)).+))"; 
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);

But ((?s:(?!(^=.+)).+)) eats the second message as well. Here is a test that demonstrates the problem:

import java.util.regex.Matcher; 
import java.util.regex.Pattern; 
import org.junit.Assert; 
import org.junit.Test; 
public class ParsingTest { 
@Test 
public void test() { 
    String fstMsgHeader = "first message header that should be parsed (may contain = character)"; 
    String fstMsgBody = "first multiline\n"+ 
         "message body that\n"+ 
         "should also be parsed\n"+ 
         "(may contain = character)"; 
    String sndMsgHeader = "second message header that should be parsed"; 
    String sndMsgBody = "second multiline\n"+ 
      "message body that\n"+ 
      "should also be parsed\n"+ 
      "... and so on"; 
    String sample = "some text info before message sequence\n"+ 
        "============\n"+ 
        fstMsgHeader+"\n"+ 
        "============\n"+ 
        fstMsgBody+"\n"+ 
        "============\n"+ 
        sndMsgHeader+"\n"+ 
        "============\n"+ 
        sndMsgBody +"\n"; 
    System.out.println(sample); 
    String regex = "^=+$\n"+ 
        "^(.+)$\n"+ 
        "^=+$\n"+ 
        "((?s:(?!(^=.+)).+))"; 
    Pattern p = Pattern.compile(regex, Pattern.MULTILINE); 
    Matcher matcher = p.matcher(sample); 
    int blockNumber = 1; 
    while (matcher.find()) { 
     System.out.println("Block "+blockNumber+": "+matcher.group(0)+"\n_________________"); 
     if (blockNumber == 1) { 
      Assert.assertEquals(fstMsgHeader, matcher.group(1)); 
      Assert.assertEquals(fstMsgBody, matcher.group(2)); 
     } else { 
      Assert.assertEquals(sndMsgHeader, matcher.group(1)); 
      Assert.assertEquals(sndMsgBody, matcher.group(2)); 
     } 
     blockNumber++; // advance so the second block is checked against the second message 
    } 
}

}

Source

2013-08-20 Mikhail Tsaplin

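Why not use sample.split("============")? – Marc

What output do you expect, and what output do you actually get? – sp00m

Regarding split: I have already implemented it with split, but it seems that capturing each message together with its header using a single regular expression makes the code cleaner (one while loop with group accessors). So I am considering this variant.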
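For comparison, the split-based variant mentioned in the comments above might look roughly like the sketch below. The class name SplitParserDemo and the helper stripNewline are hypothetical, and the sketch assumes every delimiter line consists of exactly twelve '=' characters, as in the sample.

```java
import java.util.ArrayList;
import java.util.List;

class SplitParserDemo {
    /** Returns one {header, body} pair per message found in the input. */
    static List<String[]> parse(String input) {
        // (?m) makes ^ match at the start of every line, so only whole
        // lines of exactly twelve '=' characters act as separators.
        String[] parts = input.split("(?m)^={12}\n");
        List<String[]> blocks = new ArrayList<>();
        // parts[0] is the text before the first delimiter;
        // after it, headers and bodies alternate.
        for (int i = 1; i + 1 < parts.length; i += 2) {
            blocks.add(new String[] { stripNewline(parts[i]), stripNewline(parts[i + 1]) });
        }
        return blocks;
    }

    // Drops the single trailing line break left over from the split.
    private static String stripNewline(String s) {
        return s.endsWith("\n") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        String sample = "some text info before message sequence\n"
                + "============\n"
                + "first message header\n"
                + "============\n"
                + "first multiline\nmessage body\n"
                + "============\n"
                + "second message header\n"
                + "============\n"
                + "second multiline\nmessage body\n";
        for (String[] block : parse(sample)) {
            System.out.println("Header: " + block[0]);
            System.out.println("Body:\n" + block[1]);
        }
    }
}
```

This needs index arithmetic to pair headers with bodies, which is the bookkeeping a single regular expression with two groups avoids.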

I am not sure whether this is what you are looking for, but maybe this regular expression will help:

String regex = 
     "={12}\n" + // twelve '=' marks and new line mark 
     "(.+?)" +  // minimal match that has 
     "\n={12}\n" + // new line mark with twelve '=' marks after it 
     "(.+?)(?=\n={12}|$)"; // minimal match that will have new line 
           // character and twelve `=` marks after 
           // it or end of data $

To make it work, you should also let the dot match newline characters by using the Pattern.DOTALL flag.

Pattern p = Pattern.compile(regex, Pattern.DOTALL);
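As a rough end-to-end illustration, the pattern above can be wrapped in a small helper and run against the sample from the question. The class name BlockParserDemo and the method parse are hypothetical names chosen for this sketch. It assumes delimiter lines of exactly twelve '=' characters and '\n' line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockParserDemo {
    // Header between two delimiter lines, then a reluctant body that stops
    // right before the next delimiter line or at the end of the input.
    private static final Pattern BLOCK = Pattern.compile(
            "={12}\n"
            + "(.+?)"
            + "\n={12}\n"
            + "(.+?)(?=\n={12}|$)",
            Pattern.DOTALL);

    /** Returns one {header, body} pair per message found in the input. */
    static List<String[]> parse(String input) {
        List<String[]> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            blocks.add(new String[] { m.group(1), m.group(2) });
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "some text info before message sequence\n"
                + "============\n"
                + "first message header that should be parsed (may contain = character)\n"
                + "============\n"
                + "first multiline\nmessage body that\nshould also be parsed\n"
                + "(may contain = character)\n"
                + "============\n"
                + "second message header that should be parsed\n"
                + "============\n"
                + "second multiline\nmessage body that\nshould also be parsed\n"
                + "... and so on\n";
        for (String[] block : parse(sample)) {
            System.out.println("Header: " + block[0]);
            System.out.println("Body:\n" + block[1]);
            System.out.println("_________________");
        }
    }
}
```

Because the lookahead (?=\n={12}|$) does not consume the next delimiter, the following call to find() can start its match on that delimiter line. Without Pattern.MULTILINE, $ matches only at the end of the input (or just before a final line terminator), so the last body runs to the end of the text.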

Source

2013-08-20 15:54:09 Pshemo

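Pshemo, thank you, it works. Could you explain what (.+?) means?

@MikhailTsaplin Normally '.+' is greedy, so it tries to find the longest possible match. If you add '?', the '+' quantifier becomes reluctant, so it tries to find the shortest possible match. For details, see http://docs.oracle.com/javase/tutorial/essential/regex/quant.html – Pshemo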
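The difference between the greedy and the reluctant quantifier can be seen on a tiny input. This is a minimal sketch; the class name QuantifierDemo, the helper firstMatch, and the input string are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b><c>";
        // Greedy: .+ grabs as much as it can, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", input));   // prints <a><b><c>
        // Reluctant: .+? grows one character at a time and stops at the first '>'.
        System.out.println(firstMatch("<.+?>", input));  // prints <a>
    }
}
```

In the answer's pattern the same effect keeps each (.+?) group from running past the nearest delimiter line.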
