Nested/repeated groups in a regular expression

I have to parse a multi-line string and retrieve the email addresses from a specific part of it.

I do it with the following code:

String input = "Content-Type: application/ms-tnef; name=\"winmail.dat\"\r\n" 
      + "Content-Transfer-Encoding: binary\r\n" + "From: ABC aa DDD <[email protected]>\r\n" 
      + "To: DDDDD dd <[email protected]>\r\n" + "CC: Rrrrr rrede <[email protected]>, Dsssssf V R\r\n" 
      + " <[email protected]>, Psssss A <[email protected]>, Logistics\r\n" 
      + " <[email protected]>, Gssss Bsss P <[email protected]>\r\n" 
      + "Subject: RE: [MyApps] (PRO-34604) PR for Additional Monitor allocation [CITS\r\n" 
      + " Ticket:258849]\r\n" + "Thread-Topic: [MyApps] (PRO-34604) PR for Additional Monitor allocation\r\n" 
      + " [CITS Ticket:258849]\r\n" + "Thread-Index: AQHRXMJHE6KqCFxKBEieNqGhdNy7Pp8XHc0A\r\n" 
      + "Date: Mon, 1 Feb 2016 17:56:17 +0530\r\n" 
      + "Message-ID: <[email protected]>\r\n" 
      + "References: <[email protected]>\r\n" 
      + " <[email protected]>\r\n" 
      + "In-Reply-To: <[email protected]>\r\n" 
      + "Accept-Language: en-US\r\n" + "Content-Language: en-US\r\n" + "X-MS-Has-Attach:\r\n" 
      + "X-MS-Exchange-Organization-SCL: -1\r\n" 
      + "X-MS-TNEF-Correlator: <[email protected]>\r\n" 
      + "MIME-Version: 1.0\r\n" + "X-MS-Exchange-Organization-AuthSource: TURWINSRVRPS01.abc.com\r\n" 
      + "X-MS-Exchange-Organization-AuthAs: Internal\r\n" + "X-MS-Exchange-Organization-AuthMechanism: 04\r\n" 
      + "X-Originating-IP: [1.1.1.7]"; 

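    // Pass 1: capture everything from "To:" through the last <...> that precedes "Message-ID"
    Pattern pattern = Pattern.compile("To:(.*<([^>]*)>).*Message-ID", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(input);
    // Pass 2: pull every <...> address out of the captured block (compiled once, outside the loop)
    Pattern innerPattern = Pattern.compile("<([^>]*)>");
    while (matcher.find()) {
     Matcher innerMatcher = innerPattern.matcher(matcher.group(1));
     while (innerMatcher.find()) {
      System.out.println("-->:" + innerMatcher.group(1));
     }
    }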

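This works fine. The first pattern captures the part I need, from To: up to Message-ID, and a second pattern then extracts the email IDs from it. Is there a better way to do this? Can it be done with a single pattern and matcher?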

Update: this is the expected output:

-->:[email protected] 
-->:[email protected] 
-->:[email protected] 
-->:[email protected] 
-->:[email protected] 
-->:[email protected]


2016-02-02 Ram

Can you show what you expect to retrieve? – Cyrbil

I think you are looking for all the emails inside <...> that come after To: and before Message-ID. So you can use a \G-based regex to do it in one pass:

Pattern pt = Pattern.compile("(?:\\bTo:|(?!^)\\G).*?<([^>]*)>(?=.*Message-ID)", Pattern.DOTALL); 
Matcher m = pt.matcher(input); 
while (m.find()) { 
    System.out.println(m.group(1)); 
}

See the IDEONE demo and the regex demo.

The regex matches:

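(?:\\bTo:|(?!^)\\G) - the leading boundary: either To: starting at a word boundary, or the position right after the previous successful match ((?!^) keeps \G from matching at the very start of the input, before any match has been made)
.*? - any characters, zero or more, as few as possible, up to the first
<([^>]*)> - a literal < followed by zero or more characters other than > (captured in Group 1), followed by the closing >
(?=.*Message-ID) - a positive lookahead that makes sure Message-ID still appears somewhere after the current match.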
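To try the \G pattern outside IDEONE, here is a minimal, self-contained sketch that wraps it in a class. The class name GAnchorDemo, the helper extract(), and the header text with its example.com addresses are hypothetical, made up for illustration; the sample adds an In-Reply-To line after Message-ID to show that the lookahead keeps later addresses out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the one-pass \G approach; names and sample data are hypothetical.
class GAnchorDemo {

    // Start at "To:" or at the end of the previous match (\G), lazily skip to
    // the next <...>, capture its contents, and accept it only while
    // "Message-ID" still appears later in the input. DOTALL lets . cross \r\n.
    private static final Pattern EMAILS = Pattern.compile(
            "(?:\\bTo:|(?!^)\\G).*?<([^>]*)>(?=.*Message-ID)", Pattern.DOTALL);

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = EMAILS.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // Group 1 = text between < and >
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "From: Sender <from@example.com>\r\n"
                + "To: First <to@example.com>\r\n"
                + "CC: Second <cc1@example.com>, Third\r\n"
                + " <cc2@example.com>\r\n"
                + "Subject: test\r\n"
                + "Message-ID: <id@example.com>\r\n"
                + "In-Reply-To: <reply@example.com>\r\n";
        for (String email : extract(input)) {
            System.out.println("-->:" + email);
        }
    }
}
```

Running main prints -->:to@example.com, -->:cc1@example.com and -->:cc2@example.com. The From: address comes before To:, and the Message-ID and In-Reply-To addresses fail the lookahead because no Message-ID follows them.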


2016-02-02 13:06:35

Along with this answer, [this](http://stackoverflow.com/a/35154460/2270563) answer was also helpful! – Ram

Ideally, you could also use lookarounds:

(?<=To:.*)<([^>]+)>(?=.*Message-ID)

Regular expression visualization by Debuggex

Unfortunately, Java doesn't support lookbehinds of unbounded length such as .* inside them. A workaround could be:

(?<=To:.{0,1000})<([^>]+)>(?=.*Message-ID)
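A similar minimal sketch for the bounded-lookbehind workaround follows. It assumes Pattern.DOTALL, which the answer does not mention: without it, . does not cross the \r\n line breaks, so .*Message-ID could not reach the Message-ID header on a later line. The class name, the extract() helper and the sample header are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the bounded-lookbehind workaround; names and sample data are hypothetical.
class BoundedLookbehindDemo {

    // (?<=To:.{0,1000}) : "To:" must occur at most 1000 characters before the <
    // (?=.*Message-ID)  : "Message-ID" must still follow the match
    // Pattern.DOTALL is an assumption, needed so . can cross line breaks.
    private static final Pattern EMAILS = Pattern.compile(
            "(?<=To:.{0,1000})<([^>]+)>(?=.*Message-ID)", Pattern.DOTALL);

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = EMAILS.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // address without the angle brackets
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "From: Sender <from@example.com>\r\n"
                + "To: First <to@example.com>\r\n"
                + "CC: Second <cc1@example.com>\r\n"
                + "Message-ID: <id@example.com>\r\n";
        for (String email : extract(input)) {
            System.out.println("-->:" + email);
        }
    }
}
```

The 1000-character bound is arbitrary: an address whose < lies more than 1000 characters past To: would be skipped, so the limit has to be chosen larger than the longest To:/CC block you expect.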


2016-02-02 13:10:27 sp00m

Java supports the [*constrained lookbehind*](http://www.rexegg.com/regex-lookarounds.html#width) you show in your answer. –
