Capturing recursive groups of a regular expression using backreferences in Java

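I am trying to recursively capture multiple groups in a regular expression, using a backreference in one of the groups. Even though I am using Pattern and Matcher with a "while(matcher.find())" loop, it still captures only the last instance instead of all of them. In my case the only possible tags are <sm>, <po>, <pof>, <pos>, <poi>, <pol>, <poif> and <poil>. Because of these formatting tags, I need to capture: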

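1. The text outside the tags (any text, so that I can format it as "normal" text). I do this by capturing any text before a tag in one group while capturing the tag itself in another group; as I iterate over the occurrences, I remove everything captured from the original string, and if any text is left at the end, I format it as "normal" text.
2. The tag "name", so that I know how to format the text inside the tag.
3. The text content of the tag, which will be formatted according to the tag name and its associated rules.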
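The three captures listed above can also be collected without deleting matched text from a copy of the string: remember where the previous match ended (Matcher.end()) and treat whatever follows the last match as "normal" text. This is a minimal sketch under a few assumptions: it uses the lazy (.*?) prefix from the self-answer further down, it writes the tag-name part as [flsi3]?[fl]? (inside square brackets the pipes in [f|l|s|i|3] are literal "|" characters, not alternation), and the Segment record is a hypothetical stand-in for the LibreOffice m_xText calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSegments {
    // Lazy prefix (.*?) so each find() stops at the nearest opening tag;
    // \2 requires the closing tag to repeat the opening tag name.
    static final Pattern TAGGED = Pattern.compile(
            "(.*?)<((sm|po)[flsi3]?[fl]?)>(.*?)</\\2>",
            Pattern.UNICODE_CHARACTER_CLASS);

    /** Hypothetical holder: tag is null for plain ("normal") text. */
    record Segment(String tag, String text) {}

    static List<Segment> segments(String input) {
        List<Segment> out = new ArrayList<>();
        Matcher m = TAGGED.matcher(input);
        int end = 0; // where the previous match stopped
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                out.add(new Segment(null, m.group(1))); // text before the tag
            }
            out.add(new Segment(m.group(2), m.group(4))); // tag name and its content
            end = m.end();
        }
        if (end < input.length()) {
            out.add(new Segment(null, input.substring(end))); // leftover text after the last tag
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "the man said:<pof>bone of my bones</pof><poi>flesh of my flesh;</poi> and more";
        for (Segment s : segments(text)) {
            System.out.println((s.tag() == null ? "normal" : s.tag()) + " -> " + s.text());
        }
    }
}
```

Because the leftover text is taken from the end of the last match, a string with no tags at all comes back as a single "normal" segment, which would make the separate matches() pre-check optional.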

Here is my sample code:

    String currentText = "the man said:<pof>「This one, at last, is bone of my bones</pof><poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po><poil>for out of man this one has been taken.」</poil>";
    String remainingText = currentText;

    //first check if our string even has any kind of xml tag, because if not we will just format the whole string as "normal" text
    if(currentText.matches("(?su).*<[/]{0,1}(?:sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1}>.*"))
    {
        //an opening or closing tag has been found, so let us start our pattern captures
        //I am using a backreference \\2 to make sure the closing tag is the same as the opening tag
        Pattern pattern1 = Pattern.compile("(.*)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>",Pattern.UNICODE_CHARACTER_CLASS);
        Matcher matcher1 = pattern1.matcher(currentText);
        int iteration = 0;
        while(matcher1.find()){
            System.out.print("Iteration ");
            System.out.println(++iteration);
            System.out.println("group1:"+matcher1.group(1));
            System.out.println("group2:"+matcher1.group(2));
            System.out.println("group3:"+matcher1.group(3));
            System.out.println("group4:"+matcher1.group(4));

            if(matcher1.group(1) != null && matcher1.group(1).isEmpty() == false)
            {
                m_xText.insertString(xTextRange, matcher1.group(1), false);
                //Pattern.quote: the captured text must be removed literally, not interpreted as a regex
                remainingText = remainingText.replaceFirst(Pattern.quote(matcher1.group(1)), "");
            }
            if(matcher1.group(4) != null && matcher1.group(4).isEmpty() == false)
            {
                switch (matcher1.group(2)) {
                    case "pof": [...]
                    case "pos": [...]
                    case "poif": [...]
                    case "po": [...]
                    case "poi": [...]
                    case "pol": [...]
                    case "poil": [...]
                    case "sm": [...]
                }
                remainingText = remainingText.replaceFirst(Pattern.quote("<"+matcher1.group(2)+">"+matcher1.group(4)+"</"+matcher1.group(2)+">"), "");
            }
        }
    }

The System.out.println calls print only once in my console, with these results:

Iteration 1: 
    group1:the man said:<pof>「This one, at last, is bone of my bones</pof><poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po>; 
    group2:poil 
    group3:po 
    group4:for out of man this one has been taken.」

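Group 3 can be ignored; the only useful groups are 1, 2 and 4 (group 3 is part of group 2). Why does this capture only the last tag instance, "poil", and not the preceding "pof", "poi" and "po" tags?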

I would like to see output like this:

Iteration 1: 
    group1:the man said: 
    group2:pof 
    group3:po 
    group4:「This one, at last, is bone of my bones 

Iteration 2: 
    group1: 
    group2:poi 
    group3:po 
    group4:and flesh of my flesh; 

Iteration 3: 
    group1: 
    group2:po 
    group3:po 
    group4:This one shall be called ‘woman,’ 

Iteration 4: 
    group1: 
    group2:poil 
    group3:po 
    group4:for out of man this one has been taken.」
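A stripped-down reproduction of the behaviour described above may help: the LibreOffice calls (m_xText, xTextRange) are removed and only the tag name (group 2) is collected on each find() call. GreedyRepro and tagNames are illustrative names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyRepro {
    static final String TEXT = "the man said:<pof>「This one, at last, is bone of my bones</pof>"
            + "<poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po>"
            + "<poil>for out of man this one has been taken.」</poil>";

    // The pattern from the question: the leading (.*) is greedy.
    static final String GREEDY = "(.*)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>";

    /** Returns group(2), the tag name, for every match that find() reports. */
    static List<String> tagNames(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS).matcher(text);
        List<String> names = new ArrayList<>();
        while (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(tagNames(GREEDY, TEXT)); // prints [poil]
    }
}
```

The greedy (.*) first runs to the end of the input and then gives characters back one at a time; the first opening tag it backs up to that lets the rest of the pattern match is the last one, <poil>. That single match ends at the end of the string, so the loop has nothing left for a second find().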

2015-08-17 JohnRDOrazio

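I just found the answer to this problem: the first capture group only needed a non-greedy quantifier, just like the one in the fourth capture group. This works exactly as needed: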

Pattern pattern1 = Pattern.compile("(.*?)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>",Pattern.UNICODE_CHARACTER_CLASS);
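A quick way to confirm the fix is to run the same loop with the answer's pattern and print groups 1, 2 and 4. The class and method names are illustrative. Because (.*?) is lazy, each find() stops at the nearest opening tag after the previous match, so group 1 holds "the man said:" on the first pass and is empty on the following ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyPrefix {
    static final String TEXT = "the man said:<pof>「This one, at last, is bone of my bones</pof>"
            + "<poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po>"
            + "<poil>for out of man this one has been taken.」</poil>";

    // The answer's pattern: only group 1 changes, from (.*) to the lazy (.*?).
    static final Pattern PATTERN = Pattern.compile(
            "(.*?)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>",
            Pattern.UNICODE_CHARACTER_CLASS);

    /** One {group1, group2, group4} triple per match. */
    static List<String[]> matches(String text) {
        Matcher m = PATTERN.matcher(text);
        List<String[]> rows = new ArrayList<>();
        while (m.find()) {
            rows.add(new String[] {m.group(1), m.group(2), m.group(4)});
        }
        return rows;
    }

    public static void main(String[] args) {
        int iteration = 0;
        for (String[] row : matches(TEXT)) {
            System.out.println("Iteration " + (++iteration) + ":");
            System.out.println("    group1:" + row[0]);
            System.out.println("    group2:" + row[1]);
            System.out.println("    group4:" + row[2]);
        }
    }
}
```

Neither pattern sets Pattern.DOTALL, so "." does not match line terminators; if the tagged text can span several lines, that flag (the (?s) already used in the question's matches() check) would presumably be needed here as well.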

2015-08-17 04:01:08 JohnRDOrazio
