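Capturing recursive groups of a regular expression with a backreference in Java
I am trying to use a regular expression with a backreference to capture multiple groups recursively. Even though I am using Pattern and Matcher with a "while(matcher.find())" loop, it still captures only the last instance instead of all of them. In my case the only possible tags are <sm>, <po>, <pof>, <pos>, <poi>, <pol>, <poif>, <poil>. Since these are formatting tags, I need to capture: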
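- any text outside the tags, so that I can format it as "normal" text. I do this by capturing the text before a tag in one group and the tag itself in another group; as I iterate over the occurrences, I remove everything that was captured from the original string, and if any text is left over at the end, I format it as "normal" text
- the "name" of the tag, so that I know how to format the text inside the tag
- the text content of the tag, which will be formatted according to the tag name and its associated rules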
Here is my sample code:
String currentText = "the man said:<pof>「This one, at last, is bone of my bones</pof><poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po><poil>for out of man this one has been taken.」</poil>";
String remainingText = currentText;
//first check if our string even has any kind of xml tag, because if not we will just format the whole string as "normal" text
if(currentText.matches("(?su).*<[/]{0,1}(?:sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1}>.*"))
{
//an opening or closing tag has been found, so let us start our pattern captures
//I am using a backreference \\2 to make sure the closing tag is the same as the opening tag
Pattern pattern1 = Pattern.compile("(.*)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>",Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher1 = pattern1.matcher(currentText);
int iteration = 0;
while(matcher1.find()){
System.out.print("Iteration ");
System.out.println(++iteration);
System.out.println("group1:"+matcher1.group(1));
System.out.println("group2:"+matcher1.group(2));
System.out.println("group3:"+matcher1.group(3));
System.out.println("group4:"+matcher1.group(4));
if(matcher1.group(1) != null && matcher1.group(1).isEmpty() == false)
{
m_xText.insertString(xTextRange, matcher1.group(1), false);
remainingText = remainingText.replaceFirst(matcher1.group(1), "");
}
if(matcher1.group(4) != null && matcher1.group(4).isEmpty() == false)
{
switch (matcher1.group(2)) {
case "pof": [...]
case "pos": [...]
case "poif": [...]
case "po": [...]
case "poi": [...]
case "pol": [...]
case "poil": [...]
case "sm": [...]
}
remainingText = remainingText.replaceFirst("<"+matcher1.group(2)+">"+matcher1.group(4)+"</"+matcher1.group(2)+">", "");
}
}
}
The System.out.println calls print to my console only once, with these results:
Iteration 1:
group1:the man said:<pof>「This one, at last, is bone of my bones</pof><poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po>;
group2:poil
group3:po
group4:for out of man this one has been taken.」
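Group 3 can be ignored; the only useful groups are 1, 2 and 4 (group 3 is part of group 2). Why does this capture only the last tag instance, "poil", and not the preceding "pof", "poi" and "po" tags?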
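A minimal sketch of what appears to be happening: the leading (.*) is greedy, so it first swallows the whole string and then backtracks only as far as needed, which is the last tag that satisfies the rest of the pattern. That single match already ends at the end of the input, so the next find() has nothing left to search. The class name GreedyPrefixDemo, the helper countMatches and the shortened sample text are illustrative assumptions, not part of the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyPrefixDemo {
    // Same tag expression as in the question; only the leading group differs.
    static final String TAG = "<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>";

    // Counts how many times find() succeeds for the given pattern.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sample text in the question.
        String text = "the man said:<pof>a</pof><poi>b</poi><po>c</po><poil>d</poil>";
        System.out.println(countMatches("(.*)" + TAG, text));  // greedy prefix: prints 1
        System.out.println(countMatches("(.*?)" + TAG, text)); // reluctant prefix: prints 4
    }
}
```

With the greedy prefix, group 1 of the single match holds everything up to the last opening tag, which matches the console output shown above.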
The output I would like to see is something like this:
Iteration 1:
group1:the man said:
group2:pof
group3:po
group4:「This one, at last, is bone of my bones
Iteration 2:
group1:
group2:poi
group3:po
group4:and flesh of my flesh;
Iteration 3:
group1:
group2:po
group3:po
group4:This one shall be called ‘woman,’
Iteration 4:
group1:
group2:poil
group3:po
group4:for out of man this one has been taken.」
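One way to get per-tag iterations like these, sketched under a few assumptions: the leading group is made reluctant, (.*?), so each find() stops at the next tag, and leftover text is tracked with matcher.end() instead of calling replaceFirst on a copy of the string (replaceFirst would treat the captured text as a regex). The sketch also departs from the pattern in the question in two ways: the inner (sm|po) group is non-capturing, so the tag content is group 3 rather than group 4, and the | characters are dropped from the character classes because inside [...] they match a literal pipe. The names TagSegmenter and segment and the "normal" label are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagSegmenter {
    // Group 1: text before the tag, group 2: tag name, group 3: tag content.
    private static final Pattern TAGGED = Pattern.compile(
            "(.*?)<((?:sm|po)[flsi3]?[fl]?)>(.*?)</\\2>",
            Pattern.DOTALL | Pattern.UNICODE_CHARACTER_CLASS);

    // Returns {style, text} pairs in document order.
    static List<String[]> segment(String input) {
        List<String[]> segments = new ArrayList<>();
        int consumed = 0; // index just past the most recent match
        Matcher m = TAGGED.matcher(input);
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                segments.add(new String[] {"normal", m.group(1)});
            }
            if (!m.group(3).isEmpty()) {
                segments.add(new String[] {m.group(2), m.group(3)});
            }
            consumed = m.end();
        }
        // Anything after the last closing tag (or the whole string if no tag matched).
        if (consumed < input.length()) {
            segments.add(new String[] {"normal", input.substring(consumed)});
        }
        return segments;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sample text in the question.
        String text = "the man said:<pof>bone of my bones</pof><poi>and flesh of my flesh;</poi>";
        for (String[] s : segment(text)) {
            System.out.println(s[0] + ": " + s[1]);
        }
    }
}
```

In the real code, each pair would be handed to the switch on the tag name and to m_xText.insertString. Text containing an opening tag with no matching closing tag ends up in a "normal" segment with the tag still in it, so that case would need separate handling.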