Exclude lowercase letters in parentheses from a tag

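A string can contain one to many parenthesized groups of lowercase letters, e.g. String content = "This is (a) nightmare"; I want to transform the string into "<centamp>This is </centamp>(a) <centamp>nightmare</centamp>"; So basically, add the centamp tag around the string, but any lowercase letters in parentheses should be excluded from the tag.

This is what I have tried so far, but it does not produce the desired result. A string may contain none to many such parentheses, and every one of them should be left out of the tag.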

Pattern pattern = Pattern.compile("^(.*)?(\\([a-z]*\\))?(.*)?$", Pattern.MULTILINE);  
String content = "This is (a) nightmare"; 
System.out.println(content.matches("^(.*)?(\\([a-z]*\\))?(.*)?$")); 
System.out.println(pattern.matcher(content).replaceAll("<centamp>$1$3</centamp>$2"));


2013-05-14 Phoenix

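I would *split* on '(a)' and then surround each non-empty part with the tags. It is much easier to think/reason about when it is broken into separate steps. – user2246674

I have no control over that. I am using an API .. – Phoenix

@Phoenix: Given that you can write code, I think this suggestion is feasible. – nhahtdh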
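The split-then-wrap idea from the comment above could look roughly like the sketch below. The class name SplitAndWrap and the method wrap are made up for illustration, and the pattern \([a-z]+\) assumes that a group holds one or more lowercase English letters. The text between groups comes from Pattern.split, the groups themselves are collected with a Matcher, and the two lists are interleaved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitAndWrap {
    // Assumption: a "group" is one or more lowercase English letters in parentheses
    private static final Pattern GROUP = Pattern.compile("\\([a-z]+\\)");

    static String wrap(String input) {
        // Text between the groups; limit -1 keeps trailing empty parts,
        // so parts.length is always (number of groups + 1)
        String[] parts = GROUP.split(input, -1);

        // The groups themselves, in order of appearance
        List<String> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(input);
        while (m.find()) {
            groups.add(m.group());
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (!parts[i].isEmpty()) { // never wrap an empty part
                sb.append("<centamp>").append(parts[i]).append("</centamp>");
            }
            if (i < groups.size()) {   // re-insert the untouched group
                sb.append(groups.get(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("This is (a) nightmare"));
    }
}
```

Note that the space after (a) ends up inside the second tag, because only the parenthesized group itself is excluded.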

This can be done in a single replaceAll:

String outputString = 
    inputString.replaceAll("(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)", 
          "$1<centamp>$2</centamp>");

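It allows a non-empty sequence of lowercase English alphabet characters inside the parentheses, i.e. \([a-z]+\).

Features:

- Whitespace-only sequences are tagged.
- There will be no tag surrounding an empty string.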
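To see these two features in action, here is a small sketch that runs the replaceAll call from this answer on a few sample inputs. The class name SingleReplaceAllDemo, the helper tag, and the sample strings are made up for illustration; the pattern and the replacement are copied unchanged.

```java
class SingleReplaceAllDemo {
    // Pattern copied from the answer, unchanged
    static final String REGEX =
            "(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)";

    static String tag(String input) {
        return input.replaceAll(REGEX, "$1<centamp>$2</centamp>");
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is (a) nightmare", // ordinary case
            "(a) (b)",               // only a space between two groups: the space is tagged
            "(a)(b)",                // nothing between the groups: no empty tag is added
            "No parentheses",        // the whole string is tagged
            ""                       // empty input stays empty
        };
        for (String s : samples) {
            System.out.println("[" + s + "] -> [" + tag(s) + "]");
        }
    }
}
```

For "(a) (b)" the result is (a)<centamp> </centamp>(b), and for "(a)(b)" the string comes back unchanged.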

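Explanation:

\G asserts the match boundary, i.e. the next match can only start where the last match ended. It can also match the beginning of the string (when we have not found any match yet).
Each match of the regex contains the following sequence: 0 or more consecutive \([a-z]+\) (no space allowed in between), followed by at least 1 character that does not form a \([a-z]+\) sequence.
- 0 or more consecutive \([a-z]+\) covers the case where the string does not start with \([a-z]+\), and the case where the string does not contain \([a-z]+\) at all.
  
  In the pattern, this is the (?:\([a-z]+\))*+ part. Note that the + after * makes the quantifier possessive; in other words, it disallows backtracking. Simply put, it is an optimization.
- The restriction of at least one character is necessary to prevent adding tags that enclose an empty string.
  
  In the pattern, this is the (?:(?!\([a-z]+\)).)+ part. Note that for every character, I check with (?!\([a-z]+\)) that a \([a-z]+\) sequence does not start at that position before matching it.
The (?s) flag causes . to match any character, including newlines. This allows a tag to enclose text that spans multiple lines.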
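To illustrate the roles of \G and (?s) described above, the following sketch compares the full pattern with two variants that each drop one piece. The class name FlagAndAnchorDemo and the methods full, withoutG, and withoutS are made up for illustration; only the pattern pieces come from the answer.

```java
class FlagAndAnchorDemo {
    // The two capturing groups of the pattern, without (?s) and \G
    static final String BODY =
            "((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)";
    static final String REPLACEMENT = "$1<centamp>$2</centamp>";

    // The pattern as given in the answer
    static String full(String s) {
        return s.replaceAll("(?s)\\G" + BODY, REPLACEMENT);
    }

    // Variant without \G: matches may start in the middle of a group
    static String withoutG(String s) {
        return s.replaceAll("(?s)" + BODY, REPLACEMENT);
    }

    // Variant without (?s): '.' stops at a line terminator
    static String withoutS(String s) {
        return s.replaceAll("\\G" + BODY, REPLACEMENT);
    }

    public static void main(String[] args) {
        // A trailing group with nothing after it
        System.out.println(full("x(a)"));     // <centamp>x</centamp>(a)
        System.out.println(withoutG("x(a)")); // <centamp>x</centamp>(<centamp>a)</centamp>

        // Text that spans two lines
        System.out.println(full("line1\nline2"));     // one tag around both lines
        System.out.println(withoutS("line1\nline2")); // only line1 is tagged
    }
}
```

Without \G, the engine retries one character later after failing on the trailing (a) and tags the inside of the parentheses. Without (?s), the chain of matches breaks at the first newline, and \G then prevents any later match.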


2013-05-14 18:45:06 nhahtdh

It can have one to many lowercase English characters – Phoenix

Can you explain it? – Phoenix

This works! You are amazing! – Phoenix

You just need to replace every occurrence of "([a-z])" with </centamp>$1<centamp>, then prepend <centamp> and append </centamp>.

String content = "Test (a) test (b) (c)"; 
Pattern pattern = Pattern.compile("(\\([a-z]\\))"); 
Matcher matcher = pattern.matcher(content); 
String result = "<centamp>" + matcher.replaceAll("</centamp>$1<centamp>") + "</centamp>";

Note: I wrote the above in the browser, so there may be syntax errors.

Edit: Below is a complete example, with the simplest regex possible.

import java.util.*;
import java.lang.*;
import java.util.regex.*;

class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String content = "test (a) (b) and (c)";
        String result = "<centamp>" +
            content.replaceAll("(\\([a-z]\\))", "</centamp>$1<centamp>") +
            "</centamp>";
        result = result.replaceAll("<centamp></centamp>", "");
        System.out.print(result);
    }
}


2013-05-14 18:55:52

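I don't think this is correct.. because it does not work for "(a)" – Phoenix

@Phoenix - If by "does not work" you mean that there is extra markup, then yes, that is true; you can remove the unnecessary markup with another replaceAll. –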
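As a rough sketch of the cleanup step mentioned in the reply above, the snippet below wraps the whole string, closes and reopens the tag around each group, and then deletes the empty tag pairs that are left over. The class name WrapThenCleanDemo and the method tag are made up for illustration. Using [a-z]+ instead of a single [a-z] is an assumption, taken from the asker's remark that a group can hold one to many letters.

```java
class WrapThenCleanDemo {
    static String tag(String content) {
        // Assumption: [a-z]+ so that groups such as (ab) are excluded as well
        String wrapped = "<centamp>"
                + content.replaceAll("(\\([a-z]+\\))", "</centamp>$1<centamp>")
                + "</centamp>";
        // Literal replace: remove tag pairs that enclose nothing
        return wrapped.replace("<centamp></centamp>", "");
    }

    public static void main(String[] args) {
        System.out.println(tag("(a)"));
        System.out.println(tag("test (a) (b) and (c)"));
    }
}
```

For the input "(a)" the intermediate string is <centamp></centamp>(a)<centamp></centamp>, and the cleanup reduces it back to (a). One caveat of this sketch: an input that already contains the literal text <centamp></centamp> would lose it in the cleanup.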

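Here is another solution that uses a cleaner regex. The solution is longer, but it gives you more flexibility to adjust the condition for adding tags.

The idea here is to match the parentheses containing lowercase letters (the part we do not want to tag), then use the indices from the matches to identify the portions we do want to enclose in tags.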

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex for the parenthesis containing only lowercase English
// alphabet characters
static Pattern REGEX_IN_PARENTHESIS = Pattern.compile("\\([a-z]+\\)");

private static String addTag(String str) {
    Matcher matcher = REGEX_IN_PARENTHESIS.matcher(str);
    StringBuilder sb = new StringBuilder();

    // Index that we have processed up to last append into StringBuilder
    int lastAppend = 0;

    while (matcher.find()) {
        String bracket = matcher.group();

        // The string from lastAppend to start of a match is the part
        // we want to tag
        // If you want to, you can easily add extra logic to process
        // the string
        if (lastAppend < matcher.start()) { // will not tag if empty string
            sb.append("<centamp>")
              .append(str, lastAppend, matcher.start())
              .append("</centamp>");
        }

        // Append the parenthesis with lowercase English alphabet as it is
        sb.append(bracket);

        lastAppend = matcher.end();
    }

    // The string from lastAppend to end of string (no more match)
    // is the part we want to tag
    if (lastAppend < str.length()) {
        sb.append("<centamp>")
          .append(str, lastAppend, str.length())
          .append("</centamp>");
    }

    return sb.toString();
}
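As one possible illustration of the "extra logic" hook mentioned in the comments of addTag, the sketch below passes the tagging condition in as a predicate. The class name ConditionalTagger, the shouldTag parameter, and the helper appendSegment are all hypothetical additions, not part of the answer; the matching loop follows the same index-based idea.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConditionalTagger {
    private static final Pattern IN_PARENTHESIS = Pattern.compile("\\([a-z]+\\)");

    // shouldTag is a hypothetical hook: it decides whether a non-empty
    // segment between two groups gets wrapped in the tag
    static String addTag(String str, Predicate<String> shouldTag) {
        Matcher matcher = IN_PARENTHESIS.matcher(str);
        StringBuilder sb = new StringBuilder();
        int lastAppend = 0;

        while (matcher.find()) {
            appendSegment(sb, str.substring(lastAppend, matcher.start()), shouldTag);
            sb.append(matcher.group()); // the group itself is copied as it is
            lastAppend = matcher.end();
        }
        appendSegment(sb, str.substring(lastAppend), shouldTag);
        return sb.toString();
    }

    private static void appendSegment(StringBuilder sb, String segment,
                                      Predicate<String> shouldTag) {
        if (segment.isEmpty()) {
            return; // never emit an empty tag
        }
        if (shouldTag.test(segment)) {
            sb.append("<centamp>").append(segment).append("</centamp>");
        } else {
            sb.append(segment); // kept, but left untagged
        }
    }

    public static void main(String[] args) {
        String input = "test (a) (b) and (c)";
        // Tag every non-empty segment
        System.out.println(addTag(input, s -> true));
        // Hypothetical condition: skip whitespace-only segments
        System.out.println(addTag(input, s -> !s.trim().isEmpty()));
    }
}
```

With the second predicate, the single space between (a) and (b) is kept in the output but no longer wrapped in a tag.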


2013-05-14 19:55:39 nhahtdh
