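Since there seem to be basically three kinds of formatting (italic, bold, and link), I would do three regex replacement passes.
Based on the input you gave, the order of precedence should be: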
/**
 * FIRST REMOVE ITALICS, THEN BOLD, THEN URL
 */
public static String cleanWikiFormat(CharSequence sequence) {
    return Test.removeUrl(Test.removeBold(Test.removeItalic(sequence)));
}
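To see why the order matters, here is a small sketch (the class `OrderDemo` and its methods are hypothetical) that reuses the italic and bold patterns from the sample code on `_*formatting*_`. If bold runs first, it consumes the inner asterisks and leaves `_formatting_`, which the italic pattern can no longer match.

```java
import java.util.regex.Pattern;

public class OrderDemo {
    // Same regexes as removeItalic and removeBold in the sample code.
    private static final Pattern ITALIC = Pattern.compile("_\\*(.+?)\\*_");
    private static final Pattern BOLD = Pattern.compile("\\*(.+?)\\*");

    // Italic first, then bold: the order used by cleanWikiFormat.
    static String italicThenBold(String s) {
        return BOLD.matcher(ITALIC.matcher(s).replaceAll("$1")).replaceAll("$1");
    }

    // Bold first, then italic: the wrong order.
    static String boldThenItalic(String s) {
        return ITALIC.matcher(BOLD.matcher(s).replaceAll("$1")).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "_*formatting*_";
        System.out.println(italicThenBold(input)); // formatting
        System.out.println(boldThenItalic(input)); // _formatting_
    }
}
```

`replaceAll("$1")` is used here for brevity; it inserts the text captured by group 1 literally, just like the `appendReplacement` loop with `Matcher.quoteReplacement`.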
Here is some sample code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    // _*text*_ -> text
    private static String removeItalic(CharSequence sequence) {
        Pattern patt = Pattern.compile("_\\*(.+?)\\*_");
        Matcher m = patt.matcher(sequence);
        StringBuffer sb = new StringBuffer(sequence.length());
        while (m.find()) {
            String text = m.group(1);
            // ... possibly process 'text' ...
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }
    // *text* -> text (must run after removeItalic, otherwise _*text*_ becomes _text_)
    private static String removeBold(CharSequence sequence) {
        Pattern patt = Pattern.compile("\\*(.+?)\\*");
        Matcher m = patt.matcher(sequence);
        StringBuffer sb = new StringBuffer(sequence.length());
        while (m.find()) {
            String text = m.group(1);
            // ... possibly process 'text' ...
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }
    // [text|] -> text (a link without the URL part)
    private static String removeUrl(CharSequence sequence) {
        Pattern patt = Pattern.compile("\\[(.+?)\\|\\]");
        Matcher m = patt.matcher(sequence);
        StringBuffer sb = new StringBuffer(sequence.length());
        while (m.find()) {
            String text = m.group(1);
            // ... possibly process 'text' ...
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }
    // First remove italics, then bold, then links.
    public static String cleanWikiFormat(CharSequence sequence) {
        return Test.removeUrl(Test.removeBold(Test.removeItalic(sequence)));
    }
    public static void main(String[] args) {
        String text = "[hello|] this is just a *[test|]* to clean wiki *type* and _*formatting*_";
        System.out.println("Original");
        System.out.println(text);
        text = Test.cleanWikiFormat(text);
        System.out.println("CHANGED");
        System.out.println(text);
    }
}
The output will be:
Original
[hello|] this is just a *[test|]* to clean wiki *type* and _*formatting*_
CHANGED
hello this is just a test to clean wiki type and formatting
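The three `remove...` methods differ only in their pattern, and each call recompiles its regex. A minimal sketch of a tidier variant, assuming Java 8 or later (the names `WikiCleaner` and `unwrap` are hypothetical), compiles each pattern once as a `static final` field and shares one helper. It still makes three passes over the text and should produce the same output as above.

```java
import java.util.regex.Pattern;

public class WikiCleaner {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern ITALIC = Pattern.compile("_\\*(.+?)\\*_");
    private static final Pattern BOLD = Pattern.compile("\\*(.+?)\\*");
    private static final Pattern URL = Pattern.compile("\\[(.+?)\\|\\]");

    // Replaces every match of the pattern with its first capture group.
    private static String unwrap(Pattern pattern, CharSequence input) {
        return pattern.matcher(input).replaceAll("$1");
    }

    // Same order as cleanWikiFormat: italics, then bold, then links.
    public static String cleanWikiFormat(CharSequence input) {
        return unwrap(URL, unwrap(BOLD, unwrap(ITALIC, input)));
    }

    public static void main(String[] args) {
        String text = "[hello|] this is just a *[test|]* to clean wiki *type* and _*formatting*_";
        System.out.println(cleanWikiFormat(text));
    }
}
```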
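Also, if I have something like _*[link|]*_, a bold and italic link (without the url part), I need to parse it 3 times: once to remove the italics, once to remove the bold, and a last time to remove the brackets... this is too slow for what I need – user1739166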
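On the speed concern in the comment above: if the markup never needs to be matched in pairs, one alternative (a different technique from the three-pass approach, and only a sketch; `SinglePassCleaner` and `strip` are hypothetical names) is to delete every markup token in a single scan with one alternation. The trade-off is that it does not check pairing, so a stray `*` or `[` in ordinary text is removed too. Whether this is measurably faster depends on your input sizes, so it is worth timing on real data.

```java
import java.util.regex.Pattern;

public class SinglePassCleaner {
    // Alternatives are tried left to right at each position:
    // "_*" (italic open), "*_" (italic close), "*" (bold), "[" (link open), "|]" (link close).
    private static final Pattern MARKUP = Pattern.compile("_\\*|\\*_|\\*|\\[|\\|\\]");

    // Deletes every markup token in one pass; pairing is NOT checked.
    public static String strip(CharSequence input) {
        return MARKUP.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("_*[link|]*_")); // link
        System.out.println(strip("2 * 3"));       // the unpaired '*' is removed as well
    }
}
```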