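I am trying to use the Daring Fireball regular expression for matching URLs in Java, and I have found a URL that makes the evaluation run seemingly forever. I modified the original regular expression to use Java syntax. In Java, the regular expression runs extremely slowly.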
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final String pattern =
    "\\b" +
    "(" + // Capture 1: entire matched URL
        "(?:" +
            "[a-z][\\w-]+:" + // URL protocol and colon
            "(?:" +
                "/{1,3}" + // 1-3 slashes
                "|" + // or
                "[a-z0-9%]" + // Single letter or digit or '%'
                // (Trying not to match e.g. "URI::Escape")
            ")" +
            "|" + // or
            "www\\d{0,3}[.]" + // "www.", "www1.", "www2." … "www999."
            "|" + // or
            "[a-z0-9.\\-]+[.][a-z]{2,4}/" + // looks like domain name followed by a slash
        ")" +
        "(?:" + // One or more:
            "[^\\s()<>]+" + // Run of non-space, non-()<>
            "|" + // or
            "\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]+\\)))*\\)" + // balanced parens, up to 2 levels
        ")+" +
        "(?:" + // End with:
            "\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]+\\)))*\\)" + // balanced parens, up to 2 levels
            "|" + // or
            "[^\\s`!\\-()\\[\\]{};:'\".,<>?«»「」‘’]" + // not a space or one of these punct chars (updated to add a dash)
        ")" +
    ")";

// @see http://daringfireball.net/2010/07/improved_regex_for_matching_urls
private static final Pattern DARING_FIREBALL_PATTERN =
    Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
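If I try to run the following, it takes a very long time. I have narrowed it down to the balanced-parens matching (I think). If you change the text inside the parens it works fine, but at around 15 characters it starts to slow down exponentially.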
final Matcher matcher = DARING_FIREBALL_PATTERN.matcher("https://goo.gl/a(something_really_long_in_balanced_parens)");
boolean found = matcher.find();
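To put a rough number on the slowdown: in a plain backtracking engine, a nested quantifier such as (?:[^\s()<>]+|...)* can cut a run of n characters into non-empty pieces in 2^(n-1) different ways, and a failing continuation may force the engine to try all of them. The sketch below only counts those splits; it does not run the regex, and the class and method names (SplitCounter, countSplits) are made up for illustration. Actual timings depend on the JVM's regex implementation.

```java
class SplitCounter {
    /**
     * Counts the ways to cut a run of n characters into one or more
     * non-empty consecutive pieces (hypothetical helper, not part of
     * java.util.regex).
     */
    static long countSplits(int n) {
        // ways[i] = number of ways to split the first i characters
        long[] ways = new long[n + 1];
        ways[0] = 1; // the empty prefix can be "split" in exactly one way
        for (int i = 1; i <= n; i++) {
            long total = 0;
            // the last piece covers characters j..i-1, for every possible j
            for (int j = 0; j < i; j++) {
                total += ways[j];
            }
            ways[i] = total;
        }
        return ways[n];
    }

    public static void main(String[] args) {
        String inner = "something_really_long_in_balanced_parens";
        System.out.println("15 chars: " + countSplits(15)); // 16384
        System.out.println(inner.length() + " chars: " + countSplits(inner.length()));
    }
}
```

The count doubles with every extra character, which would be consistent with the match being fine for short text and unusable for the longer text in the parens.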
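Is there a way to improve this regular expression so that lines like this do not take forever to evaluate? I have about 100 different URLs in a JUnit test class, and I need those URLs to keep working.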
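One direction that might be worth testing is Java's possessive quantifiers (++ and *+), which never give characters back once matched. The sketch below applies them only to the balanced-parens fragment, on the assumption that a run of non-paren characters is always followed by "(" or ")", so giving characters back could never help. The class name PossessiveParens and the method isBalancedGroup are hypothetical, and this has not been checked against the full pattern or the roughly 100 JUnit URLs; the outer "one or more" loop still needs ordinary backtracking to release trailing punctuation.

```java
import java.util.regex.Pattern;

class PossessiveParens {
    // Fragment as it appears in the question: greedy, nested quantifiers.
    static final String GREEDY_PARENS =
        "\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]+\\)))*\\)";

    // Assumed variant: same shape, but "++" and "*+" are possessive,
    // so the engine does not retry other ways of splitting a run.
    static final Pattern POSSESSIVE_PARENS =
        Pattern.compile("\\((?:[^\\s()<>]++|\\([^\\s()<>]++\\))*+\\)");

    /** Returns true if the whole text is one balanced group, up to 2 levels. */
    static boolean isBalancedGroup(String text) {
        return POSSESSIVE_PARENS.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBalancedGroup("(something_really_long_in_balanced_parens)")); // true
        System.out.println(isBalancedGroup("(outer(inner)outer)"));                         // true
        // No closing paren: fails after a single pass instead of retrying splits.
        System.out.println(isBalancedGroup("(something_really_long_with_no_closing_paren")); // false
    }
}
```

If this fragment were substituted into both places where the balanced-parens expression appears, the existing JUnit URLs would be the natural regression check.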
http://www.regular-expressions.info/catastrophic.html – 2013-04-25 23:36:10