Java matching groups


I am building a simple Twitter user-mention finder with a regular expression.

public static Set<String> getMentionedUsers(List<Tweet> tweets) {
    Set<String> mentionedUsers = new TreeSet<>();
    String regex = "(?<=^|(?<=[^a-zA-Z0-9-_\\\\.]))@([A-Za-z][A-Za-z0-9_]+)";

    for (Tweet tweet : tweets) {
        Matcher matcher = Pattern.compile(regex).matcher(tweet.getText().toLowerCase());
        if (matcher.find()) {
            mentionedUsers.add(matcher.group(0));
        }
    }
    return mentionedUsers;
}

It fails to find a match when the mention is at the end of the text. For example, for "@glover told me @GREG" it returns only "@glover".


2016-11-14 DopeDod

Did you keep in mind that group(0) is the full match of your regex, while group(1) is the content of the first capturing group you defined in it?

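You have to keep looping over each single tweet with matcher.find() until no more matches are found. At the moment you check each tweet only once, so only its first mention is collected.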

(Side note: you should compile the pattern outside of your for loop; even better would be to compile it once outside the method.)

public static Set<String> getMentionedUsers(List<Tweet> tweets) {
    Set<String> mentionedUsers = new TreeSet<>();
    String regex = "(?<=^|(?<=[^a-zA-Z0-9-_\\\\.]))@([A-Za-z][A-Za-z0-9_]+)";

    // Compile once, outside the loop
    Pattern p = Pattern.compile(regex);
    for (Tweet tweet : tweets) {
        Matcher matcher = p.matcher(tweet.getText().toLowerCase());
        // Keep calling find() until this tweet has no more matches
        while (matcher.find()) {
            mentionedUsers.add(matcher.group(0));
        }
    }
    return mentionedUsers;
}
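
For anyone who wants to try the loop without a Tweet class, here is a minimal self-contained sketch. It makes a few assumptions that are not in the question: it takes plain strings instead of Tweet objects, the names MentionFinder and findMentions are made up for illustration, and the pattern is a simplified one (a negative lookbehind for a word character) rather than the asker's regex. The pattern is also held in a static final field so it is compiled only once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionFinder {
    // Assumption: a simplified pattern, not the one from the question.
    // The "@" must not be preceded by a letter, digit or underscore,
    // which keeps e-mail addresses from being picked up as mentions.
    private static final Pattern MENTION =
            Pattern.compile("(?<![A-Za-z0-9_])@([A-Za-z][A-Za-z0-9_]+)");

    // Collects every mention from every text, not just the first per text.
    static Set<String> findMentions(List<String> texts) {
        Set<String> mentions = new TreeSet<>();
        for (String text : texts) {
            Matcher matcher = MENTION.matcher(text.toLowerCase(Locale.ROOT));
            while (matcher.find()) {
                mentions.add(matcher.group(0)); // full match, including "@"
            }
        }
        return mentions;
    }

    public static void main(String[] args) {
        System.out.println(findMentions(Arrays.asList("@glover told me @GREG")));
        // prints [@glover, @greg]
    }
}
```

With `if` in place of `while`, the same input would yield only `@glover`, which is the behaviour described in the question.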


2016-11-14 10:54:21 Nevay

Man... I was about to post the same thing.

You are adding matcher.group(0) to your Set. Take a look at the Java Docs:

Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
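
A small sketch of the difference, assuming a simplified mention pattern and a made-up class name GroupZeroDemo (neither is from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Assumption: simplified pattern with one capturing group for the user name.
    static final Pattern MENTION = Pattern.compile("@([A-Za-z][A-Za-z0-9_]+)");

    // Returns { group(), group(0), group(1) } for the first match, or null if none.
    static String[] firstMention(String text) {
        Matcher m = MENTION.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = firstMention("thanks @greg");
        System.out.println(g[0]); // @greg  (group() is the whole match)
        System.out.println(g[1]); // @greg  (group(0) is the same thing)
        System.out.println(g[2]); // greg   (group(1) is the first capturing group)
    }
}
```

So if the user name is wanted without the leading "@", group(1) is the one to add to the Set.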

The capturing groups start at 1; see the reference:

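Group number

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

1    ((A)(B(C)))

2    (A)

3    (B(C))

4    (C)

Group zero always stands for the entire expression.

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.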
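
The numbering and the back reference described in the documentation can be checked with a short sketch. The class GroupNumbering and its helper methods are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns group 0 through groupCount() if the whole input matches, else an empty list.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    // Finds the first doubled word character using the back reference \1, or null if none.
    static String firstDoubled(String input) {
        Matcher m = Pattern.compile("(\\w)\\1").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(groups("((A)(B(C)))", "ABC"));
        // prints [ABC, ABC, A, BC, C]
        System.out.println(firstDoubled("hello"));
        // prints ll
    }
}
```

The first list entry is group zero (the entire match), followed by groups 1 to 4 in the order of their opening parentheses.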


2016-11-14 11:00:19
