How to replace part of a string using a regular expression

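I'm not a beginner with regular expressions, but using them in Perl seems to differ from using them in Java.

Anyway, I basically have a dictionary of shorthand words and their definitions. I want to iterate over the words in the dictionary and replace each one with its meaning. What is the best way to do this in Java?

I have looked at String.replaceAll(), String.replace(), and the Pattern/Matcher classes. I was hoping to do a case-insensitive replacement along the lines of: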

word =~ s/\s?\Q$short_word\E\s?/ \Q$short_def\E /sig

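While I'm at it, do you think it is better to extract all the words from the string and then apply my dictionary, or to apply the dictionary directly to the string? I know I need to be careful, because a shorthand word can match part of another shorthand's definition.

I hope this all makes sense.

Thanks.

Clarification:

The dictionary looks like this: lol: laugh out loud, ROFL: rolling on the floor laughing, LL: like lemons

The string is: lol, i am ROFL

Replaced text: laugh out loud, i am rolling on the floor laughing

Note how LL was not added anywhere.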

2010-09-24 ekawas

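Clarification: do you mean that you want to iterate over the words in the string and replace each short word with its definition? For example, in a long string of text, replace "e.g." with "exempli gratia". If not, please provide before and after examples. – 2010-09-24 14:17:02

I updated my question. The example is at the bottom – ekawas 2010-09-24 15:06:04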

The danger is false positives inside normal words. "fall" != "falike lemons"
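To make the false-positive risk concrete, here is a minimal sketch contrasting a plain substring replacement with a whole-word, case-insensitive one. The class and method names are made up for illustration, and it assumes dictionary keys consist only of word characters so that `\b` behaves as expected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original thread.
final class FalsePositiveDemo {
    // Naive: replaces the key wherever it occurs, even inside other words.
    static String naive(String text, String key, String definition) {
        return text.replace(key, definition);
    }

    // Safer: replaces the key only when it stands alone as a whole word.
    static String wholeWord(String text, String key, String definition) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        // quoteReplacement guards against '$' and '\' in the definition.
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(definition));
    }

    public static void main(String[] args) {
        System.out.println(naive("fall", "ll", "like lemons"));         // falike lemons
        System.out.println(wholeWord("fall", "ll", "like lemons"));     // fall
        System.out.println(wholeWord("ll, fall", "ll", "like lemons")); // like lemons, fall
    }
}
```

Pattern.quote plays roughly the role of Perl's \Q...\E on the pattern side, while Matcher.quoteReplacement does the same for the replacement text.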

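One approach is to split the text into words on whitespace (do runs of multiple spaces need to be preserved?) and then loop over the result with an 'if contains() { replace } else { output original }' idea.

My output class would be a StringBuffer: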

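StringBuffer outputBuffer = new StringBuffer();
// split() is a helper you write yourself; dictionary is a Map<String, String>
for (String s : split(inputText)) {
    // append the definition for dictionary words, everything else unchanged
    outputBuffer.append(dictionary.containsKey(s) ? dictionary.get(s) : s);
}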

Make your split method smart, so that it returns the word separators as well:

split("now is the  time") -> now,<space>,is,<space>,the,<space><space>,time

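Then you don't have to worry about preserving whitespace - the loop above simply appends anything that isn't a dictionary word to the StringBuffer unchanged.

Here is a recent SO thread on retaining delimiters when regexing.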
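One possible way to write such a separator-preserving split is sketched below. This is not the code from the linked thread; it is a guess at the idea, using a Matcher that alternates between runs of word characters and runs of everything else. The class name, the `expand` helper, and the assumption that dictionary keys are stored in lower case are all illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
final class DelimiterKeepingSplit {
    // Matches either a run of word characters or a run of non-word characters,
    // so the tokens concatenated together reproduce the input exactly.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\W+");

    // Returns words and the separators between them, in their original order.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Replaces tokens that are dictionary keys (assumed lower-case);
    // separators and unknown words pass through unchanged.
    static String expand(String text, Map<String, String> dictionary) {
        StringBuilder out = new StringBuilder(text.length());
        for (String token : split(text)) {
            out.append(dictionary.getOrDefault(token.toLowerCase(Locale.ROOT), token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("lol", "laugh out loud");
        dictionary.put("rofl", "rolling on the floor laughing");
        dictionary.put("ll", "like lemons");
        System.out.println(expand("lol, i am ROFL", dictionary));
    }
}
```

Because every token is either a whole word or a whole separator, a key such as "ll" can never match inside "rolling" or "fall".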

2010-09-24 15:38:17

The first thing that came to my mind was this:

...
// eg: lol -> laugh out loud
Map<String, String> dictionary;

ArrayList<String> originalText;
ArrayList<String> replacedText;

for (String string : originalText) {
    if (dictionary.containsKey(string)) {
        replacedText.add(dictionary.get(string));
    } else {
        replacedText.add(string);
    }
}
...

Or you could use a StringBuffer instead of replacedText.

2010-09-24 15:09:29

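Are you suggesting that I explode my original text? Also, doesn't this involve a lot of overhead? Do you think exploding the text and keeping these arrays around is better (more efficient) than using a regex? – ekawas 2010-09-24 15:17:29

In Java, the String class is immutable, so once a String is created and initialized it cannot be changed through the same reference. Every replace call therefore creates a new String. The other reason I suggested this implementation is that it is easy to read and understand. You just break your big string into a list and keep the 2 lists in memory. – 2010-09-24 15:31:41

Thanks. I like your answer, but I went with another one. – ekawas 2010-09-24 16:10:00

If you insist on using a regex, this would work (using Zoltan Balazs's dictionary map approach):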

Map<String, String> substitutions = loadDictionaryFromSomewhere(); 
int lengthOfShortestKeyInMap = 3; //Calculate 
int lengthOfLongestKeyInMap = 3; //Calculate 

StringBuffer output = new StringBuffer(input.length()); 
Pattern pattern = Pattern.compile("\\b(\\w{" + lengthOfShortestKeyInMap + "," + lengthOfLongestKeyInMap + "})\\b"); 
Matcher matcher = pattern.matcher(input); 
while (matcher.find()) { 
    String candidate = matcher.group(1); 
    String substitute = substitutions.get(candidate); 
    if (substitute == null) {
        substitute = candidate; // no match, use original
    }
    matcher.appendReplacement(output, Matcher.quoteReplacement(substitute)); 
} 
matcher.appendTail(output); 
// output now contains the text with substituted words

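If you plan to process many inputs, precompiling the pattern is more efficient than using String.split(), which generally compiles a new Pattern on every call.

(EDIT) Compiling all of the keys into a single pattern yields an even more efficient approach, like so: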

Pattern pattern = Pattern.compile("\\b(lol|rtfm|rofl|wtf)\\b"); 
// rest of the method unchanged, don't need the shortest/longest key stuff

This lets the regex engine skip over any words that happen to be short enough but aren't in the list, saving a lot of map accesses.
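Rather than typing the alternation by hand, the pattern could be built from the map keys. The following is a rough sketch of that idea and not code from the answer: the class name and both method names are invented, keys are assumed to be lower-case and made of word characters, and Pattern.CASE_INSENSITIVE is added to get the case-insensitive matching the question asks for.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
final class DictionaryPattern {
    // Builds \b(?:key1|key2|...)\b from the map keys, quoting each key
    // so that regex metacharacters in a key are matched literally.
    static Pattern compile(Map<String, String> substitutions) {
        StringJoiner alternation = new StringJoiner("|", "\\b(?:", ")\\b");
        for (String key : substitutions.keySet()) {
            alternation.add(Pattern.quote(key));
        }
        return Pattern.compile(alternation.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Replaces every whole-word key occurrence with its definition.
    static String expand(String input, Pattern pattern, Map<String, String> substitutions) {
        StringBuffer output = new StringBuffer(input.length());
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            String matched = matcher.group();
            // Keys are assumed lower-case; fall back to the matched text if absent.
            String substitute = substitutions.get(matched.toLowerCase(Locale.ROOT));
            if (substitute == null) {
                substitute = matched;
            }
            matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        Map<String, String> substitutions = new HashMap<>();
        substitutions.put("lol", "laugh out loud");
        substitutions.put("rofl", "rolling on the floor laughing");
        substitutions.put("ll", "like lemons");
        Pattern pattern = compile(substitutions); // compile once, reuse for many inputs
        System.out.println(expand("lol, i am ROFL", pattern, substitutions));
    }
}
```

The pattern needs to be rebuilt only when the dictionary changes, so the cost of compiling a large alternation is paid once rather than per input.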

2010-09-24 15:42:10 Barend

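I don't think '|'-ing together every key in my dictionary is a good approach, because I need to check which key matched before inserting my definition. – ekawas 2010-09-24 16:08:48

That check is implicit in 'substitute = substitutions.get(candidate)'. – Barend 2010-09-24 19:05:25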
