Parsing input text

I have the following input text:
<code>Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre  II]] movie directed [[Source:NYTimes]]...
Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
*[[1973]] &amp;ndash; [[Clark Kent]], superhero and newspaper reporter...
After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...</code>

This is the pattern code I am using in Java:

<code>String pattern = "(?:\\p{Punct}|\\B|\\b)(\\[\\[[^(Arch:|Zeus:|Source:)].*?\\]\\])(?:\\p{Punct}|\\b|\\B)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(data);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
}</code>

I read the file line by line with BufferedReader's readLine (each line is parsed separately), and with my regular expression I get the following output:
Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
Found value: [[University of Pleasantville]]
*[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
Found value: [[1973]]
After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
Found value: [[negative hero]]
Found value: [[Alternate Superman]]
Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...
Found value: [[Daily Planet]]
Found value: [[Louis Lane]]

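As you can see, I am unable to extract everything inside the double square brackets [[I_want_to_extract_these_except_Source_or_Arch_or_Zeus]]. For example, from the first line I should have extracted [[Superman (the Hero)|Superman]] and so on, but nothing is retrieved. How can I modify my regular expression to extract everything except [[Source:something]] and the like? Thanks.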

2014-07-06 Knight

Append the whole text to a string, then match. – nikolap

Is that the problem, @nikolap? What is wrong with reading line by line? – Knight

I am not sure about the whole text, but there may be something like [[Lois Lane and the closing on the next line]] – nikolap
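A minimal sketch of what nikolap's comment suggests, assuming a link could be split across a line break. The class name `WholeTextMatch`, its method names, and the simplified pattern `\[\[.*?\]\]` are hypothetical and only illustrate joining the lines before matching; the pattern applies no exclusion of the Source:, Arch: or Zeus: prefixes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeTextMatch {
    // DOTALL lets '.' match line terminators, so a link that starts on one
    // line and closes on the next is still found. (Illustrative pattern only.)
    private static final Pattern ANY_LINK =
            Pattern.compile("\\[\\[.*?\\]\\]", Pattern.DOTALL);

    // Collect every line into one string instead of matching line by line.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            // readLine() strips the line terminator, so put one back.
            text.append(line).append('\n');
        }
        return text.toString();
    }

    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = ANY_LINK.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        String sample = "Clark met [[Lois\nLane]] at the [[Daily Planet]].";
        String text = readAll(new BufferedReader(new StringReader(sample)));
        for (String link : findLinks(text)) {
            System.out.println("Found value: " + link);
        }
    }
}
```

If no link in the real data ever spans a line break, matching line by line gives the same result and uses less memory on a large file.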

Use a negative lookahead, i.e. (?!...), like this:

\[\[(?!Arch:|Zeus:|Source).*?\]\]

See it in action: http://regex101.com/r/lJ6sH3/1
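For context on why the lookahead helps: in the question's pattern, `[^(Arch:|Zeus:|Source:)]` is a negated character class, so it rejects any single character from that set (`S`, `A`, `e`, `:` and so on) rather than the three prefixes, which is why a link beginning with `S` such as [[Superman (the Hero)|Superman]] is never matched. Below is a rough Java sketch of the lookahead approach. The class name `WikiLinkExtractor` and the method `extractLinks` are hypothetical, and the sketch writes `Source:` with the colon (a small variation on the pattern above) so that a title merely beginning with "Source" would not be excluded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkExtractor {
    // After the opening "[[", the negative lookahead (?!...) rejects the
    // match when one of the excluded prefixes follows immediately.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(?!Arch:|Zeus:|Source:).*?\\]\\]");

    static List<String> extractLinks(String line) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            links.add(m.group()); // whole match, brackets included
        }
        return links;
    }

    public static void main(String[] args) {
        String line = "Clark is set to work in ''[[Superman (the Hero)|Superman]]'', "
                + "a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...";
        for (String link : extractLinks(line)) {
            System.out.println("Found value: " + link);
        }
    }
}
```

On the sample line in `main`, this prints `Found value: [[Superman (the Hero)|Superman]]` and `Found value: [[SuperHero Genre II]]`, and skips [[Source:NYTimes]].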

2014-07-06 15:15:10

Thanks @mrhobo. That works! – Knight
