Regex: strip all tags except those containing the keyword "univ"

[introduction][position]Lead Researcher and Research Manager[/position] in the [affiliation]Web Search and Mining Group, Microsoft Research[/affiliation]</b>. 

I am a [position]lead researcher[/position] at [affiliation]Microsoft Research[/affiliation]. I am also [position]adjunct professor[/position] of [affiliation]Peking University[/affiliation], [affiliation]Xian Jiaotong University[/affiliation] and [affiliation]Nankai University[/affiliation]. 

I joined [affiliation]Microsoft Research[/affiliation] in June 2001. Prior to that, I worked at the Research Laboratories of NEC Corporation. 

I obtained a [bsdegree]B.S.[/bsdegree] in [bsmajor]Electrical Engineering[/bsmajor] from [bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate] and a [msdegree]M.S.[/msdegree] in [msmajor]Computer Science[/msmajor] from [msuniv]Kyoto University[/msuniv] in [msdate]1990[/msdate]. I earned my [phddegree]Ph.D.[/phddegree] in [phdmajor]Computer Science[/phdmajor] from the [phduniv]University of Tokyo[/phduniv] in [phddate]1998[/phddate]. 

I am interested in [interests]statistical learning[/interests], [interests]natural language processing[/interests], [interests]data mining, and information retrieval[/interests].[/introduction]

I can strip all the tags from the paragraph above with:

String stripped = html.replaceAll("\\[.*?\\]", "");
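For reference, here is a minimal self-contained sketch of that baseline call. The class name `StripAllTags`, the helper `strip`, and the shortened sample string are assumptions made for illustration; only the regex comes from the question.

```java
class StripAllTags {
    // Removes every [tag] and [/tag]: a '[', the shortest possible run of
    // characters, then the first ']' that follows.
    static String strip(String text) {
        return text.replaceAll("\\[.*?\\]", "");
    }

    public static void main(String[] args) {
        // Shortened, assumed sample in the same style as the paragraph above.
        String sample = "I obtained a [bsdegree]B.S.[/bsdegree] from "
                + "[bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate].";
        // Prints: I obtained a B.S. from Kyoto University in 1988.
        System.out.println(strip(sample));
    }
}
```

Note that the `univ` tags are removed along with everything else, which is the behaviour the question wants to change.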

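But I want to keep three pairs of tags in the paragraph: [bsuniv][/bsuniv], [msuniv][/msuniv] and [phduniv][/phduniv]. In other words, I don't want to strip tags that contain the keyword "univ". I can't find a convenient way to rewrite the regex. Can anyone help me?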

Source

2012-12-17 Terry Li

You can use a negative lookahead assertion here:

str = str.replaceAll("\\[(.(?!univ))*?\\]", "");

Or:

str = str.replaceAll("\\[((?!univ).)*?\\]", "");
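A small runnable sketch of the two patterns, applied to a shortened sample in the style of the question. The class name `KeepUnivTags`, the constant names, and the sample string are assumptions for illustration; the two regexes are the ones given in this answer.

```java
class KeepUnivTags {
    // Lookahead checked after each consumed character.
    static final String LOOKAHEAD_AFTER_CHAR = "\\[(.(?!univ))*?\\]";
    // Lookahead checked before each character is consumed.
    static final String LOOKAHEAD_BEFORE_CHAR = "\\[((?!univ).)*?\\]";

    static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "I obtained a [bsdegree]B.S.[/bsdegree] from "
                + "[bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate].";
        // Both lines print:
        // I obtained a B.S. from [bsuniv]Kyoto University[/bsuniv] in 1988.
        System.out.println(strip(sample, LOOKAHEAD_AFTER_CHAR));
        System.out.println(strip(sample, LOOKAHEAD_BEFORE_CHAR));
    }
}
```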

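Both will give you the desired output. There is only one difference:

The first applies the negative lookahead after the current character: if that character is not followed by univ, the match moves on to the next character.
The second applies the negative lookahead to the empty string before each character: if that position is not followed by univ, it goes on to match a single character.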
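One practical consequence of where the lookahead sits can be sketched with a hypothetical tag `[univ]`, where the keyword comes immediately after the bracket. No such tag appears in the question's paragraph, so this is an assumed edge case. In the first pattern the lookahead only runs after a character has been consumed, so "univ" directly after `[` is never checked and the opening tag is stripped. The third pattern, `WHOLE_TAG`, is an assumed alternative that is not part of the answer: it checks the whole tag once and uses `[^\]]` so a match can never run past the closing bracket.

```java
class LookaheadPlacement {
    // Pattern 1 from the answer: lookahead after each consumed character.
    static final String AFTER_CHAR = "\\[(.(?!univ))*?\\]";
    // Pattern 2 from the answer: lookahead before each character.
    static final String BEFORE_CHAR = "\\[((?!univ).)*?\\]";
    // Assumed alternative (not from the answer): one lookahead for the whole
    // tag, restricted to characters other than ']'.
    static final String WHOLE_TAG = "\\[(?![^\\]]*univ)[^\\]]*\\]";

    static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Hypothetical input: the tag name starts with the keyword itself.
        String sample = "[degree]B.S.[/degree] from [univ]Kyoto University[/univ]";
        // Prints: B.S. from Kyoto University[/univ]
        System.out.println(strip(sample, AFTER_CHAR));
        // Prints: B.S. from [univ]Kyoto University[/univ]
        System.out.println(strip(sample, BEFORE_CHAR));
        // Prints: B.S. from [univ]Kyoto University[/univ]
        System.out.println(strip(sample, WHOLE_TAG));
    }
}
```

For the tags in the question (bsuniv, msuniv, phduniv) at least one character always precedes "univ", so all three patterns behave the same there.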

Source

2012-12-17 05:41:40

It works! Thank you very much.

@TerryLi You're welcome :)
