2013-08-23

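How to replace a string between multiple characters in a regular expression in VB.NET

Right, I am removing some references from an XML file I downloaded from Wikipedia. So far the text looks like this (ignore the line breaks, they are only there to make it easier to read):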

'''Anarchism''' is a political philosophy that advocates stateless societies based on 
non-hierarchical free associations.<ref name="iaf-ifa.org"/><ref>"That is why 
Anarchy, when it works to destroy authority in all its aspects, when it demands 
the abrogation of laws and the abolition of the mechanism that serves to 
impose them, when it refuses all hierarchical organization and preaches free agreement - at the same time strives to maintain and enlarge the precious kernel of social customs without which 
no human or animal society can exist." Peter Kropotkin. http://www.theanarchistlibrary.org/HTML/Petr_Kropotkin__Anarchism__its_philosophy_and_ideal.html 
Anarchism: its philosophy and ideal</ref><ref>"anarchists are opposed to irrational (e.g., illegitimate) 
authority, in other words, hierarchy - hierarchy being the institutionalisation of authority 
within a society." http://www.theanarchistlibrary.org/HTML/The_Anarchist_FAQ_Editorial_Collective__An_Anarchist_FAQ__03_17_.html#toc2 "B.1 
Why are anarchists against authority and hierarchy?" in An 
Anarchist FAQ</ref><ref>"ANARCHISM, a social philosophy that rejects 
authoritarian government and maintains that voluntary institutions are best 
suited to express man's natural social tendencies." George Woodcock. "Anarchism" at The Encyclopedia of Philosophy</ref><ref>"In a society developed on these lines, the voluntary 
associations which already now begin to cover all the fields of human activity 
would take a still greater extension so as to substitute themselves for the 
state in all its functions." http://www.theanarchistlibrary.org/HTML/Petr_Kropotkin___Anarchism__from_the_Encyclopaedia_Britannica.html 
Peter Kropotkin. "Anarchism" from the Encyclopædia Britannica</ref> Anarchism holds the state 
to be undesirable, unnecessary, or harmful 

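All I want from this block of text is this:

Anarchism is a political philosophy that advocates stateless societies based on non-hierarchical free associations. Anarchism holds the state to be undesirable, unnecessary, or harmful.

It seems to me that if I delete all the text between "<ref" and "/ref>", I should be able to capture all the unwanted text and remove it. This is my current code: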

    Dim temptext As String = newsrt.ToString
    Dim expression As New Regex("(?<=\<ref)[^/ref>]+(?=/ref>)")
    Dim resul As String = expression.Replace(temptext, "")

But this doesn't seem to work. No text between <ref and /ref> is captured and replaced with "".

Any help or suggestions would be great! Thanks.

Answer

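That's not how negated character classes work. That class simply disallows any one of the single characters /, r, e, f, and >. Also, you don't even want to exclude /ref>, because you want to remove all the intermediate refs as well. You can simply use .*. And you don't want the lookarounds, because they exclude whatever is matched inside them from the match, whereas you want to remove those tags too. So in your case it should be as simple as: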

"<ref.*/ref>" 

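Since * is greedy, the match simply runs from the first <ref to the last /ref>. Greediness is usually a big problem, but in this particular case it does exactly what is needed.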

You may want to use RegexOptions.Singleline so that . also matches line breaks (if there are any).
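As a rough illustration of the suggestion above, the following sketch applies the greedy pattern with RegexOptions.Singleline. The sample string and the module name RefStripper are made up for this example; only the pattern and the option come from the answer.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RefStripper
    Sub Main()
        ' Hypothetical sample: adjacent references, one of them split across two lines.
        Dim temptext As String = "free associations.<ref name=""a""/><ref>first" &
                                 Environment.NewLine &
                                 "note</ref><ref>second note</ref> Anarchism holds"

        ' Greedy .* runs from the first <ref to the last /ref>.
        ' Singleline lets . match the line break inside the first reference.
        Dim expression As New Regex("<ref.*/ref>", RegexOptions.Singleline)
        Dim result As String = expression.Replace(temptext, "")

        Console.WriteLine(result) ' free associations. Anarchism holds
    End Sub
End Module
```

Without RegexOptions.Singleline the . would stop at the line break, so in this sample only the last reference, which sits on a single line, would be removed.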

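Hey. I'm very new to regex, but I think I understand what greedy means: it will find the last position of the final (/ref>)? If so, how do I stop that? There are lots of these references up and down the page, with wanted text in between them. – FraserOfSmeg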

I've got it: add a ? like this: "". Thanks for the help! :D – FraserOfSmeg

@FraserOfSmeg In that case you can make it ungreedy with '' or use ').)*/ref>' (which is what you originally intended). Or, even better, use an XML parser! –
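To make the difference discussed in these comments concrete, here is a minimal sketch comparing the greedy pattern with two non-greedy variants. The exact patterns from the comments are not shown in full, so `<ref.*?/ref>` and `<ref(?:(?!/ref>).)*/ref>` are assumptions about what was meant; the sample text and the module name LazyRefStripper are also invented for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LazyRefStripper
    Sub Main()
        ' Hypothetical sample with wanted text between two references.
        Dim temptext As String = "One.<ref>note a</ref> Two.<ref>note b</ref> Three."

        ' Greedy: runs to the last /ref>, so " Two." is swallowed as well.
        Dim greedyResult As String =
            Regex.Replace(temptext, "<ref.*/ref>", "", RegexOptions.Singleline)

        ' Lazy (assumed pattern): .*? stops at the nearest /ref>.
        Dim lazyResult As String =
            Regex.Replace(temptext, "<ref.*?/ref>", "", RegexOptions.Singleline)

        ' Negative lookahead (assumed pattern): consume a character only
        ' if /ref> does not start at that position.
        Dim lookaheadResult As String =
            Regex.Replace(temptext, "<ref(?:(?!/ref>).)*/ref>", "", RegexOptions.Singleline)

        Console.WriteLine(greedyResult)    ' One. Three.
        Console.WriteLine(lazyResult)      ' One. Two. Three.
        Console.WriteLine(lookaheadResult) ' One. Two. Three.
    End Sub
End Module
```

One caveat with both non-greedy variants: a self-closing tag such as <ref name="x"/> contains no /ref>, so the match continues to the next closing tag and would remove any wanted text in between. That is one reason an XML parser may be the more robust choice.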