2015-08-25

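Efficient code: remove all occurrences of tags from a string except the first

I wrote a function that searches a string for the given tags and removes all of those tags and their content, except the first occurrence: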

Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim fileAsString = "<div>myFirstDiv</div>" +
             "<Div></dIV>" +
             "<city>NY</city>" +
             "<city></city>" +
             "<div></div>" +
             "<span></span>"

        ' Removes these tags and their content from fileAsString, except the
        ' first appearance of each

        Dim forbiddenNodeslist As New List(Of String)
        forbiddenNodeslist.Add("div")
        forbiddenNodeslist.Add("city")

        ' Run over all the forbidden tags
        For Each node In forbiddenNodeslist

            Dim re = New Regex("<" + node + "[^>]*>(.*?)</" + node + ">", RegexOptions.IgnoreCase)
            Dim matches = re.Matches(fileAsString)

            ' Count the characters that were replaced by an empty string, in order
            ' to update the start index of the following matches
            Dim removedCharacters = 0

            ' Run over all the matches, except the first one
            For index = 1 To matches.Count - 1
                Dim match = matches(index)

                ' Shift the original start index by what has already been removed
                Dim startIndex = match.Index - removedCharacters
                Dim matchCharactersCount = match.Length

                ' Accumulate the total number of removed characters
                removedCharacters += matchCharactersCount

                ' Remove it from the string
                fileAsString = fileAsString.Remove(startIndex, matchCharactersCount)
            Next
        Next
    End Sub
End Module

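But it is not efficient, because I first search for the matches (one pass over the string) and then loop over them again and again to replace each one with an empty string, and every `Remove` call creates a new copy of the string.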

How can I make this more efficient?

Any help is appreciated!


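Is there a reason you are storing removedCharacters and the positions of the removed tags? If not, that is just extra overhead. Loop over the list of problematic tags and remove/replace all of their occurrences with a single statement. http://stackoverflow.com/questions/6025560/how-to-ignore-case-in-string-replace – mjw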
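A minimal sketch of the "one statement per tag" idea, adapted so the first occurrence survives: `Regex.Replace` with a `MatchEvaluator` builds the new string in one left-to-right pass, so no index bookkeeping is needed. It reuses the question's `<tag[^>]*>(.*?)</tag>` pattern; the class and method names (`KeepFirstPerTag`, `RemoveAllButFirst`) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeepFirstPerTag
{
    // For each tag name, a single Regex.Replace call keeps the first
    // <tag ...>...</tag> element and replaces every later one with "".
    public static string RemoveAllButFirst(string input, string[] tagNames)
    {
        foreach (var tag in tagNames)
        {
            var name = Regex.Escape(tag);
            // Same pattern shape as the question: <tag attrs>content</tag>, lazy content.
            var pattern = "<" + name + "[^>]*>(.*?)</" + name + ">";
            var seen = false;
            input = Regex.Replace(input, pattern, m =>
            {
                if (!seen)
                {
                    seen = true;
                    return m.Value;      // keep the first match unchanged
                }
                return string.Empty;     // drop every later match
            }, RegexOptions.IgnoreCase);
        }
        return input;
    }

    public static void Main()
    {
        var s = "<div>myFirstDiv</div><Div></dIV><city>NY</city><city></city><div></div><span></span>";
        Console.WriteLine(RemoveAllButFirst(s, new[] { "div", "city" }));
    }
}
```

For the question's input this keeps `<div>myFirstDiv</div>` and `<city>NY</city>` and leaves `<span></span>` untouched.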


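Yes, I store it because when I remove part of the string, the start index of the next match has to be updated. For example, in "<div></div><div></div><div></div>" the first div is at index 0, the second at 11 and the third at 22. After I remove the second div, the third one is at index 11 instead of 22. –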
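A sketch of an alternative that avoids the `removedCharacters` bookkeeping: process the matches from last to first. Matches do not overlap, so removing a later match never shifts the `Index` of an earlier one. A `StringBuilder` also avoids allocating a new string for every removal. The names `BackToFront` and `RemoveAllButFirst` are hypothetical, and the example input is the three-div string from the comment.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class BackToFront
{
    // Removes every match of `pattern` except the first one. Walking the
    // matches backwards keeps each remaining match's original Index valid.
    public static string RemoveAllButFirst(string input, string pattern)
    {
        var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
        var sb = new StringBuilder(input);
        for (int i = matches.Count - 1; i >= 1; i--)
        {
            // Text before matches[i] is untouched, so matches[i].Index is still correct.
            sb.Remove(matches[i].Index, matches[i].Length);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Three empty divs at indices 0, 11 and 22, as in the comment above.
        var s = "<div></div><div></div><div></div>";
        Console.WriteLine(RemoveAllButFirst(s, "<div[^>]*>(.*?)</div>"));
    }
}
```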


You could reverse the whole string, remove all occurrences except the LAST one, and then reverse it again to get the same result. –

Answers


I answered this in C#. You can find the fiddle I used here

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var fileAsString = "<div>myFirstDiv</div><Div></dIV><city>NY</city><city></city><div></div><span></span>";

        //Pipe-delimited list; this will come in handy for our second regex
        var delimetedForbiddenList = "div|city";

        //Use this regex to get everything that isn't the first tag
        var allButFirstTagRegex = new Regex(@"^(<([a-z]+)>[^</]*</\2>)(.*)", RegexOptions.IgnoreCase);
        var matches = allButFirstTagRegex.Matches(fileAsString);

        //matches[0].Groups[1] = (<([a-z]+)>[^</]*</\2>) -- the complete first
        //tag (open, close, and inner), we'll use this later

        //matches[0].Groups[2] = ([a-z]+) -- the name of the first opening tag,
        //used to find the matching close tag (case-insensitively)

        //matches[0].Groups[3] = (.*) -- everything not in the first tag

        var allButFirstTag = matches[0].Groups[3].ToString();
        //allButFirstTag == @"<Div></dIV><city>NY</city><city></city><div></div><span></span>"

        //the regex to remove our forbidden tags
        var removeForbiddenPattern = String.Format("(<({0})>[^</]*</\\2>)", delimetedForbiddenList);
        //removeForbiddenPattern == @"(<(div|city)>[^</]*</\2>)"

        var resultsWithForbiddenRemoved = Regex.Replace(allButFirstTag, removeForbiddenPattern, String.Empty, RegexOptions.IgnoreCase);
        //resultsWithForbiddenRemoved == @"<span></span>"

        var finalResults = matches[0].Groups[1].ToString() + resultsWithForbiddenRemoved;
        //finalResults == "<div>myFirstDiv</div><span></span>"
        Console.WriteLine(finalResults);
    }
}
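The answer keeps only the first tag of the whole string, so `<city>NY</city>` is removed as well (its result is `<div>myFirstDiv</div><span></span>`). If, as in the question's code, the first occurrence of *each* forbidden tag should survive, one possible single-pass sketch reuses the pipe-delimited alternation with a case-insensitive `HashSet` of tag names already kept. The names `KeepFirstOfEachTag` and `Clean` are hypothetical, the `\b` after the tag name is an added assumption (so that `div` does not match `<divider>`), and the pattern assumes forbidden elements are not nested inside elements of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeepFirstOfEachTag
{
    // One Regex.Replace over the whole string. Group 1 captures the tag name;
    // HashSet.Add returns true only the first time a name (ignoring case) is seen.
    public static string Clean(string input, string delimitedForbiddenList)
    {
        var pattern = "<(" + delimitedForbiddenList + ")\\b[^>]*>.*?</\\1>";
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        return Regex.Replace(
            input,
            pattern,
            m => seen.Add(m.Groups[1].Value) ? m.Value : string.Empty,
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        var fileAsString = "<div>myFirstDiv</div><Div></dIV><city>NY</city><city></city><div></div><span></span>";
        Console.WriteLine(Clean(fileAsString, "div|city"));
    }
}
```

Because the matches are visited left to right, the first `div` and the first `city` are kept and every later one is replaced with an empty string, in a single scan of the input.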

Sorry about the syntax highlighting, I couldn't get it to work properly... – tsacodes
