Why does this code run 89% faster than the other when finding HTML elements? What is the difference?

Consider the following code:

var htmlDoc = new HtmlAgilityPack.HtmlDocument(); 
htmlDoc.OptionAutoCloseOnEnd = false; 
htmlDoc.OptionCheckSyntax = false; 
htmlDoc.OptionFixNestedTags = false; 
htmlDoc.OptionOutputOptimizeAttributeValues = false; 
htmlDoc.LoadHtml(html); /*Where html is a string of 5MB size.*/ 

/*First approach to select all "anchor" elements*/ 
HtmlNodeCollection coll = htmlDoc.DocumentNode.SelectNodes("//*/a"); 
if (coll != null && coll.Count > 0) 
    ReplaceSourceLinks(coll, "href");

The code above should load a ~5 MB HTML string and replace the href of all 9567 anchors found in the HTML with something suitable for the app. It takes 1998 ms to run.

So I decided to replace the last 3 lines shown above and locate those anchors without XPath, using the following code instead:

IEnumerable<HtmlNode> coll = htmlDoc.DocumentNode.Descendants("a"); 
if (coll != null) 
    ReplaceSourceLinks(coll, "href");

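The new approach takes only 220 ms to run! That is almost 89% less time than the first approach (roughly 9 times faster). I just want to know whether these two snippets are equivalent. Do they process the same set of anchors? (By the way, the second one also selected the same 9567 elements.) Why does the second approach run 89% faster?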

Thanks.

Source

2015-09-19 Joe Bank

Did you clear the cache between the two tests? – Steve

Of course I did. –

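If you look at its source code, you will see that the SelectNodes method does much heavier work, such as evaluating the XPath expression and collecting the matching nodes: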

public HtmlNodeCollection SelectNodes(string xpath)
{
    HtmlNodeCollection list = new HtmlNodeCollection(null);

    HtmlNodeNavigator nav = new HtmlNodeNavigator(_ownerdocument, this);
    XPathNodeIterator it = nav.Select(xpath);
    while (it.MoveNext())
    {
        HtmlNodeNavigator n = (HtmlNodeNavigator) it.Current;
        list.Add(n.CurrentNode);
    }
    if (list.Count == 0)
    {
        return null;
    }
    return list;
}

The Descendants method, by contrast, just loops over the cached ChildNodes and checks each element's name.

Other helper methods used in that call:

/// <summary>
/// Gets all Descendant nodes for this node and each of child nodes
/// </summary>
/// <returns></returns>
public IEnumerable<HtmlNode> DescendantNodes()
{
    foreach (HtmlNode node in ChildNodes)
    {
        yield return node;
        foreach (HtmlNode descendant in node.DescendantNodes())
            yield return descendant;
    }
}


/// <summary>
/// Gets all Descendant nodes in enumerated list
/// </summary>
/// <returns></returns>
public IEnumerable<HtmlNode> Descendants()
{
    foreach (HtmlNode node in DescendantNodes())
    {
        yield return node;
    }
}
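
The listing above shows only the tree walk, not the name check. As a rough, self-contained sketch of the idea, a name filter layered on a lazy tree walk could look like the following. The Node class and the DescendantsNamed method are hypothetical stand-ins made up for this illustration; they are not HtmlAgilityPack's types or its actual source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal tree node; NOT HtmlAgilityPack's HtmlNode.
public sealed class Node
{
    public string Name { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        ChildNodes.AddRange(children);
    }

    // Lazily yields every descendant, depth-first, in document order.
    public IEnumerable<Node> DescendantNodes()
    {
        foreach (Node child in ChildNodes)
        {
            yield return child;
            foreach (Node descendant in child.DescendantNodes())
                yield return descendant;
        }
    }

    // Hypothetical name filter: one pass over the tree, comparing names.
    public IEnumerable<Node> DescendantsNamed(string name)
    {
        foreach (Node node in DescendantNodes())
            if (string.Equals(node.Name, name, StringComparison.OrdinalIgnoreCase))
                yield return node;
    }
}

public static class Demo
{
    public static void Main()
    {
        var root = new Node("html",
            new Node("body",
                new Node("a"),
                new Node("div", new Node("a"), new Node("span"))));

        Console.WriteLine(root.DescendantsNamed("a").Count()); // prints 2
    }
}
```

Nothing here parses or evaluates an expression: the filter is a single string comparison per node, which is the kind of work the answer attributes to Descendants.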

Source

2015-09-19 08:30:10 t3chb0t

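One difference is the cost of parsing the XPath expression, but I don't think that would account for a difference of this size. Judging from the source code posted by @t3chb0t, the main difference appears to be that the XPath solution builds a list in memory, whereas the direct approach returns an iterator. You didn't say how many elements were selected, but building a list comes at a price: this seems to be a fairly unavoidable consequence of a poorly designed API.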
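
To illustrate the iterator-versus-list distinction in general terms, here is a minimal sketch in plain C# with no HtmlAgilityPack involved. The names Numbers, AsList, and Produced are made up for this example. It shows only that a materialized list does all of its work up front while an iterator does work as items are consumed; it does not measure or confirm where SelectNodes actually spends its time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyVersusList
{
    // Counts how many items the generator has actually produced.
    public static int Produced;

    // Iterator: the body does not run until the caller starts enumerating.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Materializing: walks the whole sequence and stores every item.
    public static List<int> AsList(int count) => Numbers(count).ToList();

    public static void Main()
    {
        Produced = 0;
        IEnumerable<int> lazy = Numbers(1000);
        Console.WriteLine(Produced); // 0: the iterator body has not started
        int first = lazy.First();
        Console.WriteLine(Produced); // 1: only the first item was produced

        Produced = 0;
        List<int> list = AsList(1000);
        Console.WriteLine(Produced); // 1000: every item produced and stored
    }
}
```

In the question both approaches end up visiting every match, so any saving from laziness there would come from not allocating and filling the intermediate collection, not from skipping elements.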

Source

2015-09-19 09:14:04
