MatchCollection timeout

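I use MatchCollection when parsing HTML, but this approach takes a very long time and sometimes fails. I think the problem would go away if I could set a timeout on the MatchCollection. How can I set a timeout for MatchCollection? (Framework 4.0)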

anchorPattern[0] = "<div.*?class=\"news\">.*?<div.*?class=\".*?date.*?\">(?<date>.*?)?</div>.*?<a.*?href=\"(?<link>.*?)\".*?>(?<title>.*?)?</a>.*?<(span.*?class=\".*?desc.*?\">(?<spot>.*?)?</span>)?";
MatchCollection mIcerik = Regex.Matches(html, anchorPattern[i], RegexOptions.Compiled);
if (mIcerik.Count > 0)
    ListDegree.Add(i, mIcerik.Count);

Source

2012-11-16 RockOnGom

Did you know that the most upvoted answer on Stack Overflow advises against using Regex as a parsing tool for HTML? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Steve

I have heard of HTML Agility Pack (http://htmlagilitypack.codeplex.com/), an HTML/DOM parser for .NET. –

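Yes, I know. But the HTML source is not always well-formed, so an HTML parser is not the best fit. For example, closing tags are sometimes missing from the HTML text. That is why I prefer regular expressions. – RockOnGom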

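Your regex contains too many ".*?" tokens, and for some of your inputs the number of combinations the engine has to try becomes practically "infinite". Try using atomic groups, "(?>...)", which automatically discard all backtracking positions remembered by any token inside the group. Note that a lazy quantifier inside an atomic group, as in "(?>.*?)", can never grow past the empty match, so wrap a greedy negated character class instead, for example "(?>[^>]*)". This should at least make all of the regex matching finish in finite time.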
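To make the atomic-group suggestion concrete, here is a minimal sketch of how the link part of the pattern might be rewritten. The class name `AnchorSketch`, its methods, and the rewritten pattern are my own illustration, not code from the question, and the sketch assumes anchor text without nested tags. The two small helper methods show why the atomic group should wrap a greedy negated character class rather than a lazy ".*?".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original question.
public static class AnchorSketch
{
    // Link part of the pattern, rewritten: negated character classes replace ".*?",
    // and atomic groups (?>...) forbid backtracking into text already consumed.
    private const string LinkPattern =
        @"<a\b[^>]*?\bhref=""(?<link>(?>[^""]*))""(?>[^>]*)>(?<title>(?>[^<]*))</a>";

    // Returns (link, title) pairs for simple anchors whose text contains no nested tags.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(html, LinkPattern))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["link"].Value, m.Groups["title"].Value));
        }
        return result;
    }

    // Lazy quantifier inside an atomic group: it matches the empty string first
    // and can never expand afterwards, so "abc" is not matched.
    public static bool AtomicLazyMatches()
    {
        return Regex.IsMatch("abc", @"a(?>.*?)c");
    }

    // Greedy negated class inside an atomic group: it stops right before 'c',
    // so "abc" is matched without any backtracking.
    public static bool AtomicGreedyMatches()
    {
        return Regex.IsMatch("abc", @"a(?>[^c]*)c");
    }
}
```

Calling `AnchorSketch.ExtractLinks(html)` on a fragment such as `<a class="x" href="/story/1">Headline</a>` should yield one pair with the link and the title. The same idea can be applied to the `div` and `span` parts of the original pattern, but how strict those parts need to be depends on the pages being parsed.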

Source

2013-02-08 19:42:30

// One-minute limit for the matching operation.
TimeSpan timeout = new TimeSpan(0, 1, 0);

anchorPattern[0] = "<div.*?class=\"news\">.*?<div.*?class=\".*?date.*?\">(?<date>.*?)?</div>.*?<a.*?href=\"(?<link>.*?)\".*?>(?<title>.*?)?</a>.*?<(span.*?class=\".*?desc.*?\">(?<spot>.*?)?</span>)?";

// Pass the timeout as the fourth argument.
MatchCollection mIcerik = Regex.Matches(html, anchorPattern[i], RegexOptions.Compiled, timeout);

if (mIcerik.Count > 0)
    ListDegree.Add(i, mIcerik.Count);

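The TimeSpan parameter sets the time-out interval for the match operation. Alternatively, you can pass Regex.InfiniteMatchTimeout to indicate that the method should not time out. Note that this overload of Regex.Matches() was introduced in .NET Framework 4.5, so it is not available when targeting 4.0. See MSDN: Regex.Matches()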
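One detail worth showing in code: as far as I know, the MatchCollection returned by Regex.Matches is evaluated lazily, so a timeout is not reported by the Regex.Matches call itself but when the collection is used (for example when reading Count or enumerating it), as a RegexMatchTimeoutException. Below is a minimal sketch of handling that, assuming .NET Framework 4.5 or later. The class `TimedMatching`, the method `CountMatches`, and the -1 return convention are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class TimedMatching
{
    // Returns the number of matches, or -1 if matching exceeded the timeout.
    public static int CountMatches(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            MatchCollection matches =
                Regex.Matches(input, pattern, RegexOptions.None, timeout);

            // Reading Count forces the lazy collection to be evaluated,
            // so a timeout would be thrown on this line.
            return matches.Count;
        }
        catch (RegexMatchTimeoutException)
        {
            // The pattern took longer than the allowed interval on this input.
            return -1;
        }
    }
}
```

With this helper, the loop from the question could call `TimedMatching.CountMatches(html, anchorPattern[i], TimeSpan.FromMinutes(1))` and skip any pattern that returns -1. Whether to skip, retry with a simpler pattern, or log the failure is a design choice that depends on the application.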

Source

2017-08-01 02:23:09 waloar
