C# search for a string in a website

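I'm trying to figure out the best way, in C#, to search for extensions once I have converted a web page's content to a string. I just want to extract the URLs in the page that end with .html, .xhtml, or edu. I don't care what the beginning looks like. Is EndsWith or Regex the better way to find these?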

So if my input looks like this

string str = {var a,b=window.location.href.match(//webhp\?[^#]tune=[^#]/);if(a=b&&b.length>0?"http://www.google.com/logos/2011/lespaul.html"+b[

I want to pull out http://www.google.com/logos/2011/lespaul.html and store it in an array.


2011-10-13 user990951

I was able to come up with this expression: http:\/\/(.*?)(.html|.xhtml|.edu)
Edit, thanks to @Kakashi: http:\/\/.*?\.(?:x?html|edu)
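
Since the question asks for the results in an array, here is a minimal sketch of collecting every match with Regex.Matches. The sample input, the variable names, and the choice of quotes, whitespace and angle brackets as URL delimiters are my assumptions, not part of the original answer; it also assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input: three URLs embedded in script-like text.
string page = "a=\"http://www.google.com/logos/2011/lespaul.html\"; "
            + "b='http://example.edu'; c=\"http://example.com/index.php\"";

// Assumption: a URL runs until whitespace, a quote, or an angle bracket.
// The negative lookahead requires the extension to sit at the very end of the URL.
var pattern = new Regex(@"https?://[^\s""'<>]+\.(?:x?html|edu)(?![^\s""'<>])",
                        RegexOptions.IgnoreCase);

// Copy every match into a string array.
string[] urls = pattern.Matches(page)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (var url in urls)
    Console.WriteLine(url);
```

With this input the array should hold the lespaul.html URL and the .edu URL, while the .php one is skipped. If your URLs are delimited differently, the character class needs adjusting.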


2011-10-13 20:54:06 Srinivas

You are creating unnecessary groups in your regex. 'http:\/\/.*?\.(?:x?html|edu)' – Kakashi

OK, I got this working. Here is another question: if you have something like .php?wsdl, how would you get that into the regex? I thought it would simply be http:\/\/(.*?)(.html|.xhtml|.edu|.php\?wsdl) – user990951

@user990951 I didn't get the question; maybe you could explain it better. I'd be happy to help with that as well. – Srinivas
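
For the .php?wsdl case raised in the comments above, one possible approach is to add it as another alternative, with both the dot and the question mark escaped so they match literally. This is only a sketch: the sample text, the variable names and the delimiter set are hypothetical, and it assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: one WSDL endpoint and one plain .php page.
string text = "see http://example.com/service.php?wsdl and http://example.com/index.php";

// \. and \? are escaped so they match a literal dot and question mark.
// Assumption: a URL ends at whitespace, a quote, or an angle bracket.
var pattern = new Regex(
    @"https?://[^\s""'<>]+\.(?:x?html|edu|php\?wsdl)(?![^\s""'<>])");

Match m = pattern.Match(text);
Console.WriteLine(m.Success ? m.Value : "Not found");
```

Here only the service.php?wsdl URL should match; the plain index.php URL does not end in any of the listed alternatives.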

You should use an HTML parser, such as sharp-query or HTML Agility Pack, and never use regular expressions for parsing HTML, or, as the author of this post says, some things might happen.


2011-10-13 20:49:14

If you are just matching/extracting URLs, a regex should be fine. I believe the point is that parsing HTML is beyond regex. – RBZ

No, even for parsing URLs you should avoid regular expressions. You should use a URL parser. They are even built into the .NET Framework. –

What makes an expression regular? – RBZ
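
The built-in URL parser mentioned in the comments above is presumably System.Uri. As a rough sketch of how it could be used to check a candidate string once you have one: IsWanted is a hypothetical helper name, and reading "ends with edu" as "the host ends in .edu" is my assumption. It also assumes top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical helper: true when the candidate parses as an absolute http(s)
// URL whose path ends in .html/.xhtml, or whose host ends in .edu.
static bool IsWanted(string candidate)
{
    if (!Uri.TryCreate(candidate, UriKind.Absolute, out var uri))
        return false;
    if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        return false;

    string path = uri.AbsolutePath; // excludes the query string and fragment
    return path.EndsWith(".html", StringComparison.OrdinalIgnoreCase)
        || path.EndsWith(".xhtml", StringComparison.OrdinalIgnoreCase)
        || uri.Host.EndsWith(".edu", StringComparison.OrdinalIgnoreCase);
}

Console.WriteLine(IsWanted("http://www.google.com/logos/2011/lespaul.html"));
Console.WriteLine(IsWanted("http://example.com/index.php"));
```

Note that Uri only validates and splits a string you already have; it does not locate URLs inside a larger block of text, so something still has to find the candidates first.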

Try this:

using System;
using System.Text.RegularExpressions;
var input = "string str = {var a,b=window.location.href.match(//webhp\\?[^#]tune=[^#]/);if(a=b&&b.length>0?\"http://www.google.com/logos/2011/lespaul.html";
// [^\n]+ is greedy, so the match runs to the last .html/.xhtml/.edu on the line.
var match = Regex.Match(input, @"https?:\/{2}[^\n]+\.(?:x?html|edu)");
Console.Write(match.Success ? match.Groups[0].Value : "Not found"); // http://www.google.com/logos/2011/lespaul.html


2011-10-13 21:11:17 Kakashi
