Downloading the content at a specified URL


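I just want to download the content of a website. What is the best way to do that? I tried WebClient, but with it I also get all the HTML tags. I only want the content.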

Here is my code:

WebClient w = new WebClient(); 

//Using DownloadString 
string s = w.DownloadString("http://en.wikipedia.org/wiki/Main_Page"); 
Console.WriteLine(s); 

//Using DownloadData 
byte[] downloadedData = w.DownloadData("http://en.wikipedia.org/wiki/Main_Page"); 
string data = Encoding.UTF8.GetString(downloadedData); // the page is served as UTF-8; ASCII would turn non-ASCII characters into '?'
Console.WriteLine(data);

Any suggestions?

Source

2014-09-13 user3771772

Is what you really want a scraper? – 2014-09-13 00:55:47

Use a DOM parser. Perhaps something like HTMLAgilityPack. – David 2014-09-13 01:00:23
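The DOM-parser suggestion in the comment above could look roughly like the sketch below. It assumes the third-party HtmlAgilityPack NuGet package is installed. `DomTextExtractor` and `ExtractText` are hypothetical names chosen for illustration, not something from the thread.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class DomTextExtractor
{
    // Hypothetical helper: parses the HTML into a DOM and returns its text.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Copy the matches to a list first so that removing nodes
        // does not disturb the enumeration of the tree.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToList();
        foreach (var node in unwanted)
            node.Remove();

        // InnerText concatenates the remaining text nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<html><body><p>Tom &amp; Jerry</p><script>alert('x');</script></body></html>";
        Console.WriteLine(DomTextExtractor.ExtractText(html));
    }
}
```

Unlike a regular expression, the parser also copes with malformed markup, and the script and style contents are dropped together with their tags.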

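I think you want to strip the HTML tags from the downloaded page and keep only the content at the URL?

For this purpose I have a static class (found on Stack Overflow):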

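using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Removes everything that looks like an HTML tag from the string.
    public static string StripHTML(this string htmlString)
    {
        if (string.IsNullOrEmpty(htmlString)) return htmlString;

        // Lazily match "<", any characters (including newlines), then ">".
        string pattern = @"<(.|\n)*?>";

        return Regex.Replace(htmlString, pattern, string.Empty);
    }
}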

And you can use it like this:

string s = SomeDownloadFunction("http://en.wikipedia.org/wiki/Main_Page"); 
string content = s.StripHTML();
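One caveat with stripping tags alone: the text inside script and style elements stays in the result, and entities such as &amp;amp; are not decoded. The following is a minimal sketch of one way to handle that with only the standard library. `HtmlText` and `ToPlainText` are hypothetical names, and the patterns are a rough approximation rather than a real HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper (not from the answer): reduces HTML to rough plain text.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        // Remove script and style elements together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        // Remove HTML comments, then every remaining tag.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Decode entities such as &amp; and collapse runs of whitespace.
        text = WebUtility.HtmlDecode(text);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<html><head><style>p { color: red; }</style></head>"
                    + "<body><p>Tom &amp; Jerry</p><script>alert('x');</script></body></html>";
        Console.WriteLine(HtmlText.ToPlainText(html)); // prints "Tom & Jerry"
    }
}
```

For a live page, the string returned by `WebClient.DownloadString` from the question could be passed to `ToPlainText` in place of the sample markup.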

Source

2014-09-13 00:55:46

+1 because it works. And a handy extension method. – 2014-09-13 00:56:15

Thanks... this is very helpful. – user3771772 2014-09-14 14:42:55

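Removing the tags is easy to do with a regular expression, but if what you want is to retrieve only the actual content of the page (ignoring ads, navigation bars, and so on), that is a very hard task. Fortunately, some very smart people have kindly shared their research on this. Take a look at boilerpipe (demo here).

Source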

2014-09-13 01:08:49 Mephy
