Download an entire website in C#

Pardon my ignorance. I am using

// p is the page URL, r is the local file it is saved to.
string p = "http://" + textBox2.Text;
string r = textBox3.Text;
System.Net.WebClient webclient = new System.Net.WebClient();
webclient.DownloadFile(p, r);

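to download a web page. Could you help me improve the code so that it downloads the entire website? I tried HTML screen scraping, but it only returns the href links in the index.html file. How should I proceed from here?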

Thanks
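Saving a whole site with DownloadFile(p, r) also means every page needs its own local file instead of the single path from textBox3. A minimal sketch of one way to map a URL to a file path, written as a .NET 6+ top-level program. ToLocalPath is a hypothetical helper, and it assumes that directory-style URLs are saved as index.html and that query strings can be ignored (so a.html?x=1 and a.html?x=2 would collide).

```csharp
using System;
using System.IO;

// Hypothetical helper: maps a page URL to a file path under rootDir so that
// each page of the site gets its own file. Assumptions: URLs ending in "/"
// are saved as index.html; query strings and fragments are ignored.
string ToLocalPath(Uri url, string rootDir)
{
    // AbsolutePath is only the path part (e.g. "/docs/a.html"), still escaped.
    string path = Uri.UnescapeDataString(url.AbsolutePath);
    if (path.EndsWith("/"))
        path += "index.html";

    // Drop the leading '/' so Path.Combine does not treat it as a rooted path,
    // and switch to the platform's directory separator.
    string relative = path.TrimStart('/').Replace('/', Path.DirectorySeparatorChar);

    // Keep each host in its own folder under rootDir.
    return Path.Combine(rootDir, url.Host, relative);
}

Console.WriteLine(ToLocalPath(new Uri("http://example.com/docs/"), "site"));
```

Before each download, Directory.CreateDirectory(Path.GetDirectoryName(local)) creates the target folder, after which webclient.DownloadFile(url, local) can save the page there.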

2010-01-19 Karthik

Did you solve the problem? – Jason 2010-01-22 21:19:46

Scraping a website is actually a lot of work, with lots of corner cases.

Call wget instead. The wget manual explains how to use its "recursive retrieval" options.
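A minimal sketch of calling wget from C# as a .NET 6+ top-level program, assuming GNU wget is installed and on the PATH. The flags are standard wget options (--recursive, --level, --no-parent, --page-requisites, --convert-links, --directory-prefix); BuildWgetArguments and RunWget are hypothetical names, and the depth of 5 is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: builds the arguments for a recursive GNU wget download.
List<string> BuildWgetArguments(string url, string outputDir, int depth = 5) => new List<string>
{
    "--recursive",                      // follow links from page to page
    $"--level={depth}",                 // maximum recursion depth
    "--no-parent",                      // never ascend above the starting directory
    "--page-requisites",                // also fetch images, CSS and scripts a page needs
    "--convert-links",                  // rewrite links so the local copy works offline
    $"--directory-prefix={outputDir}",  // where to save the files
    url,
};

// Hypothetical helper: runs wget and returns its exit code (0 means success).
int RunWget(string url, string outputDir)
{
    var psi = new ProcessStartInfo("wget") { UseShellExecute = false };
    foreach (string arg in BuildWgetArguments(url, outputDir))
        psi.ArgumentList.Add(arg);      // ArgumentList handles quoting (.NET Core 2.1+)

    using var process = Process.Start(psi)
        ?? throw new InvalidOperationException("wget could not be started");
    process.WaitForExit();
    return process.ExitCode;
}

Console.WriteLine(string.Join(" ", BuildWgetArguments("http://example.com/", "mirror")));
```

Passing each argument through ArgumentList rather than one concatenated string avoids quoting problems when the URL or output folder contains spaces.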

2010-01-19 07:21:50 Will

// At the top of the file:
using System.IO;
using System.Net;

// In the class:
protected string GetWebString(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

    // Dispose the response and the reader even if reading throws.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        // Read the whole response body (the page's HTML) into a string.
        return reader.ReadToEnd();
    }
}

This loads the web page at the given URL into a string.

You can then parse the string with regular expressions.

This little snippet grabs specific links from craigslist and adds them to an ArrayList. Modify it for your own purposes.

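// At the top of the file:
using System.Collections;
using System.Text.RegularExpressions;

// In the class (note: the pages parameter is not used here):
protected ArrayList GetListings(int pages)
{
    ArrayList list = new ArrayList();
    string page = GetWebString("http://albany.craigslist.org/bik/");

    // Capture each listing's site-relative link (LINK) and its title (TITLE).
    MatchCollection listingMatches = Regex.Matches(page, "(<p><a href=\")(?<LINK>/.+/.+[.]html)(\">)(?<TITLE>.*)(-</a>)");
    foreach (Match m in listingMatches)
    {
        // The links are relative to the site root, so prefix the host.
        list.Add("http://albany.craigslist.org" + m.Groups["LINK"].Value);
    }
    return list;
}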
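GetListings reads a single page. To cover a whole site, the same idea can be repeated: keep a queue of URLs still to visit and a set of URLs already seen, fetch each page, and enqueue every link that stays on the same host. A minimal breadth-first sketch as a .NET 6+ top-level program; Crawl is a hypothetical name, the fetch delegate stands in for something like GetWebString (returning null when a page cannot be downloaded), and maxPages is an assumed safety limit. It only follows href attributes found with a simple regex, so it shares the blind spots of the regex approach (unquoted attributes, links built by JavaScript).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical breadth-first crawler: returns the URLs of all pages it fetched.
List<Uri> Crawl(Uri start, Func<Uri, string?> fetch, int maxPages = 100)
{
    // Quoted href values, stopping before any "#fragment".
    var hrefPattern = new Regex("href\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase);
    var visited = new HashSet<string>();
    var queue = new Queue<Uri>();
    var pages = new List<Uri>();
    queue.Enqueue(start);

    while (queue.Count > 0 && pages.Count < maxPages)
    {
        Uri url = queue.Dequeue();
        if (!visited.Add(url.AbsoluteUri))
            continue;                       // already handled this URL

        string? html = fetch(url);
        if (html == null)
            continue;                       // download failed; skip the page
        pages.Add(url);

        foreach (Match m in hrefPattern.Matches(html))
        {
            // Resolve relative links against the current page; keep only
            // http(s) links on the starting host that were not seen yet.
            if (Uri.TryCreate(url, m.Groups[1].Value, out var next)
                && (next.Scheme == Uri.UriSchemeHttp || next.Scheme == Uri.UriSchemeHttps)
                && next.Host == start.Host
                && !visited.Contains(next.AbsoluteUri))
                queue.Enqueue(next);
        }
    }
    return pages;
}
```

Against the real site, fetch could be url => { try { return GetWebString(url.AbsoluteUri); } catch (WebException) { return null; } }. Injecting the fetcher keeps the traversal logic separate from the network code, so it can be exercised with an in-memory set of pages.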

2010-01-19 08:38:29 Jason

+1, and remember to also parse all the text files (HTML, CSS), because they can link to other resources – 2010-01-19 08:41:53
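Following that comment, stylesheets can be scanned for url(...) references (background images, fonts, url() imports). A minimal regex-based sketch as a .NET 6+ top-level program; ExtractCssUrls is a hypothetical name. Relative URLs in a stylesheet are resolved against the stylesheet's own URL, not the HTML page's. data: URIs are skipped, and @import "file.css" written without url() is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the targets of url(...) references in a
// stylesheet and resolves them against the stylesheet's own URL.
List<Uri> ExtractCssUrls(string css, Uri stylesheetUrl)
{
    var result = new List<Uri>();
    // url( optional quote, target, optional quote )
    var pattern = new Regex("url\\(\\s*[\"']?([^\"')]+)[\"']?\\s*\\)", RegexOptions.IgnoreCase);

    foreach (Match m in pattern.Matches(css))
    {
        string target = m.Groups[1].Value.Trim();
        if (target.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            continue;                       // inline data, nothing to download

        if (Uri.TryCreate(stylesheetUrl, target, out var absolute))
            result.Add(absolute);
    }
    return result;
}
```

The resulting URLs can be queued for download just like the links found in HTML pages.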
