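How to log in to a website using HttpWebRequest
Can someone help me figure out how to use HttpWebRequest to log in to a page and then scrape a page? The code I am using does nothing more than write out the markup of the login page; it fails to log in. The site I am trying to log in to is a PHP-based website.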
// first, request the login form to get the viewstate value
HttpWebRequest webRequest = WebRequest.Create("loginPageUrl") as HttpWebRequest;
StreamReader responseReader = new StreamReader(
webRequest.GetResponse().GetResponseStream()
);
string responseData = responseReader.ReadToEnd();
responseReader.Close();
string postData = String.Format("Username={0}&Password={1}", "user", "pwd");
// have a cookie container ready to receive the forms auth cookie
CookieContainer cookies = new CookieContainer();
// now post to the login form
webRequest = WebRequest.Create("loginPostUrl") as HttpWebRequest;
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.CookieContainer = cookies;
// write the form values into the request message
StreamWriter requestWriter = new StreamWriter(webRequest.GetRequestStream());
requestWriter.Write(postData);
requestWriter.Close();
// we don't need the contents of the response, just the cookie it issues
webRequest.GetResponse().Close();
// now we can send our cookie along with a request for the protected page
webRequest = WebRequest.Create("PageToScrapeUrl") as HttpWebRequest;
webRequest.CookieContainer = cookies;
responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
// and read the response
responseData = responseReader.ReadToEnd();
responseReader.Close();
Console.WriteLine(responseData);
Console.ReadKey();
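Very few pages behind auth want to be scraped, and scraping them often violates the ToS. More commonly, if the data is meant to be used this way, there will be a programmatic API. Use the API.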
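For the case where you are allowed to scrape: have you checked the traffic with Fiddler? You have to analyze the browser's successful login from the original page and mimic its web requests. Maybe some other fields are posted to the server? – Jan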
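Can you give us the URL of the site? There is no silver bullet for logging in to websites (and sometimes the site itself changes when it is revised), so it would be easier to see where you went wrong.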
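Following up on the comment about other fields being posted: one thing that may be worth trying is to reuse a single CookieContainer for all three requests (many PHP sites set their session cookie on the first page load) and to send back any hidden inputs found in the login form. The sketch below is only an illustration under those assumptions. LoginSketch, ExtractHiddenFields, BuildFormBody and LoginAndFetch are hypothetical names, the "Username"/"Password" field names are taken from the question's code and may differ on the real site, and the regex-based extraction is deliberately simple (an HTML parser would be more robust).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of the asker's code.
static class LoginSketch
{
    // Collects name/value pairs of <input type="hidden"> elements.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in Regex.Matches(html, "<input\\b[^>]*>", RegexOptions.IgnoreCase))
        {
            string type = Attr(tag.Value, "type");
            string name = Attr(tag.Value, "name");
            if (name.Length > 0 && string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
                fields[name] = WebUtility.HtmlDecode(Attr(tag.Value, "value"));
        }
        return fields;
    }

    // Returns the quoted value of one attribute, or "" when it is absent.
    static string Attr(string tag, string attr)
    {
        string pattern = "(?<![\\w-])" + Regex.Escape(attr) + "\\s*=\\s*[\"']([^\"']*)[\"']";
        Match m = Regex.Match(tag, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    // URL-encodes each key and value so characters like '&' or '@' survive the POST.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var f in fields)
            parts.Add(Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value));
        return string.Join("&", parts);
    }

    // GET login page -> POST credentials plus hidden fields -> GET protected page,
    // all sharing one cookie container so the session cookie is carried along.
    public static string LoginAndFetch(string loginPageUrl, string loginPostUrl,
                                       string pageToScrapeUrl, string user, string pwd)
    {
        var cookies = new CookieContainer();
        var fields = ExtractHiddenFields(Get(loginPageUrl, cookies));
        fields["Username"] = user;   // assumed field name, check the real form
        fields["Password"] = pwd;    // assumed field name, check the real form

        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));
        var post = (HttpWebRequest)WebRequest.Create(loginPostUrl);
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        post.CookieContainer = cookies;
        post.ContentLength = body.Length;
        using (Stream s = post.GetRequestStream())
            s.Write(body, 0, body.Length);
        post.GetResponse().Close();  // only the cookies matter here

        return Get(pageToScrapeUrl, cookies);
    }

    static string Get(string url, CookieContainer cookies)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = cookies;
        using (var resp = req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

It could be called as LoginSketch.LoginAndFetch("loginPageUrl", "loginPostUrl", "PageToScrapeUrl", "user", "pwd"), with the same placeholder URLs as in the question. If the login still fails, comparing the body this sends with what Fiddler shows for a successful browser login should reveal which field or header is missing.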