2010-02-09 53 views
0

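Download a file (part 2)

On a regular basis I have to do the following manually in a web browser:

  1. Go to an HTTPS website.
  2. Log in through a web form.
  3. Click a link to download a large file (135 MB).

I would like to automate this process using .NET.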

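A few days ago I posted this question here. Thanks to a piece of code from Rubens Farias, I can now perform steps 1 and 2 above. After step 2, I can read the HTML of the page that contains the URL of the file to download (using afterLoginPage = reader.ReadToEnd()). That page is only shown if the login was accepted, so step 2 is verified as successful.

My problem now is how to perform step 3. I have tried a few things, but to no avail: access to the file is denied even though the earlier login succeeded.

To clarify things, I am posting the code below, without the actual login details and website of course. At the end, the variable afterLoginPage contains the HTML of the page shown after login, which includes the link to the file I want to download. This link also starts with https.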

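' Needs Imports System.Collections.Generic, System.IO, System.Linq, System.Net,
' System.Text and System.Text.RegularExpressions, plus a reference to System.Web.
Dim httpsSite As String = "https://www.test.test/user/login" ' enter correct address
Dim formPage As String = ""
Dim afterLoginPage As String = ""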

' Get postback data and cookies 
Dim cookies As New CookieContainer() 
Dim getRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest) 
getRequest.CookieContainer = cookies 
getRequest.Method = "GET" 

Dim wp As WebProxy = New WebProxy("[our proxies IP address]", [our proxies port number]) 
wp.Credentials = CredentialCache.DefaultCredentials 
getRequest.Proxy = wp 

Dim form As HttpWebResponse = DirectCast(getRequest.GetResponse(), HttpWebResponse) 
Using response As New StreamReader(form.GetResponseStream(), Encoding.UTF8) 
    formPage = response.ReadToEnd() 
End Using 

Dim inputs As New Dictionary(Of String, String)() 
inputs.Add("form_build_id", "[some code I'd like to keep secret]") 
inputs.Add("form_id", "user_login") 
For Each input As Match In Regex.Matches(formPage, "<input.*?name=""(?<name>.*?)"".*?(?:value=""(?<value>.*?)"".*?)? />", RegexOptions.IgnoreCase Or RegexOptions.ECMAScript) 
    ' Skip the two fields that were already added by hand above
    If input.Groups("name").Value <> "form_build_id" AndAlso _
       input.Groups("name").Value <> "form_id" Then
        inputs.Add(input.Groups("name").Value, input.Groups("value").Value)
    End If
Next 

inputs("name") = "[our login name]" 
inputs("pass") = "[our login password]" 

' Build the URL-encoded form body: key1=value1&key2=value2...
Dim buffer As Byte() = Encoding.UTF8.GetBytes( _
    [String].Join("&", _
        Array.ConvertAll(Of KeyValuePair(Of String, String), String)(inputs.ToArray(), _
            Function(item As KeyValuePair(Of String, String)) (item.Key & "=") + System.Web.HttpUtility.UrlEncode(item.Value))))

Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest) 
postRequest.CookieContainer = cookies 
postRequest.Method = "POST" 
postRequest.ContentType = "application/x-www-form-urlencoded" 
postRequest.Proxy = wp 

' send username/password 
Using stream As Stream = postRequest.GetRequestStream() 
    stream.Write(buffer, 0, buffer.Length) 
End Using 

' get response from login page 
Using reader As New StreamReader(postRequest.GetResponse().GetResponseStream(), Encoding.UTF8) 
    afterLoginPage = reader.ReadToEnd() 
End Using 

Answers

3

As I said in the comments on this question, you just need to use the DownloadFile method:

using(WebClient client = new WebClient()) 
    client.DownloadFile(
     "http://www.google.com/", "google_homepage.html"); 

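Just replace "http://www.google.com/" with the address of the file.

Sorry, you need to go with HttpWebRequest instead, so that the cookies from the login can be sent along: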

string fileAddress = "http://www.google.com/"; // replace with the file's URL
HttpWebRequest client = (HttpWebRequest)WebRequest.Create(fileAddress);
client.CookieContainer = cookies; // the same container that was used for the login
int read = 0;
byte[] buffer = new byte[1024];
using (WebResponse response = client.GetResponse())
using (Stream stream = response.GetResponseStream())
using (FileStream download =
    new FileStream("google_homepage.html", FileMode.Create))
{
    // Copy the response to disk in chunks instead of holding it all in memory
    while ((read = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        download.Write(buffer, 0, read);
    }
}
+0

I converted this code to VB.NET and I get a protocol violation error: "Cannot send a content-body with this verb-type." Dim Stream As Stream = client.GetRequestStream() – George 2010-02-09 10:46:46

+0

You should get the ResponseStream directly; if you need to send parameters, use the query string – 2010-02-09 11:58:05

+0

Never mind, I had not converted the code correctly. It works now, thanks again. – George 2010-02-09 12:39:12

2

Are you passing the cookies when you download the file?

1

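You need to keep the session/authentication cookie that the login form sends back to you. Basically, take the cookie from the response to the authentication form and send it back when you perform step 3.

Here is a simple way to extend WebClient, which should give you simpler code than the above: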

http://couldbedone.blogspot.com/2007/08/webclient-handling-cookies.html

Just:

  1. Create an instance of this CookieAwareWebClient
  2. Post the login form
  3. Download the file
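
The three steps above could be sketched roughly as follows. This is a minimal sketch and not the code from the linked post: the class body is one common way to make WebClient carry cookies (overriding GetWebRequest), the download URL and output file name are hypothetical, and the form fields are taken from the question's placeholders. Any hidden fields the real form requires (for example form_build_id) and the proxy settings from the question would still need to be added.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// A WebClient that attaches one shared CookieContainer to every request,
// so the session cookie set by the login is sent with the download.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = cookies;
        }
        return request;
    }
}

public static class Program
{
    public static void Main()
    {
        // Step 1: create the cookie-aware client
        using (CookieAwareWebClient client = new CookieAwareWebClient())
        {
            // Step 2: post the login form (placeholder values from the question)
            NameValueCollection form = new NameValueCollection();
            form["name"] = "[our login name]";
            form["pass"] = "[our login password]";
            form["form_id"] = "user_login";
            client.UploadValues("https://www.test.test/user/login", "POST", form);

            // Step 3: download the file (hypothetical URL and file name)
            client.DownloadFile("https://www.test.test/path/to/file", "download.bin");
        }
    }
}
```

Because the same client instance handles both the login and the download, there is no need to pass a CookieContainer around by hand.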
1

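You could also choose to automate Internet Explorer instead of trying to send web requests over HTTPS.
"Web automation with Powershell" explains this using PowerShell, but you can also do this in C# by accessing Internet Explorer as a COM object.
If you only need a single file and do not need to worry about memory leaks, this approach works well.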
