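I'm trying to scrape a website that requires user authentication. I can send my login credentials via POST and store the cookie. However, after logging in, I get a 403 error when I try to access a protected page. PowerShell HttpWebRequest GET method CookieContainer problem?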

$url = "https://some_url" 

$CookieContainer = New-Object System.Net.CookieContainer 

$postData = "User=UserName&Password=Pass" 

$buffer = [text.encoding]::ascii.getbytes($postData) 

[net.httpWebRequest] $req = [net.webRequest]::create($url) 
$req.method = "POST" 
$req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" 
$req.Headers.Add("Accept-Language: en-US") 
$req.Headers.Add("Accept-Encoding: gzip,deflate") 
$req.Headers.Add("Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7") 
$req.AllowAutoRedirect = $false 
$req.ContentType = "application/x-www-form-urlencoded" 
$req.ContentLength = $buffer.length 
$req.TimeOut = 50000 
$req.KeepAlive = $true 
$req.Headers.Add("Keep-Alive: 300"); 
$req.CookieContainer = $CookieContainer 
$reqst = $req.getRequestStream() 
$reqst.write($buffer, 0, $buffer.length) 
$reqst.flush() 
$reqst.close() 
[net.httpWebResponse] $res = $req.getResponse() 
$resst = $res.getResponseStream() 
$sr = new-object IO.StreamReader($resst) 
$result = $sr.ReadToEnd() 
$res.close() 



$url2 = "https://some_url/protected_page" 

[net.httpWebRequest] $req2 = [net.webRequest]::create($url2) 
$req2.Method = "GET" 
$req2.Accept = "text/html" 
$req2.AllowAutoRedirect = $false 
$req2.CookieContainer = $CookieContainer 
$req2.TimeOut = 50000 
[net.httpWebResponse] $res2 = $req2.getResponse() 
$resst = $res2.getResponseStream() 
$sr = new-object IO.StreamReader($resst) 
$result = $sr.ReadToEnd() 

Solution: After trying almost everything, I ended up trying something different, and it actually worked.

After posting the login information and getting the session cookie, I used WebClient to access the secure page by adding the cookie string to the headers.

$web = new-object net.webclient 
$web.Headers.add("Cookie", $res.Headers["Set-Cookie"]) 
$result = $web.DownloadString("https://secure_url") 

One of the cool things about this is that WebClient keeps the cookie. To access another secure page, you can just call $web.DownloadString("https://another_secure_url") :)

Could you post your complete solution? I'm in the same situation, but I haven't been able to get this working yet. – bearrito 2011-10-04 02:50:48

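I used Fiddler2 to capture the traffic between the browser and the server, then grabbed the cookie from the request headers in Fiddler2. I added that cookie to the request, and now DownloadString no longer keeps redirecting to the login page. Thanks! – 2011-12-21 22:55:43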

Answers

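I found that $res.Headers["Set-Cookie"] did not work for me, because cookies can have additional information attached (such as the URL or HTTP-only attributes). However, since you already have the $CookieContainer variable, you can easily switch to GetCookieHeader(url), which strips out the extra information and leaves you with a properly formatted cookie string: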

$web = new-object net.webclient 
$web.Headers.add("Cookie", $CookieContainer.GetCookieHeader($url)) 
$result = $web.DownloadString($url) 
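
To illustrate the difference, here is a minimal offline sketch. The cookie name and value and the example.com address are made up for illustration, and SetCookies stands in for what the login response would normally put into the container.

```powershell
# Hypothetical values for illustration only; no network access is needed.
$uri = [uri]"https://example.com/"
$rawSetCookie = "session=abc123; path=/; HttpOnly"

# Seed a container roughly the way a login response would.
$jar = New-Object System.Net.CookieContainer
$jar.SetCookies($uri, $rawSetCookie)

# GetCookieHeader returns only the name=value pairs that apply to $uri,
# without attributes such as path or HttpOnly.
$cookieHeader = $jar.GetCookieHeader($uri)
$cookieHeader
```

The raw Set-Cookie string still carries the path and HttpOnly attributes, which are not meant to be sent back in a Cookie request header; the string from GetCookieHeader should contain only the pairs a browser would send.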

I would use IE automation. With that, there's no need to deal with cookies, headers, and so on. Much easier.

I gave IE automation a try before this, but it was just too slow for scraping. I did find a solution to my problem, though. – foureight84 2011-03-31 08:57:47

People keep asking for the complete script, so here it is:

$url = "https://some_url" 

$CookieContainer = New-Object System.Net.CookieContainer 

$postData = "User=UserName&Password=Pass" 

$buffer = [text.encoding]::ascii.getbytes($postData) 

[net.httpWebRequest] $req = [net.webRequest]::create($url) 
$req.method = "POST" 
$req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" 
$req.Headers.Add("Accept-Language: en-US") 
$req.Headers.Add("Accept-Encoding: gzip,deflate") 
$req.Headers.Add("Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7") 
$req.AllowAutoRedirect = $false 
$req.ContentType = "application/x-www-form-urlencoded" 
$req.ContentLength = $buffer.length 
$req.TimeOut = 50000 
$req.KeepAlive = $true 
$req.Headers.Add("Keep-Alive: 300"); 
$req.CookieContainer = $CookieContainer 
$reqst = $req.getRequestStream() 
$reqst.write($buffer, 0, $buffer.length) 
$reqst.flush() 
$reqst.close() 
[net.httpWebResponse] $res = $req.getResponse() 
$resst = $res.getResponseStream() 
$sr = new-object IO.StreamReader($resst) 
$result = $sr.ReadToEnd() 
$res.close() 


$web = new-object net.webclient 
$web.Headers.add("Cookie", $res.Headers["Set-Cookie"]) 
$result = $web.DownloadString("https://secure_url")
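
If the raw Set-Cookie header is rejected by the server, the last step could instead read the cookies back out of $CookieContainer. The following is only a sketch under that assumption: New-CookieWebClient is a hypothetical helper name, not part of the original script, and it builds a fresh WebClient per URL rather than relying on headers surviving between calls.

```powershell
# Hypothetical helper (not from the original script): returns a WebClient
# whose Cookie header is taken from the container for the given URL.
function New-CookieWebClient {
    param(
        [System.Net.CookieContainer] $Container,
        [uri] $Uri
    )
    $client = New-Object System.Net.WebClient
    # GetCookieHeader yields only the name=value pairs valid for $Uri.
    $header = $Container.GetCookieHeader($Uri)
    if ($header) {
        $client.Headers.Add("Cookie", $header)
    }
    return $client
}

# Usage, assuming $CookieContainer was filled by the login request above:
# $web = New-CookieWebClient -Container $CookieContainer -Uri "https://secure_url"
# $result = $web.DownloadString("https://secure_url")
```

Whether this gets past the 403 depends on the site; it only changes how the Cookie header is built, not what the server expects.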