2016-02-04 71 views
7

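Submitted form data has no effect

I came across a website that looked simple, and I was confident I could read its data with HttpWebRequest using GET and POST requests. The GET request works fine. The POST request does not raise any error either, but the posted form data has no effect on the returned result. The form has fields that filter the data by date, yet the returned data is not filtered even though every required field is posted. I have added every header, the form data, and the cookies to the request.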

The URL of the page is http://www.bseindia.com/corporates/Insider_Trading_new.aspx?expandable=0

It looks like a very ordinary website, but because it is an ASPX page that involves ViewState and event validation, it has not been as easy as expected.

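My first step was to analyze the site's GET and POST requests with Fiddler, and I was surprised that Fiddler did not capture any traffic for this URL. I tried Charles as well, and it did not capture this URL either. Apart from this URL, both Fiddler and Charles capture everything else. I should also mention that when I call the URL from a console application using HttpWebRequest, both Fiddler and Charles capture it, but they do not capture it from Chrome, Firefox, or Internet Explorer 11.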

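So I analyzed the network activity with the developer tools in Firefox, where everything was visible (headers, parameters, and cookies). In Chrome, no cookies were present. When I checked for cookies by creating an HttpWebRequest and getting the response, there were no cookies either. So something odd is going on with this site.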

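I somehow managed to write a simple function that creates the request and gets the response. I first issue a GET request, read the page as a string, and extract the ViewState, EventValidation, and so on from it. That information is then used in a second HttpWebRequest, which is a POST. Everything runs and I get a response, but not the expected one. I want the records between two given dates, and I have specified those dates in the form data, but the POST request still does not return filtered data. The function I wrote is shown below; I would greatly appreciate any suggestions as to why this happens and how to deal with it. Understanding this has become a challenge for me, because I cannot see why this simple site does not show up in Fiddler. (The page uses JavaScript postbacks.)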

The code may look long and scary, but it is very simple and straightforward.

Try 

     ' First GET Request to obtain Viewstate, Eventvalidation etc 
     Dim objRequest2 As Net.HttpWebRequest = DirectCast(HttpWebRequest.Create("http://www.bseindia.com/corporates/Insider_Trading_new.aspx?expandable=0"), HttpWebRequest) 
     objRequest2.Method = "GET" 
     objRequest2.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" 
     objRequest2.Headers.Add("Accept-Encoding", "gzip, deflate") 
     objRequest2.Headers.Add("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6,ur;q=0.4") 
     objRequest2.KeepAlive = True 
     objRequest2.ContentType = "application/x-www-form-urlencoded" 
     objRequest2.Host = "www.bseindia.com" 
     objRequest2.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36" 

     objRequest2.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip 

     Dim LoginRes2 As Net.HttpWebResponse 
     Dim sr2 As IO.StreamReader 
     LoginRes2 = objRequest2.GetResponse() 

     sr2 = New IO.StreamReader(LoginRes2.GetResponseStream) 
     Dim getString As String = sr2.ReadToEnd() 
     Dim getCookieCollection = objRequest2.CookieContainer 

     ' get the page ViewState     
     Dim viewStateFlag As String = "id=""__VIEWSTATE"" value=""" 
     Dim i As Integer = getString.IndexOf(viewStateFlag) + viewStateFlag.Length 
     Dim j As Integer = getString.IndexOf("""", i) 
     Dim viewState As String = getString.Substring(i, j - i) 

     ' get page EventValidation     
     Dim eventValidationFlag As String = "id=""__EVENTVALIDATION"" value=""" 
     i = getString.IndexOf(eventValidationFlag) + eventValidationFlag.Length 
     j = getString.IndexOf("""", i) 
     Dim eventValidation As String = getString.Substring(i, j - i) 

     ' get page ViewStateGenerator
     Dim viewstateGeneratorFlag As String = "id=""__VIEWSTATEGENERATOR"" value=""" 
     i = getString.IndexOf(viewstateGeneratorFlag) + viewstateGeneratorFlag.Length 
     j = getString.IndexOf("""", i) 
     Dim viewStateGenerator As String = getString.Substring(i, j - i) 

     viewState = System.Web.HttpUtility.UrlEncode(viewState) 
     eventValidation = System.Web.HttpUtility.UrlEncode(eventValidation) 

     Dim LoginRes As Net.HttpWebResponse 
     Dim sr As IO.StreamReader 
     Dim objRequest As Net.HttpWebRequest 

     ' Second POST request to post the form data along with cookies 
     objRequest = DirectCast(HttpWebRequest.Create("http://www.bseindia.com/corporates/Insider_Trading_new.aspx?expandable=0"), HttpWebRequest) 

     Dim formDataCollection As New NameValueCollection 

     formDataCollection.Add("__EVENTTARGET", "") 
     formDataCollection.Add("__EVENTARGUMENT", "") 
     formDataCollection.Add("__VIEWSTATE", viewState) 
     formDataCollection.Add("__VIEWSTATEGENERATOR", viewStateGenerator) 
     formDataCollection.Add("__EVENTVALIDATION", eventValidation) 
     formDataCollection.Add("fmdate", "20160104") 
     formDataCollection.Add("eddate", "20160204") 
     formDataCollection.Add("hidCurrentDate", "2016/02/04") 
     formDataCollection.Add("ctl00_ContentPlaceHolder1_hdnCode", "") 
     formDataCollection.Add("txtDate", "04/01/2016") 
     formDataCollection.Add("ddlCalMonthDiv3", "1") 
     formDataCollection.Add("ddlCalYearDiv3", "2016") 
     formDataCollection.Add("txtTodate", "04/02/2016") 
     formDataCollection.Add("ddlCalMonthDiv4", "2") 
     formDataCollection.Add("ddlCalYearDiv4", "2016") 
     formDataCollection.Add("Hidden1", "") 
     formDataCollection.Add("ctl00_ContentPlaceHolder1_GetQuote1_smartSearch", "Enter Security Name/Code/ID") 
     formDataCollection.Add("btnSubmit.x", "44") 
     formDataCollection.Add("btnSubmit.y", "2") 

     Dim strFormdata As String = formDataCollection.ToString() 
     Dim encoding As New ASCIIEncoding 
     Dim postBytes As Byte() = encoding.GetBytes(strFormdata) 

     objRequest.Method = "POST" 
     objRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" 
     objRequest.Headers.Add("Accept-Encoding", "gzip, deflate") 
     objRequest.Headers.Add("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6,ur;q=0.4") 
     objRequest.Headers.Add("Cache-Control", "private, max-age=60") 
     objRequest.KeepAlive = True 
     objRequest.ContentType = "application/x-www-form-urlencoded" 
     objRequest.Host = "www.bseindia.com" 
     objRequest.Headers.Add("Origin", "http://www.bseindia.com") 
     objRequest.Referer = "http://www.bseindia.com/corporates/Insider_Trading_new.aspx?expandable=0" 
     objRequest.Headers.Add("Upgrade-Insecure-Requests", "1") 
     objRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36" 

     objRequest.ContentType = "text/html; charset=utf-8" 
     objRequest.Date = "Thu, 04 Feb 2016 13:42:04 GMT" 
     objRequest.Headers.Add("Server", "Microsoft-IIS/8.0") 
     objRequest.Headers.Add("Vary", "Accept-Encoding") 
     objRequest.Headers.Add("X-AspNet-Version", "2.0.50727") 
     objRequest.Headers.Add("ASP.NET", "ASP.NET") 

     objRequest.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip 

     Dim gaCookies As New CookieContainer() 

     Dim cookie1 As New Cookie("__asc", "f673f0d5152a823bc335f575d34") 
     cookie1.Domain = ".bseindia.com" 
     cookie1.Path = "/" 
     gaCookies.Add(cookie1) 

     Dim cookie2 As New Cookie("__auc", "f673f0d5152a823bc335f575d34") 
     cookie2.Domain = ".bseindia.com" 
     cookie2.Path = "/" 
     gaCookies.Add(cookie2) 

     Dim cookie3 As New Cookie("__utma", "253454874.280640365.1454519857.1454519865.1454519865.1") 
     cookie3.Domain = ".bseindia.com" 
     cookie3.Path = "/" 
     gaCookies.Add(cookie3) 

     Dim cookie4 As New Cookie("__utmb", "253454874.1.10.1454519865") 
     cookie4.Domain = ".bseindia.com" 
     cookie4.Path = "/" 
     gaCookies.Add(cookie4) 

     Dim cookie5 As New Cookie("__utmc", "253454874") 
     cookie5.Domain = ".bseindia.com" 
     cookie5.Path = "/" 
     gaCookies.Add(cookie5) 

     Dim cookie6 As New Cookie("__utmt", "1") 
     cookie6.Domain = ".bseindia.com" 
     cookie6.Path = "/" 
     gaCookies.Add(cookie6) 

     Dim cookie7 As New Cookie("__utmz", "253454874.1454519865.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)") 
     cookie7.Domain = ".bseindia.com" 
     cookie7.Path = "/" 
     gaCookies.Add(cookie7) 

     Dim cookie8 As New Cookie("_ga", "GA1.2.280640365.1454519857") 
     cookie8.Domain = ".bseindia.com" 
     cookie8.Path = "/" 
     gaCookies.Add(cookie8) 

     Dim cookie9 As New Cookie("_gat", "1") 
     cookie9.Domain = ".bseindia.com" 
     cookie9.Path = "/" 
     gaCookies.Add(cookie9) 

     Dim postStream As Stream = objRequest.GetRequestStream() 
     postStream.Write(postBytes, 0, postBytes.Length) 
     postStream.Flush() 
     postStream.Close() 

     LoginRes = objRequest.GetResponse() 
     sr = New IO.StreamReader(LoginRes.GetResponseStream) 

     ReadWebsite = sr.ReadToEnd() 

     sr.Close() 
     sr = Nothing 
     LoginRes.Close() 
     LoginRes = Nothing 
     objRequest = Nothing 
     Exit Function 

    Catch ex As Exception 
     ReadWebsite = Nothing 
    End Try 

Note: the raw form data for the dates (without the ViewState and EventValidation):

fmdate: 20160130
eddate: 20160205
hidCurrentDate: 2016/02/05
ctl00_ContentPlaceHolder1_hdnCode:
txtDate: 04/01/2016
ddlCalMonthDiv3: 1
ddlCalYearDiv3: 2016
txtTodate: 04/02/2016
ddlCalMonthDiv4: 2
ddlCalYearDiv4: 2016
Hidden1:
ctl00_ContentPlaceHolder1_GetQuote1_smartSearch: Enter Security Name/Code/ID
btnSubmit.x: 55
btnSubmit.y: 13

+1

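It would be very helpful if someone left a comment explaining why the question was downvoted and voted to close. I know there are other questions on this particular topic, but this situation and scenario are different. To me, the purpose of this forum is to learn about things that you and the people around you do not understand. I have clearly described all my effort and the code I wrote, so I would not have asked without proper research. –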

+0

I would check `formDataCollection.Add("fmdate", "20160104")` and the lines below it. All the other dates you use seem to be in a different format. – Jeroen

+0

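@Jeroen Thanks for your comment. I am using the same format that I found in the inspector. Please check my updated question; I have added the raw form data copied from Chrome. –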
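
On the date-format point raised in the comments: the question's code posts the same two dates in several shapes (`fmdate`/`eddate` as `yyyyMMdd`, `txtDate`/`txtTodate` as `dd/MM/yyyy`, `hidCurrentDate` as `yyyy/MM/dd`, plus separate month and year values). A minimal sketch of deriving all of them from `Date` values, so they cannot drift apart, might look like the following. The helper name `BuildDateFields` is hypothetical, and the formats are inferred only from the values shown in the question, not from any documentation of the site.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module DateFieldsSketch

    ' Builds the date-related form fields from Date values.
    ' Field names and formats are assumptions taken from the question's code.
    Function BuildDateFields(fromDate As Date, toDate As Date, today As Date) As Dictionary(Of String, String)
        ' InvariantCulture keeps "/" as a literal slash regardless of the
        ' machine's regional settings.
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        Dim fields As New Dictionary(Of String, String)

        fields("fmdate") = fromDate.ToString("yyyyMMdd", inv)
        fields("eddate") = toDate.ToString("yyyyMMdd", inv)
        fields("hidCurrentDate") = today.ToString("yyyy/MM/dd", inv)
        fields("txtDate") = fromDate.ToString("dd/MM/yyyy", inv)
        fields("ddlCalMonthDiv3") = fromDate.Month.ToString(inv)
        fields("ddlCalYearDiv3") = fromDate.Year.ToString(inv)
        fields("txtTodate") = toDate.ToString("dd/MM/yyyy", inv)
        fields("ddlCalMonthDiv4") = toDate.Month.ToString(inv)
        fields("ddlCalYearDiv4") = toDate.Year.ToString(inv)
        Return fields
    End Function

    Sub Main()
        Dim fields As Dictionary(Of String, String) =
            BuildDateFields(New Date(2016, 1, 4), New Date(2016, 2, 4), New Date(2016, 2, 4))
        For Each kv As KeyValuePair(Of String, String) In fields
            Console.WriteLine(kv.Key & "=" & kv.Value)
        Next
    End Sub

End Module
```

Note that the raw form data posted in the question pairs `fmdate: 20160130` with `txtDate: 04/01/2016`, so the browser itself may send values that disagree; which field the server actually filters on is not known from the source.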

Answers

2

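You could consider running the website in a browser and using a tool to control that browser, rather than issuing the GET/POST requests directly. This may be easier than your current approach, and somewhat more robust.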

E.g. Selenium WebDriver: http://www.seleniumhq.org/projects/webdriver/

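You would load the page, set the values of the form fields (using CSS-style selectors to find the corresponding fields), and then click the button. You can automate all of this and get the page source (unfortunately, I don't think you can get the full HTML of the current state after the JavaScript has run, but you can probably use the API to get the elements you need).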

API documentation: http://seleniumhq.github.io/selenium/docs/api/dotnet/
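
A rough sketch of those steps with the Selenium .NET bindings from VB.NET could look like this. It assumes the Selenium.WebDriver NuGet package and a matching chromedriver are installed. The field names `txtDate`, `txtTodate`, and `btnSubmit` are taken from the form data in the question; whether the date boxes accept typed input (rather than only a calendar widget), and the `table tr` selector used to read the results, are assumptions that have not been checked against the live page.

```vb
Imports System
Imports OpenQA.Selenium
Imports OpenQA.Selenium.Chrome

Module SeleniumSketch

    Sub Main()
        Dim driver As IWebDriver = New ChromeDriver()
        Try
            driver.Navigate().GoToUrl("http://www.bseindia.com/corporates/Insider_Trading_new.aspx?expandable=0")

            ' Locate the date boxes with CSS selectors on the name attribute
            ' (names assumed from the question's form data).
            Dim fromBox As IWebElement = driver.FindElement(By.CssSelector("input[name='txtDate']"))
            fromBox.Clear()
            fromBox.SendKeys("04/01/2016")

            Dim toBox As IWebElement = driver.FindElement(By.CssSelector("input[name='txtTodate']"))
            toBox.Clear()
            toBox.SendKeys("04/02/2016")

            ' Clicking the real button lets the page's own JavaScript fill in
            ' the hidden fields and perform the postback.
            driver.FindElement(By.CssSelector("input[name='btnSubmit']")).Click()

            ' Read the rendered rows through the API instead of parsing the
            ' page source. The selector is a placeholder for the real table.
            For Each row As IWebElement In driver.FindElements(By.CssSelector("table tr"))
                Console.WriteLine(row.Text)
            Next
        Finally
            ' Always close the browser, even if an element was not found.
            driver.Quit()
        End Try
    End Sub

End Module
```

If the results load slowly, an explicit wait before reading the rows would probably be needed; that is left out here to keep the sketch short.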

1

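You really should include all the fields from the form, including the hidden ones, as well as the ASP session identifier stored in the cookie. That way you fully simulate the browser's request and achieve your goal. To show what you have to submit: http://pastebin.com/AsSABgU6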
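
A minimal sketch of that idea, under stated assumptions: one `CookieContainer` is shared by the GET and the POST so that any session cookie set by the server is sent back, every hidden input from the GET response is copied into the POST, and the body is URL-encoded explicitly. The names `ExtractHiddenFields`, `BuildFormBody`, and `FetchFiltered` are hypothetical helpers, not part of any library; `WebUtility.UrlEncode` requires .NET 4.5 or later; and the regex-based extraction assumes double-quoted attributes. One related detail worth checking in the question's code: calling `ToString()` on a plain `New NameValueCollection` returns the type name rather than a query string, so building the body by hand avoids that trap. This has not been verified against the site itself.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Module PostbackSketch

    ' Collects name/value pairs of every <input type="hidden"> in the page.
    ' Regex-based and fragile: assumes attributes are double-quoted.
    Function ExtractHiddenFields(html As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)
        For Each tag As Match In Regex.Matches(html, "<input\b[^>]*>", RegexOptions.IgnoreCase)
            Dim t As String = tag.Value
            If Not Regex.IsMatch(t, "\stype=""hidden""", RegexOptions.IgnoreCase) Then Continue For
            Dim name As Match = Regex.Match(t, "\sname=""([^""]*)""", RegexOptions.IgnoreCase)
            If Not name.Success Then Continue For
            Dim value As Match = Regex.Match(t, "\svalue=""([^""]*)""", RegexOptions.IgnoreCase)
            Dim v As String = If(value.Success, WebUtility.HtmlDecode(value.Groups(1).Value), "")
            fields(WebUtility.HtmlDecode(name.Groups(1).Value)) = v
        Next
        Return fields
    End Function

    ' Encodes each key and value exactly once as application/x-www-form-urlencoded.
    Function BuildFormBody(fields As IEnumerable(Of KeyValuePair(Of String, String))) As String
        Dim parts As New List(Of String)
        For Each kv As KeyValuePair(Of String, String) In fields
            parts.Add(WebUtility.UrlEncode(kv.Key) & "=" & WebUtility.UrlEncode(kv.Value))
        Next
        Return String.Join("&", parts)
    End Function

    ' GET the page, then POST it back with all hidden fields plus the caller's fields.
    Function FetchFiltered(url As String, visibleFields As Dictionary(Of String, String)) As String
        Dim cookies As New CookieContainer() ' shared by both requests

        Dim getReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        getReq.CookieContainer = cookies
        getReq.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Dim html As String
        Using resp As WebResponse = getReq.GetResponse(), reader As New StreamReader(resp.GetResponseStream())
            html = reader.ReadToEnd()
        End Using

        ' Start from the server's hidden fields, then overlay the values to submit.
        Dim fields As Dictionary(Of String, String) = ExtractHiddenFields(html)
        For Each kv As KeyValuePair(Of String, String) In visibleFields
            fields(kv.Key) = kv.Value
        Next

        Dim body As Byte() = Encoding.UTF8.GetBytes(BuildFormBody(fields))
        Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        postReq.Method = "POST"
        postReq.CookieContainer = cookies
        postReq.ContentType = "application/x-www-form-urlencoded"
        postReq.Referer = url
        postReq.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        postReq.ContentLength = body.Length
        Using s As Stream = postReq.GetRequestStream()
            s.Write(body, 0, body.Length)
        End Using
        Using resp As WebResponse = postReq.GetResponse(), reader As New StreamReader(resp.GetResponseStream())
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Offline demonstration of the two helpers on a made-up page fragment.
        Dim sample As String =
            "<input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""/wEPDw+abc="" />" &
            "<input type=""hidden"" name=""fmdate"" id=""fmdate"" value="""" />"
        Dim fields As Dictionary(Of String, String) = ExtractHiddenFields(sample)
        fields("fmdate") = "20160104"
        Console.WriteLine(BuildFormBody(fields))
    End Sub

End Module
```

The visible fields (dates, `btnSubmit.x`, `btnSubmit.y`, and so on) would be passed in `visibleFields` using the exact `name` attributes from the page; compare them with the pastebin above, since ASP.NET control names can differ from the element ids.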