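Unexpected behavior when using HttpWebRequest to scrape a table from a form
I am trying to scrape a website written in PHP to extract certain information from a specific table. Here is the scenario.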
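The landing page has a form that takes a query from the user and searches based on it. If I leave the fields empty and click "Submit", it returns the complete result set (which is what I am interested in). Before I knew about the HttpWebRequest class, I simply passed the URL to the HtmlWeb.Load(URL) method of the HtmlAgilityPack library, which is obviously not the way to go.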
Then I searched for HttpWebRequest, and one of the examples I found looks like this:
Imports System.IO
Imports System.Net
Imports System.Text

' Cookie jar shared with the request so the session cookies are kept
Dim cookies As New CookieContainer
' Form body captured with the Live HTTP Headers plugin in Firefox
Dim postData As String = "postData obtained using Live HTTP Headers plugin in Firefox"
Dim byteData As Byte() = Encoding.UTF8.GetBytes(postData)

Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create("URL"), HttpWebRequest)
postRequest.Method = "POST"
postRequest.KeepAlive = True
postRequest.CookieContainer = cookies
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.ContentLength = byteData.Length
postRequest.Referer = "Referer Page"
postRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 Firefox/4.0 (.NET CLR 3.5.30729)"

' Send the form body; Using closes the stream even if Write throws
Using postreqstream As Stream = postRequest.GetRequestStream()
    postreqstream.Write(byteData, 0, byteData.Length)
End Using

' Read the returned HTML into thepage
Dim thepage As String
Using postresponse As HttpWebResponse = DirectCast(postRequest.GetResponse(), HttpWebResponse)
    cookies.Add(postresponse.Cookies)
    Using postreqreader As New StreamReader(postresponse.GetResponseStream())
        thepage = postreqreader.ReadToEnd()
    End Using
End Using
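Now, when I output the `thepage` variable to a browser control on the VB form, I can see the page I want (containing the table). At this point I just passed the page's URL to HtmlAgilityPack like this: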
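Imports HtmlAgilityPack

' HtmlWeb.Load fetches the page from the given URL with a new GET request,
' separate from the POST above
Dim web As New HtmlAgilityPack.HtmlWeb()
Dim htmlDoc As HtmlAgilityPack.HtmlDocument = web.Load("URL")
' All tables on the page, and the specific table of interest
Dim tabletag As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//table")
Dim tablenode As HtmlNode = htmlDoc.DocumentNode.SelectSingleNode("//table[@summary='List of services']")
' SelectNodes returns Nothing when no <table> element matches
If tabletag IsNot Nothing Then
    Console.WriteLine("YES")
End If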
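But the tabletag variable is Nothing. Where did I go wrong? Is there any way to get the URL directly from the HttpWebResponse so that I can pass it to the web.Load method?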
Thanks
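A minimal sketch, not from the original post, of parsing the HTML that was already downloaded instead of fetching the URL a second time. It assumes the response text is in a string such as `thepage` and uses HtmlAgilityPack's `HtmlDocument.LoadHtml`; the module name `TableFromString` and the helper `FindServicesTable` are hypothetical. Although `postresponse.ResponseUri` does expose the final URL, passing it to `HtmlWeb.Load` would issue a fresh GET without the posted form data, so it may not return the same page.

```vb
Imports System

Module TableFromString
    ' Hypothetical helper: parses HTML that is already in memory (for example
    ' the thepage string read from the HttpWebResponse) instead of re-fetching it.
    ' Types are fully qualified to avoid clashing with
    ' System.Windows.Forms.HtmlDocument in a WinForms project.
    Function FindServicesTable(html As String) As HtmlAgilityPack.HtmlNode
        Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()
        htmlDoc.LoadHtml(html)
        ' SelectSingleNode returns Nothing when the XPath matches no node
        Return htmlDoc.DocumentNode.SelectSingleNode("//table[@summary='List of services']")
    End Function

    Sub Main()
        ' Stand-in for the thepage string from the POST response
        Dim sample As String = "<html><body><table summary='List of services'><tr><td>A</td></tr></table></body></html>"
        Dim table As HtmlAgilityPack.HtmlNode = FindServicesTable(sample)
        Console.WriteLine(If(table IsNot Nothing, "YES", "NO"))
    End Sub
End Module
```

With the real request, the call would be `FindServicesTable(thepage)`, so the parsed document is exactly the one returned by the POST.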
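I realized the problem is a script that runs in that page. The WebBrowser control displays the page after the script has finished, but the text box shows the HTML from before the script ran, which is why it has no table. Now the question is: how do I wait for the script to run and then read the HTML? – 2012-07-16 20:27:34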
"When I output the page variable to the browser": if you write the value of `thepage` to a text file and inspect it, does it contain the table? – 2013-05-29 19:45:41
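A rough sketch of the check suggested in the comment above, under stated assumptions: the helper `DumpAndCheck`, the module name `PageDump`, and the temp file name are all hypothetical, and searching for the text "List of services" is only a crude heuristic. If the table is built by a script in the browser, it will not appear in this raw response text.

```vb
Imports System
Imports System.IO

Module PageDump
    ' Hypothetical helper: writes the raw response HTML to a file for manual
    ' inspection and reports whether the markup mentions the target table.
    Function DumpAndCheck(thepage As String, filePath As String) As Boolean
        File.WriteAllText(filePath, thepage)
        ' Crude, case-insensitive search for the table's summary text
        Return thepage.IndexOf("List of services", StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "thepage.html")
        ' Stand-in for the thepage string from the POST response
        Dim found As Boolean = DumpAndCheck("<table summary=""List of services""></table>", filePath)
        Console.WriteLine(found)
    End Sub
End Module
```

Opening the written file in a text editor shows exactly what the server returned, before any script in the page has run.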