2016-02-29 95 views
1

我一直在使用IE從Excel中抓取這個網站,但最近使用IE已經不一致和緩慢。我的列表通常在500到1000左右,所以我必須在一夜之間運行宏。最近宏開始掛斷。這就是爲什麼我決定第一次使用MSXML2進行資源管理器的搜索。XML網絡動態密鑰刮網站

該網站不需要認證,但它具有動態更改的隱藏輸入。

我做了什麼..我使用GET來拉動網站並提取動態密鑰,然後嘗試使用POST將輸入數據發送到網站。我一直在獲取服務器錯誤/運行時錯誤。我曾嘗試使用不同的標題請求選項,但我仍然沒有得到結果頁。我也嘗試使用MSXML2.ServerXMLHTTP。我在正確的軌道上嗎?

感謝

Sub test_66() 
    Dim oXML_get 
    'Dim oXML_post 
    Dim sendText As String, s2 As String, n1 As Integer, postUrl As String,  sHTML As String, s1 As String 

    ' Instantiate MSXML2 
    Set oXML_get = New MSXML2.XMLHTTP 

    oXML_get.Open "GET", "http://www.phila.gov/revenue/realestatetax/default.aspx", False 
    oXML_get.setRequestHeader "Accept", "text/html;charset=UTF-8" 
    oXML_get.setRequestHeader "Accept-Encoding", "identity" 
    oXML_get.setRequestHeader "Accept-Charset", "UTF-8" 'Connection keep -alive 
    oXML_get.setRequestHeader "Connection", "keep -alive" 

    oXML_get.send 

    sHTML = oXML_get.responseText 
    'Debug.Print sHTML 
    Dim hDOC As MSHTML.HTMLDocument 
    Set hDOC = New MSHTML.HTMLDocument 
      hDOC.body.innerHTML = sHTML 
    s1 = Replace(hDOC.getElementsByTagName("input").Item(2).Value, "/", "%2F") 
    s2 = Replace(hDOC.getElementsByTagName("input").Item(3).Value, "/", "%2F") 

    sendText = "__VIEWSTATE=" & s1 & "&__EVENTVALIDATION=" & s2 & "&ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24txtTaxInfo=043185500&ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24btnTaxByBRT=%20>>" 
    Debug.Print sendText '"__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=" & s1 & "__EVENTVALIDATION=" & s2 & 

    oXML_get.Open "POST", "http://www.phila.gov/revenue/realestatetax/default.aspx", False 
    oXML_get.setRequestHeader "Content-Type", "application/x-www-form-urlencoded" 
    oXML_get.setRequestHeader "Accept", "text/html;charset=UTF-8" 
    oXML_get.setRequestHeader "Accept-Encoding", "identity" 
    oXML_get.setRequestHeader "Accept-Charset", "UTF-8" 'Connection keep -alive 
    'oXML_get.setRequestHeader "Connection", "keep -alive" 
    oXML_get.send (sendText) 

    Dim objIE As Object: Set objIE = CreateObject("InternetExplorer.Application") 
    objIE.navigate "about:blank" 
    objIE.Visible = True 
    objIE.document.Write oXML_get.responseText 
End Sub 

這是我收到運行時錯誤消息....

Server Error in '/revenue/RealEstateTax' Application. 
<!-- Web.Config Configuration File --> 

<configuration> 
    <system.web> 
     <customErrors mode="Off"/> 
    </system.web> 
</configuration> 
+0

檢查,如果你只是缺少&'sendText = 「__VIEWSTATE =」 &S1&「&__ EVENTVALIDATION = 「...' – omegastripes

+0

你說得對,我會更新並檢查代碼。感謝您的快速響應。 – user3121922

+0

我插入了&符號,但仍然收到來自服務器的相同錯誤。我已經更新了上面的代碼。 – user3121922

回答

1

我從Web表單在Firefox提交網頁上相同的搜索請求。從那以後,我打開開發者工具F12,網絡選項卡,單擊最後一個POST請求,打開參數部分,這裏是已提交的參數的截圖:

form data

原始表單數據:

__EVENTTARGET = & __EVENTARGUMENT = & __VIEWSTATE =%2FwEPDwULLTEyNDQ4MDU4OTkPZBYCZg9kFgICAw9kFgICDQ9kFgYCAQ9kFgICAw9kFgICAQ8QZBAVARUxNzAwIFNQUklORyBHQVJERU4gU1QVARUxNzAwIFNQUklORyBHQVJERU4gU1QUKwMBZxYBZmQCBQ8PFgIeBFRleHQFHFBsZWF​​zZ SBhZGQgYWRkcmVzcyB0byBsb29rdXBkZAINDw8WAh4HVmlzaWJsZWhkFgoCAQ88KwAKAQAPFgQeC18hRGF0YUJvdW5kZx4LXyFJdGVtQ291bnRmZGQCAw9kFgICBQ8PFgIeF0VuYWJsZUFqYXhTa2luUmVuZGVyaW5naGRkAgUPFCsAAg8WAh8EaGQQFgJmAgEWAg8WBB4LTmF2aWdhdGVVcmwFJC4uL0ZlZWRiYWNrRm9ybS5hc3B4P0JydE5vPTc3MjUzNDcwMB8EaGQPFgQfBQUdfi9QREZzL1BheW1lbnRfQWdyZWVtZW50cy5wZGYfBGhkDxYCZmYWAQVxVGVsZXJpay5XZWIuVUkuUmFkV2luZG93LCBUZWxlcmlrLldlYi5VSSwgVmVyc2lvbj0yMDEwLjEuNTE5LjQwLCBDdWx0dXJlPW5ldXRyYWwsIFB1YmxpY0tleVRva2VuPTEyMWZhZTc4MTY1YmEzZDQWBGYPDxYEHwUFJC4uL0ZlZWRiYWNrRm9ybS5hc3B4P0JydE5vPTc3MjUzNDcwMB8EaGRkAgEPDxYEHwUFHX4vUERGcy9QYXltZW50X0FncmVlbWVudHMucGRmHwRoZGQCBw88KwARAgAPFgQfAmcfA2ZkARAWABYAFgBkAgkPFgIeBXZhbHVlBQk3NzI1MzQ3MDBkGAIFQWN0bDAwJEJvZHlDb250ZW50UGxhY2VIb2xkZXIkR2V0VGF4SW5mb0NvbnRyb2wkZ3JkUGF5bWVudHNIaXN0b3J5DzwrAAwBCGZkBTJjdGwwMCRCb2R5Q29udGVudFBsYWNlSG9sZGVyJEdldFRheEluZm9Db250cm9sJGZybQ9nZD9K5t7genscvOsiNrdPkxL0VHWCYSsS%2FK3EZTRu3h3w & __EVENTVALIDATION =%2FwEWBQKkrNCPCgLRzsWTBwLlpIbACAKV6q2KD QKIvdHyCawQaHbBYSHV%2B%2FVvyLUTUY%2BhSsmbpTvj0W4ycfOa1RCO & ctl00%24BodyContentPlaceHolder%24SearchByAddressControl%24txtLookup由+屬性= +地址& ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24txtTaxInfo = 043185500 & ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24btnTaxByBRT = +%3E%3E

請注意,有7個參數。所有這些應該是URL編碼的。我稍微修改了代碼並修改了代碼,還添加了一些請求標頭。下面的代碼工作正確對我來說:

Option Explicit 

Sub test_66() 

    Dim s1 As String 
    Dim s2 As String 
    Dim sResp As String 
    Dim aTmp As Variant 
    Dim sBRTNumber As String 
    Dim sFormData As String 

    With CreateObject("MSXML2.XMLHTTP") 
     .Open "GET", "http://www.phila.gov/revenue/realestatetax/default.aspx", False 
     .setRequestHeader "Accept", "text/html;charset=UTF-8" 
     .setRequestHeader "Accept-Encoding", "identity" 
     .setRequestHeader "Accept-Charset", "UTF-8" 
     .setRequestHeader "Connection", "keep-alive" 
     .send 
     sResp = .responseText 
    End With 
    aTmp = Split(sResp, "id=""__VIEWSTATE"" value=""", 2) 
    s1 = aTmp(1) 
    aTmp = Split(s1, """", 2) 
    s1 = aTmp(0) 
    aTmp = Split(sResp, "id=""__EVENTVALIDATION"" value=""", 2) 
    s2 = aTmp(1) 
    aTmp = Split(s2, """", 2) 
    s2 = aTmp(0) 
    s1 = EncodeUriComponent(s1) 
    s2 = EncodeUriComponent(s2) 

    sBRTNumber = "043185500" 
    sFormData = Join(Array(_ 
     "__EVENTTARGET=", _ 
     "__EVENTARGUMENT=", _ 
     "__VIEWSTATE=" & s1, _ 
     "__EVENTVALIDATION=" & s2, _ 
     "ctl00%24BodyContentPlaceHolder%24SearchByAddressControl%24txtLookup=by+Property+Address", _ 
     "ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24txtTaxInfo=" & sBRTNumber, _ 
     "ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24btnTaxByBRT=+%3E%3E" _ 
     ), "&") 

    With CreateObject("MSXML2.XMLHTTP") 
     .Open "POST", "http://www.phila.gov/revenue/realestatetax/default.aspx", False 
     .setRequestHeader "Content-Type", "application/x-www-form-urlencoded" 
     .setRequestHeader "Accept", "text/html;charset=UTF-8" 
     .setRequestHeader "Accept-Encoding", "identity" 
     .setRequestHeader "Accept-Charset", "UTF-8" 
     .setRequestHeader "Connection", "keep-alive" 
     .setRequestHeader "Host", "www.phila.gov" 
     .setRequestHeader "Origin", "http://www.phila.gov" 
     .setRequestHeader "Referer", "http://www.phila.gov/revenue/realestatetax/default.aspx" 
     .send (sFormData) 
     sResp = .responseText 
    End With 

    With CreateObject("InternetExplorer.Application") 
     .navigate "about:blank" 
     .Visible = True 
     .document.write sResp 
    End With 

End Sub 

Function EncodeUriComponent(strText As String) As String 
    Static objHtmlfile As Object 
    If objHtmlfile Is Nothing Then 
     Set objHtmlfile = CreateObject("htmlfile") 
     objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript" 
    End If 
    EncodeUriComponent = objHtmlfile.parentWindow.encode(strText) 
End Function 

這裏是IE窗口輸出:

output

+0

謝謝omegastripes, 你是絕對正確的。我的代碼上的問題是URL編碼s1和s2。我想我可以手動編碼這兩個鍵,但我錯了。我接受你的解決方案,但這是我如何解決這個問題。因爲我在Excel 2013中,我使用內置的WorksheetFunction.EncodeURL來編碼這兩個鍵,它工作。 – user3121922

+0

我每次都從代碼中學習新東西。我不能夠感謝你。我只是使用你的數組拆分方法來提取數據!它太酷了。 – user3121922