2016-02-02

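curl works, jsoup returns HTTP error 500

I'm trying web scraping with Java and plan to eventually move this code to Android, so for now I'm experimenting with jsoup. Using Chrome DevTools, I captured the request headers and the equivalent curl command for the page. I can run the following command with curl, and it works: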

curl 'mySite/campaign/List' -H 'Cookie: __RequestVerificationToken_L0N5YXJhV2ViUG9ydGFs0=IECNY-SOnB09IY9MQMm3xL1bSbASe8Eha9J1fWupurHtmlldojgqpaljhzIuhfFh6zRnOygjsrKyuhj2krWiSSNXif76gRNH_39lGvyMJ0I1; ASP.NET_SessionId=gojtobwzycl0lvs0ip4glf3n; myCompany.WEB.PORTAL.AUTH=40C13BAF08884380F805B99E217754F3D35920CE1861DEBB580DC143DA4249C4682C33A36DD29272A3A844880110E4D0EC1F24298E4D1B2A4A94E3FA2CAC08B934989ACF155616D6CB5665338FF3CFF82EAD87BF93EB46FA3BA6AAE6B00401F9' -H 'Origin: mySite' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36' -H 'Content-Type: application/json;charset=UTF-8' -H 'Accept: */*' -H 'Referer: mySite/campaign' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' -H '__RequestVerificationToken: G2RD7FtHMG12j00zNuLtiSZSWquXAOvh1hUNxObxMCFIZclrQueAo4d3cZonI1MZ7hxELl56yi5hci5vpC78m4Sh8PivHwRcKImcCibi9xk1' --data-binary '{"PageNumber":2,"SortColumn":"ScheduledRunDate","SortAscending":false,"PageSize":20,"CollectionSize":308,"SelectedAccountId":"1","SearchTerm":"","ShowInactive":true}' --compressed 

I also pulled the request headers from Chrome DevTools:

POST mySite/campaign/List HTTP/1.1 
Host: mySite 
Connection: keep-alive 
Content-Length: 165 
Origin: mySite 
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36 
Content-Type: application/json;charset=UTF-8 
Accept: */* 
X-Requested-With: XMLHttpRequest 
__RequestVerificationToken: G2RD7FtHMG12j00zNuLtiSZSWquXAOvh1hUNxObxMCFIZclrQueAo4d3cZonI1MZ7hxELl56yi5hci5vpC78m4Sh8PivHwRcKImcCibi9xk1 
Referer: mySite/campaign 
Accept-Encoding: gzip, deflate 
Accept-Language: en-US,en;q=0.8 
Cookie: __RequestVerificationToken_L0N5YXJhV2ViUG9ydGFs0=IECNY-SOnB09IY9MQMm3xL1bSbASe8Eha9J1fWupurHtmlldojgqpaljhzIuhfFh6zRnOygjsrKyuhj2krWiSSNXif76gRNH_39lGvyMJ0I1; ASP.NET_SessionId=gojtobwzycl0lvs0ip4glf3n; myCompany.WEB.PORTAL.AUTH=40C13BAF08884380F805B99E217754F3D35920CE1861DEBB580DC143DA4249C4682C33A36DD29272A3A844880110E4D0EC1F24298E4D1B2A4A94E3FA2CAC08B934989ACF155616D6CB5665338FF3CFF82EAD87BF93EB46FA3BA6AAE6B00401F9 

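I then tried to convert this to jsoup, with no luck. I tried sending only the headers, and also the headers together with PageNumber, ScheduledRunDate, and so on. Both attempts return org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500. Here is the code I'm trying: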

Document pageDoc = Jsoup.connect("mySite/campaign/List") 
       .cookies(loginCookies) 
       //.header("Cookie",cookieList) 
       .userAgent("Mozilla/5.0") 
       .referrer("mySite/campaign") 
       //.data("Username", username) 
       //.data("Password", password) 
       //.followRedirects(true) 
       .header("Accept","*/*") 
       .header("Accept-Encoding","gzip, deflate") 
       .header("Accept-Language","en-US,en;q=0.8") 
       .header("Connection","keep-alive") 
       .header("Content-Type", "application/json;charset=UTF-8") 
       .header("Host","mySite") 
       .header("Origin", "mySite") 
       .header("Referer","mySite/campaign") 
       .header("User-Agent","Mozilla/5.0 (Windows NT 6.1: WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36") 
       .header("X-Requested-With", "XMLHttpRequest") 
       .header("__RequestVerificationToken", pageToken) 
       .header("Content-Length", "165") //not sure if needed. If it is, no idea how to get 
       .data("PageNumber","2") 
       .data("SortColumn", "ScheduledRunDate") 
       .data("SortAscending", "false") 
       .data("PageSize", "20") 
       .data("CollectionSize", "308") 
       .data("SelectedAccountId", "1") 
       .data("SearchTerm", "") 
       .data("ShowInactive", "true")    
       .ignoreContentType(true) 
       .post(); 

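I can confirm that all my tokens are correct. When I comment out .header("X-Requested-With", "XMLHttpRequest"), I get the generic error page (which is expected), so I know I'm connecting, but when I leave it in I get the 500. I can also confirm that all the "mySite" links are correct; I only had to redact them for my company. I'm also not sure whether PageNumber, SortColumn, SortAscending, etc. need to be added for jsoup, so I blindly added them as data parameters, as shown above.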
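One detail worth checking here: the curl command sends its parameters as a single JSON document via --data-binary, while key/value pairs added with .data(...) are normally sent as form fields. The sketch below uses only the Java standard library to print both body shapes side by side. The class and method names are illustrative, and it only shows what the two bodies look like; it does not contact the server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class BodyComparison {
    // JSON body copied verbatim from the curl command's --data-binary argument.
    static final String JSON_BODY =
        "{\"PageNumber\":2,\"SortColumn\":\"ScheduledRunDate\",\"SortAscending\":false,"
        + "\"PageSize\":20,\"CollectionSize\":308,\"SelectedAccountId\":\"1\","
        + "\"SearchTerm\":\"\",\"ShowInactive\":true}";

    // The same parameters as key/value pairs, in the order used in the question.
    static Map<String, String> questionFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("PageNumber", "2");
        fields.put("SortColumn", "ScheduledRunDate");
        fields.put("SortAscending", "false");
        fields.put("PageSize", "20");
        fields.put("CollectionSize", "308");
        fields.put("SelectedAccountId", "1");
        fields.put("SearchTerm", "");
        fields.put("ShowInactive", "true");
        return fields;
    }

    // Encodes pairs as application/x-www-form-urlencoded: key=value joined by '&'.
    static String formEncode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("form body: " + formEncode(questionFields()));
        System.out.println("json body: " + JSON_BODY);
    }
}
```

A server that expects application/json would likely fail to parse the form-style body. If the jsoup version in use provides Connection.requestBody(String), passing the JSON string there instead of the .data(...) pairs should send the same body curl does; whether that resolves the 500 on this particular server is untested here.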

Is this your complete code? Do you start by sending the 'POST' request, or is there another step before it? – TDG

You could try capturing the headers exchanged by curl and comparing them with the ones taken from Chrome DevTools. – Stephan

Answer

Try removing header("Content-Length", "165") and header("Content-Type", "application/json;charset=UTF-8"). Jsoup can add them for you. You could also try using FormElement. See this FormElement example
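On the question of where Content-Length comes from: it is the byte length of the request body, which is why an HTTP client normally computes it rather than having it hard-coded. A minimal sketch, using only the standard library and an illustrative class name, that measures the JSON body from the curl command so the result can be compared with the value in the captured headers:

```java
import java.nio.charset.StandardCharsets;

class ContentLengthCheck {
    // JSON body copied verbatim from the curl command's --data-binary argument.
    static final String JSON_BODY =
        "{\"PageNumber\":2,\"SortColumn\":\"ScheduledRunDate\",\"SortAscending\":false,"
        + "\"PageSize\":20,\"CollectionSize\":308,\"SelectedAccountId\":\"1\","
        + "\"SearchTerm\":\"\",\"ShowInactive\":true}";

    // Content-Length counts bytes, not characters, so encode before measuring.
    static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("Content-Length: " + contentLength(JSON_BODY));
    }
}
```

A hard-coded length that does not match the body actually sent (for example, when the body is form-encoded instead of JSON) can itself cause the request to be rejected, which is one more reason to leave this header to the client.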