
I want to scrape the HTML from the URL listed below. The problem is that fetching the HTML content fails with this error: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=504

Aug 14, 2016 6:40:36 PM booksscraper.BooksScraper main
SEVERE: null
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=504, URL=http://www.bkstr.com/webapp/wcs/stores/servlet/CourseMaterialsResultsView?catalogId=10001&categoryId=9604&storeId=10293&langId=-1&programId=636&termId=100043741&divisionDisplayName=%20&departmentDisplayName=ACCG&courseDisplayName=16971&sectionDisplayName=P15%20DAVIS&demoKey=d&purpose=browse
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
    at booksscraper.BooksScraper.main(BooksScraper.java:52)

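I have already set the timeout to infinite, but that did not help. The page's HTML is very large, about 14,833 lines. Could that be the cause of the problem?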

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String url = "http://www.bkstr.com/webapp/wcs/stores/servlet/CourseMaterialsResultsView?catalogId=10001&categoryId=9604&storeId=10293&langId=-1&programId=636&termId=100043741&divisionDisplayName=%20&departmentDisplayName=ACCG&courseDisplayName=16971&sectionDisplayName=P15%20DAVIS&demoKey=d&purpose=browse";

Document doc = Jsoup.connect(url)
       .maxBodySize(0) // 0 = no limit on the response body size
       .timeout(0)     // 0 = wait indefinitely
       .get();

System.out.println(doc);

Answers


I did manage to connect to the website by setting the UserAgent to "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36". However, the site took about 4 minutes to respond.


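This is not a problem with the Jsoup API or with your code. The error occurs because the URL did not respond in time, so the server returned a "Gateway Timeout" error (the proxy server did not receive a timely response from the upstream server).

Exception message from the program: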

HTTP error fetching URL. Status=504

HTTP status code: 504

504 Gateway Timeout

The server, while acting as a gateway or proxy, did not receive a timely response from the upstream server specified by the URI (e.g. HTTP, FTP, LDAP) or some other auxiliary server (e.g. DNS) it needed to access in attempting to complete the request.

Note: Note to implementors: some deployed proxies are known to 
    return 400 or 500 when DNS lookups time out. 
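
Since a 504 comes from the server side, one thing a client can do is wait and try again. The following is only a minimal sketch of that idea, not part of the original answer: the class `GatewayRetry`, the method `fetchWithRetry`, and its parameters are hypothetical names, and the request itself is passed in as a function so the sketch runs without any network access or library.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: repeats a request while the server answers 504 Gateway Timeout. */
final class GatewayRetry {

    static final int GATEWAY_TIMEOUT = 504;

    /**
     * Calls fetchStatus up to maxAttempts times, doubling the pause after each 504.
     * Returns the first status that is not 504, or 504 if every attempt timed out.
     */
    static int fetchWithRetry(IntSupplier fetchStatus, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        int status = GATEWAY_TIMEOUT;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = fetchStatus.getAsInt();
            if (status != GATEWAY_TIMEOUT) {
                return status; // anything other than 504 is handed back to the caller
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay); // give the upstream server time to recover
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", e);
                }
                delay *= 2; // exponential backoff
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Stand-in for a real request: answers 504 twice, then 200.
        int[] calls = {0};
        IntSupplier simulated = () -> ++calls[0] < 3 ? GATEWAY_TIMEOUT : 200;
        int status = fetchWithRetry(simulated, 5, 10);
        System.out.println("status=" + status + " after " + calls[0] + " attempts");
    }
}
```

With jsoup, the function passed in could wrap something like `Jsoup.connect(url).ignoreHttpErrors(true).execute().statusCode()`, with the checked `IOException` caught inside the lambda. Whether retrying helps depends on why the upstream server is slow, so the attempt count and delays here are placeholders to tune.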

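Thank you for your answer, notionquest. However, the gateway timeout only appears when we enter the URL directly. If we reach it through this URL (http://www.bkstr.com/sheridandavisstore/shop/textbooks-and-course-materials?cm_sp=GlobalJuly122016BTS-_-ShipTextbooks-_-943), the gateway timeout does not occur. How does that happen? – Rokin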
