I want to scrape the HTML from the URL listed below. The problem is that I get this error when trying to fetch the HTML content: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=504
Aug 14, 2016 6:40:36 PM booksscraper.BooksScraper main
SEVERE: null
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=504, URL=http://www.bkstr.com/webapp/wcs/stores/servlet/CourseMaterialsResultsView?catalogId=10001&categoryId=9604&storeId=10293&langId=-1&programId=636&termId=100043741&divisionDisplayName=%20&departmentDisplayName=ACCG&courseDisplayName=16971&sectionDisplayName=P15%20DAVIS&demoKey=d&purpose=browse
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
    at booksscraper.BooksScraper.main(BooksScraper.java:52)
I have already set the timeout to infinite, but that did not help. The page's HTML is very large, 14,833 lines. Is that the cause of the problem?
String url = "http://www.bkstr.com/webapp/wcs/stores/servlet/CourseMaterialsResultsView?catalogId=10001&categoryId=9604&storeId=10293&langId=-1&programId=636&termId=100043741&divisionDisplayName=%20&departmentDisplayName=ACCG&courseDisplayName=16971&sectionDisplayName=P15%20DAVIS&demoKey=d&purpose=browse";
Document doc = Jsoup.connect(url)
        .maxBodySize(0)   // 0 = no limit on the response body size
        .timeout(0)       // 0 = infinite client-side timeout
        .get();
System.out.println(doc);
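Thanks for your answer, notionquest. However, the gateway timeout only shows up when we enter the URL directly. If we reach the page via this URL (http://www.bkstr.com/sheridandavisstore/shop/textbooks-and-course-materials?cm_sp=GlobalJuly122016BTS-_-ShipTextbooks-_-943), the gateway timeout does not occur. How does this happen? – Rokin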
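One possible explanation for the behaviour described in the comment, which nothing in this thread confirms, is that the store page sets session cookies that the results page expects. A 504 is generated by a gateway or proxy on the server side, so a client-side timeout setting would not be expected to change it. The sketch below tries that idea: it loads an entry page first, keeps whatever cookies it sets, and only then requests the results page. It uses the JDK's own java.net.http client (Java 11+) instead of jsoup so that it stays self-contained. The class name SessionFetch and all of its methods are hypothetical helpers, and the timeout values are arbitrary.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of the original BooksScraper code.
final class SessionFetch {

    // Client that stores cookies set by earlier responses and sends them
    // back on later requests to the same site.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    // Plain GET request with a generous (arbitrary) response timeout.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMinutes(2))
                .GET()
                .build();
    }

    // 504 means a gateway or proxy gave up waiting for the server behind it.
    static boolean isGatewayTimeout(int status) {
        return status == 504;
    }

    // Assumption: visiting entryUrl first establishes the session that
    // targetUrl depends on. This is a guess, not a confirmed diagnosis.
    static String fetchViaEntryPage(String entryUrl, String targetUrl)
            throws IOException, InterruptedException {
        HttpClient client = newClient();

        // Step 1: load the entry page so the server can set its cookies.
        client.send(buildRequest(entryUrl), HttpResponse.BodyHandlers.discarding());

        // Step 2: request the results page; the same client replays the cookies.
        HttpResponse<String> response =
                client.send(buildRequest(targetUrl), HttpResponse.BodyHandlers.ofString());

        if (isGatewayTimeout(response.statusCode())) {
            throw new IOException("504 Gateway Timeout for " + targetUrl);
        }
        return response.body();
    }
}
```

To try it, call SessionFetch.fetchViaEntryPage(entryUrl, url) with the store page from the comment as entryUrl and the results URL from the question as url, then hand the returned string to Jsoup.parse(html, url) to keep working with a Document. If the 504 persists, cookies alone are probably not the difference between the two ways of reaching the page.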