2016-11-09 50 views

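Problem connecting to a web page using Jsoup

This is my first time using Jsoup, and I am having trouble connecting to the URL I want to parse information from.

The URL: http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0

I initially tried the following, but I got a timeout exception: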

Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0").get(); 

Here is the exception:

java.net.SocketTimeoutException: Read timed out 
    at java.net.SocketInputStream.socketRead0(Native Method) 
    at java.net.SocketInputStream.read(SocketInputStream.java:152) 
    at java.net.SocketInputStream.read(SocketInputStream.java:122) 
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) 
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) 
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334) 
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) 
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) 
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324) 
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) 
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:575) 
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:548) 
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:235) 
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:224) 
    at ParseData.main(ParseData.java:18) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144) 

I did some research online and found the method .timeout(0), which sets the Jsoup timeout to infinite.

Now, when I try this:

  Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0").timeout(0).get(); 

I get the following exception:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0 
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:598) 
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:548) 
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:235) 
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:224) 
    at ParseData.main(ParseData.java:18) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144) 

Could someone please point me in the right direction on how to load this URL into Jsoup?

Answers

1

A 403 error means the server is refusing access. You just need to add a User-Agent to the HTTP request headers, like this:

Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0") 
.userAgent("Mozilla/5.0") 
.timeout(0).get(); 
1

Some websites do not allow bots, and that is what is happening with this site. You need to set a user agent so that the request is not blocked.
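
To make the mechanism concrete, here is a minimal sketch that uses only the JDK, with no Jsoup and no call to the real site. It assumes a hypothetical local server that answers 403 unless the User-Agent header starts with "Mozilla/"; the actual filtering rules of uselectionatlas.org are not known, and the names UserAgentDemo, startPickyServer, fetchStatus and demo are made up for illustration. The client side uses HttpURLConnection, the same class that appears in the stack trace above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

class UserAgentDemo {

    // Hypothetical stand-in for a site that rejects non-browser clients.
    // Assumption: the filter only checks that User-Agent starts with "Mozilla/".
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean looksLikeBrowser = ua != null && ua.startsWith("Mozilla/");
            // -1 means "no response body".
            exchange.sendResponseHeaders(looksLikeBrowser ? 200 : 403, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends a GET and returns the HTTP status code.
    // A null userAgent leaves the JDK default in place (typically "Java/<version>").
    static int fetchStatus(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000); // finite timeouts instead of waiting forever
        conn.setReadTimeout(5000);
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Returns { status without a custom User-Agent, status with "Mozilla/5.0" }.
    static int[] demo() {
        try {
            HttpServer server = startPickyServer();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
                return new int[] { fetchStatus(url, null), fetchStatus(url, "Mozilla/5.0") };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] statuses = demo();
        System.out.println("default User-Agent: " + statuses[0]);
        System.out.println("Mozilla/5.0:        " + statuses[1]);
    }
}
```

Against this toy server, the request with the default User-Agent should be rejected and the one with "Mozilla/5.0" accepted, which mirrors what .userAgent("Mozilla/5.0") does in the Jsoup call from the first answer. The sketch also uses a finite 5-second timeout rather than timeout(0), so a stalled server cannot block the program indefinitely; the value itself is an arbitrary choice.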