Unable to get the source code of a web page

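I am trying to fetch the HTML page source from this site using Java: "http://207.200.96.231:8008". The default Java libraries did not help me. I also tried this tutorial, but it did not work either. I think the problem is caused by the site's security protection. When I run the code below, I get an exception: java.io.IOException: Invalid Http response.

Any ideas on how to implement this? Or is there a library that would meet my needs? So far I have tried JSoup and the Jericho HTML parser, thinking they would connect to the site in a different way, but they failed too.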

String urlstr = "http://72.26.204.28:9484/played.html";

try {
    URL url = new URL(urlstr);
    URLConnection urlc = url.openConnection();

    InputStream stream = urlc.getInputStream();
    BufferedInputStream buf = new BufferedInputStream(stream);

    StringBuilder sb = new StringBuilder();

    // Read one byte at a time until end of stream
    while (true) {
        int data = buf.read();
        if (data == -1)
            break;
        else
            sb.append((char) data);
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

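EDIT (solved): With help from Karai17 and trashgod, I managed to solve this. SHOUTcast pages require a User-Agent header before they will serve their content, so all we need to do is add the following line: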

urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0");

The latest code looks like this:

try {
    URL url = new URL("http://207.200.96.231:8008/7.html");
    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0");

    InputStream is = urlConnection.getInputStream();
    BufferedInputStream in = new BufferedInputStream(is);
    int c;
    while ((c = in.read()) != -1) {
        System.out.write(c);
    }
    urlConnection.disconnect();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
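
For reuse, the same fix can be wrapped in a small helper that returns the page as a String instead of printing it. The following is only a minimal sketch, not code from this post: the class name PageFetcher, both method names, the 5-second timeouts and the caller-supplied charset are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical helper class; the name is an assumption, not part of any library.
class PageFetcher {

    /** Reads the whole stream in chunks and decodes the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), charset);
        } catch (IOException e) {
            // Rethrow unchecked so callers that only decode in-memory data stay simple
            throw new UncheckedIOException(e);
        }
    }

    /** Fetches a page while sending the given User-Agent header. */
    static String fetch(String address, String userAgent, Charset charset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(5000); // assumed timeout values, not from the original post
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, charset);
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as PageFetcher.fetch("http://207.200.96.231:8008/7.html", "Mozilla/5.0", StandardCharsets.ISO_8859_1) would then return the page text, provided the server is reachable. Decoding with an explicit charset avoids casting each byte to a char, which only works for single-byte encodings.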


2012-07-30 Mr.Hankey

This stream appears to require Winamp.

 
$ curl -v http://207.200.96.231:8008
* About to connect() to 207.200.96.231 port 8008 (#0)
* Trying 207.200.96.231... connected
* Connected to 207.200.96.231 (207.200.96.231) port 8008 (#0)
> GET / HTTP/1.1
> User-Agent: curl/...
> Host: 207.200.96.231:8008
> Accept: */*
>
ICY 200 OK
icy-notice1:This stream requires Winamp
icy-notice2:SHOUTcast Distributed Network Audio Server/Linux v1.9.93atdn
icy-name:Absolutely Smooth Jazz - SKY.FM - the world's smoothest jazz 24 hours a day 
icy-genre:Soft Smooth Jazz 
icy-url:http://www.sky.fm/smoothjazz/ 
content-type:audio/mpeg 
icy-pub:1 
icy-br:96 
...
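
The response above begins with "ICY 200 OK" rather than an "HTTP/1.x" status line, which may be why Java's HTTP client reports an invalid response. As a rough sketch of how such a response head could be inspected once its raw text is in hand, the hypothetical helper below checks the status line and splits the icy-* lines into a map. The class and method names are my own and not part of any library, and the code assumes each header sits on one line in name:value form.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; names are assumptions made for illustration.
class IcyResponse {

    /** True if the status line looks like standard HTTP, e.g. "HTTP/1.0 200 OK". */
    static boolean isHttpStatusLine(String statusLine) {
        return statusLine.startsWith("HTTP/");
    }

    /** Parses the "name:value" lines that follow the status line into an ordered map. */
    static Map<String, String> parseHeaders(String rawResponseHead) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = rawResponseHead.split("\r?\n");
        for (int i = 1; i < lines.length; i++) { // index 0 is the status line
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip anything that is not a name:value pair
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }
}
```

Splitting on the first colon only keeps values such as the icy-url address intact, since they contain colons of their own.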

Addendum: You can read the stream like this:

URL url = new URL("http://207.200.96.231:8008"); 
URLConnection con = url.openConnection(); 
InputStream is = con.getInputStream(); 
BufferedInputStream in = new BufferedInputStream(is); 
int c; 
while ((c = in.read()) != -1) { 
    System.out.write(c); 
}


2012-07-30 01:22:14 trashgod

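Sorry, I don't get it. Is there a way to get the source code using Winamp from Java? – 2012-07-30 02:00:26

I don't know about Winamp, but you can read the stream as shown above. – trashgod 2012-07-30 06:11:07

I don't think there is much difference; the code you provided did not solve my problem, sorry. – 2012-07-30 12:19:05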
