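Getting content from a website in Java

I would like to get content from this website: http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/

Specifically, all the content of the "estatisticas" element located at the bottom right of the screen.

I tried downloading Firebug and using jsoup to fetch the HTML, but it didn't work. Jsoup couldn't find the content I wanted, which was a bit frustrating. I don't know which techniques or APIs I should use to get all the data from the website, and I would appreciate any help.

Thanks in advance.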


2013-10-22 lucasdc

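You can try using Apache's HttpClient to connect to the website with a GET request, then retrieve all the content as a 'String' and extract the data from that large 'String' manually. –

See: http://stackoverflow.com/questions/3202305/web-scraping-with-java/6775957#6775957 – Nishan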

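The 'estatisticas' are loaded through an AJAX call after the page has loaded. You can't scrape them from the page, because they aren't in the HTML that the server returns.

You can, however, get them in JSON format at: http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/estatisticas.json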
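A minimal sketch of fetching that JSON address from Java with only the standard library. The class name `StatsFetcher`, the helper names, the UTF-8 decoding, and the 5-second timeouts are assumptions made for illustration; the thread does not say what encoding the endpoint uses or whether it is still online.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StatsFetcher {

    // Reads a whole stream into a String. UTF-8 is an assumption;
    // check the Content-Type header of the real response.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Sends a plain GET request and returns the response body as text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(5000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String address = "http://globoesporte.globo.com/temporeal/futebol/"
                + "20-10-2013/botafogo-vasco/estatisticas.json";
        try {
            System.out.println(fetch(address));
        } catch (IOException e) {
            // The endpoint may have moved or be unreachable.
            System.err.println("Request failed: " + e);
        }
    }
}
```

The returned text still has to be handed to a JSON library of your choice. The structure of the JSON is not shown in this thread, so the sketch stops at the raw string.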


2013-10-22 05:43:57

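Thanks! That's exactly what I wanted. But let me ask you, how did you get that link? – lucasdc

With Firebug. I looked at the network traffic generated by the page. – 2013-10-22 14:31:26

For this you need to look into HTML parsers such as jsoup and HTML Parser. If you want the page source including all the HTML tags, for example because you also plan to crawl the site, try this code: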

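import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PageDump {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.example.com");
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line); // raw HTML, one line at a time
            }
        }
    }
}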
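Once the page source is held in a single string, the manual extraction suggested in the comments on the question can be sketched roughly as below. The class `ManualExtract`, the helper `between`, and the sample markup are all hypothetical; the markup is not copied from the real page.

```java
public class ManualExtract {

    // Returns the text between the first occurrence of `open` and the next
    // occurrence of `close`, or null when either marker is missing.
    static String between(String html, String open, String close) {
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical markup: a heading plus a container that is still empty
        // because its content would only be filled in later by a script.
        String html = "<html><body><h1>Botafogo x Vasco</h1>"
                + "<div id=\"estatisticas\"></div></body></html>";

        System.out.println(between(html, "<h1>", "</h1>"));
        System.out.println("[" + between(html, "<div id=\"estatisticas\">", "</div>") + "]");
    }
}
```

With this sample input the first call prints `Botafogo x Vasco` and the second prints `[]`. That mirrors the point made in the other answer: content injected by AJAX after load is not in the downloaded HTML. For anything beyond trivial markers, a real parser such as jsoup is likely to be more robust than string searching.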


2013-10-22 05:45:28 Simmant

You can use HttpClient, which supports almost every HTTP operation. Here is a code snippet that may fit what you want:

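import java.io.InputStream;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// HttpClient 4.x API; DefaultHttpClient is deprecated since 4.3 but still available
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://globoesporte.globo.com/temporeal/futebol/20-10-2013/botafogo-vasco/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    InputStream instream = entity.getContent();
    try {
        // do something useful with the response body
    } finally {
        instream.close(); // closing the stream releases the connection
    }
}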

PS: the Maven dependency for HttpClient (4.x, which provides the classes used above):

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.13</version>
</dependency>

Hope it helps :)


2013-10-22 06:01:02 Judking
