How do I retrieve specific information from a website?

-2

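I am developing a Java web application and would like to know how to get the value of a particular field (a table and/or output text) from some website. Assuming that component always has the same ID, does anyone know how I can retrieve this information? I don't know whether anyone has run into this problem before, but if you have any ideas, please share them. Thanks.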

Source

2013-07-19 Noah Martin

Use 'jsoup': http://jsoup.org/ – DevZer0

'I am developing a Java web application' - then why did you tag this [php]? – DaveRandom

You could also try webharvest – Karthikeyan

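In general: 1) retrieve the page's markup in your application by reading it through an HTTP connection to the URL; 2) parse the markup with a framework such as jsoup and retrieve the values you need.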

More specifically, here is some sample code using jsoup:

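import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The statements below belong inside a method; they use Apache HttpClient 4.x
HttpClient http = new DefaultHttpClient();
StringBuilder htmlcode = new StringBuilder();
HttpGet request = new HttpGet("http://www.example.com");
try {
    HttpResponse response = http.execute(request);
    // No charset is passed here, so the platform default encoding is used
    BufferedReader read = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
    String line;
    while ((line = read.readLine()) != null) {
        htmlcode.append(line).append('\n'); // readLine() strips the line break, so put it back
    }
    read.close();
} catch (IOException e) { // also covers ClientProtocolException, which is a subclass
    e.printStackTrace();
}
// at this point we have the page's markup
Document doc = Jsoup.parse(htmlcode.toString());
Elements lis = doc.getElementsByTag("li"); // get all entries in lists
for (Element el : lis) {
    String val = el.text().trim();
    // do something for each list entry
}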
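
The loop above visits every list entry, whereas the question asks for a single component with a known ID. With jsoup that lookup would be 'doc.getElementById(...)' on the parsed document. The sketch below shows the same idea using only JDK classes (XPath rather than jsoup), so it runs without extra dependencies. It assumes the markup is well-formed XHTML, which real pages often are not; the class name, the sample markup, and the id "price" are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FieldById {

    // Returns the trimmed text of the first element whose id attribute equals `id`,
    // or an empty string when no such element exists.
    // Assumption: `markup` is well-formed XML/XHTML.
    static String textById(String markup, String id) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the id as an XPath variable instead of concatenating it into the expression.
        xpath.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? id : null);
        return xpath.evaluate("//*[@id = $id]", doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup standing in for the downloaded page
        String page = "<html><body><table><tr>"
                + "<td id=\"price\"> 42.50 </td>"
                + "</tr></table></body></html>";
        System.out.println(textById(page, "price"));
    }
}
```

For pages that are not well-formed, a lenient HTML parser such as jsoup is the more robust choice; the XPath version is only meant to show the lookup-by-ID step in isolation.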

Source

2013-07-19 10:48:12 LuigiEdlCarno

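That is clumsy. You can also just use 'Jsoup.connect("http://www.example.com").get()' to fetch the document by URL, without the whole HttpClient boilerplate (which in your particular example also has a character encoding problem: you rely on the platform default encoding). – BalusC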

Thanks for the tip. I just typed this up without fully studying the documentation. – LuigiEdlCarno
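
The encoding remark in the comments above can be illustrated with JDK classes alone. This is a minimal sketch, assuming the page is served as UTF-8 (in practice the charset would come from the response's Content-Type header); the class name and the sample bytes are hypothetical. It decodes the same bytes with two explicit charsets to show how the result depends on the one chosen, which is why relying on the platform default is fragile.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {

    // Reads the whole stream as text using an explicit charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical response body: UTF-8 bytes containing a non-ASCII character
        byte[] body = "<li>caf\u00e9</li>".getBytes(StandardCharsets.UTF_8);
        String right = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);
        System.out.println(right.equals("<li>caf\u00e9</li>")); // true
        System.out.println(wrong.equals(right));                // false
    }
}
```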

What you are talking about is web scraping; check out this PHP library:

http://simplehtmldom.sourceforge.net/

Source

2013-07-19 10:51:03

That is a PHP library, and this question is no longer tagged [php] – DevZer0

@DevZer0 Ah, yes, he has removed the tag now. –
