可以使用HTMLEditorKit
下載整個網頁。但是,我需要下載需要滾動的整個網頁才能加載其全部內容。這項技術通常通過與Ajax捆綁在一起的JavaScript來實現。下載整個網頁
問:有沒有辦法來欺騙所述指定網頁,使用只有Java code
,以下載其全部內容?
問題2:如果這不可能只用Java,那麼是否可以結合使用JavaScript?
簡單的通知,我寫道:
public class PageDownload {
public static void main(String[] args) throws Exception {
String webUrl = "...";
URL url = new URL(webUrl);
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
HTMLEditorKit.Parser parser = new ParserDelegator();
HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
parser.parse(br, callback, true);
for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.IMG);
iterator.isValid(); iterator.next()) {
AttributeSet attributes = iterator.getAttributes();
String imgSrc = (String) attributes.getAttribute(HTML.Attribute.SRC);
if (imgSrc != null && (imgSrc.endsWith(".jpg") || (imgSrc.endsWith(".jpeg"))
|| (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".ico"))
|| (imgSrc.endsWith(".bmp")))) {
try {
downloadImage(webUrl, imgSrc);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
}
private static void downloadImage(String url, String imgSrc) throws IOException {
BufferedImage image = null;
try {
if (!(imgSrc.startsWith("http"))) {
url = url + imgSrc;
} else {
url = imgSrc;
}
imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
String imageFormat = null;
imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
String imgPath = null;
imgPath = "..." + imgSrc + "";
URL imageUrl = new URL(url);
image = ImageIO.read(imageUrl);
if (image != null) {
File file = new File(imgPath);
ImageIO.write(image, imageFormat, file);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
你能不能給一個EXA這樣的網站/頁面的多少個好嗎? – 2014-10-24 22:33:35