I wrote a method to analyse a website: it finds all unique links on the page and computes the total size of all images in bytes. For some websites it works, but for others (for example "https://www.nasa.gov") it does not. Could someone give me a hint as to what the reason might be? Why does Jsoup not work on a certain website?
/**
 * @param url - url to the page to be parsed
 * @return - a hashset of unique links found in the page
 * @throws IOException - when a problem with the connection occurs
 */
private static HashSet<String> AnalyzeUrl(String url) throws IOException
{
    Document doc = Jsoup.connect(url).get();

    HashSet<String> uniqueImages = new HashSet<>();
    HashSet<String> uniqueLinks = new HashSet<>();

    // Get unique images
    Elements images = doc.getElementsByTag("img");
    for (Element image : images)
        uniqueImages.add(image.attr("abs:src"));

    // Get unique links
    Elements links = doc.getElementsByTag("a");
    for (Element link : links)
        uniqueLinks.add(link.attr("abs:href"));

    // Get total size of images
    int totalSize = 0;
    for (String imageUrl : uniqueImages)
        totalSize += Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes().length;

    // Show information
    String information = "Unique images found: " + uniqueImages.size() + "\n" +
            "Total size of images: " + totalSize + " bytes \n" +
            "Unique links found: " + uniqueLinks.size() + "\n";
    Alert alert = new Alert(Alert.AlertType.INFORMATION, information, ButtonType.OK);
    alert.showAndWait();

    return uniqueLinks;
}
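One way to narrow this down is to check what Jsoup actually receives from the failing site before any extraction runs. A minimal diagnostic sketch (the URL and user-agent string are just examples; some servers return different markup to clients with an unfamiliar user agent, which Jsoup lets you override with userAgent()):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupDiagnostic {
    public static void main(String[] args) throws Exception {
        String url = "https://www.nasa.gov"; // the failing site from the question

        // Fetch with an explicit user agent in case the server treats
        // the default Java client differently.
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .get();

        // Count what the static HTML actually contains.
        System.out.println("img tags: " + doc.select("img").size());
        System.out.println("a tags:   " + doc.select("a").size());

        // Print the start of the markup to inspect it by eye.
        String html = doc.html();
        System.out.println(html.substring(0, Math.min(1000, html.length())));
    }
}

If both counts come back as zero while the page visibly shows images and links in a browser, the content is almost certainly added client-side, which matches the comment below.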
Look at the page source: there are no 'img' and 'a' tags in the source. They are generated by JavaScript.
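If the tags really are injected by JavaScript, Jsoup alone cannot see them, because it only parses the static HTML the server returns and never executes scripts. One workaround is to let a real browser engine render the page first and then hand the resulting markup to Jsoup, so the existing img/a extraction logic can run unchanged. A hedged sketch using Selenium with headless Chrome (the class name, driver setup, and flags are assumptions for illustration, not part of the original code):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class RenderedPageFetcher {
    // Returns a Jsoup Document built from the JavaScript-rendered page.
    static Document fetchRendered(String url) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");       // run without a visible browser window
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(url);                      // loads the page and executes its scripts
            String html = driver.getPageSource(); // markup after JavaScript has run
            return Jsoup.parse(html, url);        // keep the base URL so abs: attributes resolve
        } finally {
            driver.quit();
        }
    }
}

The returned Document could then be passed to the same getElementsByTag("img") / getElementsByTag("a") code shown above; note that this requires the Selenium dependency and a matching ChromeDriver installation.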