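Java 8 CompletableFuture web crawler does not crawl past one URL

I'm experimenting with the concurrency features newly introduced in Java 8, working through the exercises in Cay S. Horstmann's book "Java SE 8 for the Really Impatient". I created the following web crawler using the new CompletableFuture and jsoup. The basic idea is that, given a URL, it finds the first m URLs on that page and repeats the process n times; m and n are of course parameters. The problem is that the program fetches the URLs of the initial page but does not recurse. What am I missing?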

    static class WebCrawler {
        CompletableFuture<Void> crawl(final String startingUrl,
                final int depth, final int breadth) {
            if (depth <= 0) {
                return completedFuture(startingUrl, depth);
            }

            final CompletableFuture<Void> allDoneFuture = allOf((CompletableFuture[]) of(
                    startingUrl)
                    .map(url -> supplyAsync(getContent(url)))
                    .map(docFuture -> docFuture.thenApply(getURLs(breadth)))
                    .map(urlsFuture -> urlsFuture.thenApply(doForEach(
                            depth, breadth)))
                    .toArray(size -> new CompletableFuture[size]));

            allDoneFuture.join();

            return allDoneFuture;
        }

        private CompletableFuture<Void> completedFuture(
                final String startingUrl, final int depth) {
            LOGGER.info("Link: {}, depth: {}.", startingUrl, depth);

            CompletableFuture<Void> future = new CompletableFuture<>();
            future.complete(null);

            return future;
        }

        private Supplier<Document> getContent(final String url) {
            return () -> {
                try {
                    return connect(url).get();
                } catch (IOException e) {
                    throw new UncheckedIOException(
                            " Something went wrong trying to fetch the contents of the URL: "
                                    + url, e);
                }
            };
        }

        private Function<Document, Set<String>> getURLs(final int limit) {
            return doc -> {
                LOGGER.info("Getting URLs for document: {}.", doc.baseUri());

                return doc.select("a[href]").stream()
                        .map(link -> link.attr("abs:href")).limit(limit)
                        .peek(LOGGER::info).collect(toSet());
            };
        }

        private Function<Set<String>, Stream<CompletableFuture<Void>>> doForEach(
                final int depth, final int breadth) {
            return urls -> urls.stream().map(
                    url -> crawl(url, depth - 1, breadth));
        }
    }

Test case:

    @Test
    public void testCrawl() {
        new WebCrawler().crawl(
                "http://en.wikipedia.org/wiki/Java_%28programming_language%29",
                2, 10);
    }

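
One possible explanation for the missing recursion, offered as a sketch rather than a confirmed diagnosis: `doForEach` returns a `Stream` built with `map`, and nothing in `crawl` appears to run a terminal operation on that stream. Intermediate stream operations are lazy, so the mapping function (here, the recursive `crawl` call) would never execute. The class below isolates that behavior; `LazyStreamDemo`, `visit`, `lazy`, and `eager` are hypothetical names, and `visit` merely stands in for `crawl(url, depth - 1, breadth)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyStreamDemo {
    // Records every URL the mapping function was actually invoked with.
    static final List<String> visited = new ArrayList<>();

    // Hypothetical stand-in for the recursive crawl call: it only records the URL.
    static String visit(String url) {
        visited.add(url);
        return url;
    }

    // Mirrors the shape of doForEach: map() alone only describes a pipeline.
    static Stream<String> lazy(List<String> urls) {
        return urls.stream().map(LazyStreamDemo::visit);
    }

    // collect() is a terminal operation, so visit() runs for every element.
    static List<String> eager(List<String> urls) {
        return urls.stream().map(LazyStreamDemo::visit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("a", "b", "c");

        lazy(urls);
        System.out.println("visited after lazy:  " + visited.size()); // prints 0

        eager(urls);
        System.out.println("visited after eager: " + visited.size()); // prints 3
    }
}
```

If this is indeed the cause, consuming the stream (for example with `toArray` or `collect`) inside the crawler should be enough to trigger the recursive calls.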
What are `allOf` and `of` in `allOf((CompletableFuture[]) of(startingUrl)`?
What is `Document`? Please post a reproducible example.
@SotiriosDelimanolis This is working code; `allOf` and `of` are static imports; `Document` is a `jsoup` class. I didn't want to clutter the post. Here is the [code](https://github.com/abhijitsarkar/java/blob/master/java8-impatient/src/main/java/name/abhijitsarkar/java/java8impatient/concurrency/PracticeQuestionsCh6.java).
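
For readers who want something reproducible without network access, here is a minimal sketch of the "first m links, n levels deep" idea using `thenCompose` and `allOf`. It is not the code from the linked repository: `CrawlSketch`, `LINKS`, `fetchLinks`, and `seen` are hypothetical, and the in-memory `LINKS` map stands in for the jsoup fetch. The sketch also returns the composed future instead of calling `join()` inside `crawl`, which is a design choice of this example, not something taken from the post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class CrawlSketch {
    // Hypothetical in-memory link graph replacing real page fetches.
    static final Map<String, List<String>> LINKS = new HashMap<>();
    static {
        LINKS.put("root", Arrays.asList("a", "b", "c"));
        LINKS.put("a", Arrays.asList("a1", "a2"));
        LINKS.put("b", Arrays.asList("b1"));
    }

    // Thread-safe record of every URL reached by the crawl.
    final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Stand-in for "fetch the page and keep at most `breadth` links".
    List<String> fetchLinks(String url, int breadth) {
        return LINKS.getOrDefault(url, Collections.emptyList())
                .stream().limit(breadth).collect(Collectors.toList());
    }

    CompletableFuture<Void> crawl(String url, int depth, int breadth) {
        seen.add(url);
        if (depth <= 0) {
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture
                .supplyAsync(() -> fetchLinks(url, breadth))
                // thenCompose flattens the future returned by the recursive step.
                .thenCompose(links -> {
                    // toArray is a terminal operation, so each child crawl starts here.
                    CompletableFuture<?>[] children = links.stream()
                            .map(link -> crawl(link, depth - 1, breadth))
                            .toArray(CompletableFuture[]::new);
                    return CompletableFuture.allOf(children);
                });
    }

    public static void main(String[] args) {
        CrawlSketch crawler = new CrawlSketch();
        // Block only once, at the outermost call.
        crawler.crawl("root", 2, 2).join();
        System.out.println(crawler.seen);
    }
}
```

With depth 2 and breadth 2, the crawl from `root` reaches `a` and `b` and their children but never `c`, because `limit(breadth)` cuts the third link.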