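jsoup: getting specific tags and the values associated with them
I'm new to jsoup and want to get more familiar with extracting information from websites. I'm trying to do something simple: get some values from eBay.
I want to get the item name, HTML link, price, and number sold from the "Hot this week" section (like here: http://www.ebay.co.uk/sch/Action-Figures/246/bn_1632128/i.html)
but I'm not sure how to proceed.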
    package application;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import javax.swing.JOptionPane;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class GetHotSellers {

        public static void main(String[] args) {
            Document doc = Jsoup.parse(readURL("http://www.ebay.co.uk/sch/Action-Figures/246/bn_1632128/i.html"));
            // Matches every element whose combined text (including descendants) ends in "sold"
            Elements sold_items = doc.getElementsMatchingText("sold$");
            for (Element sold : sold_items) {
                System.out.println(sold.text());
            }
        }

        // Downloads the page at the given URL and returns its contents as one string
        public static String readURL(String url) {
            StringBuilder fileContents = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(new URL(url).openStream()))) {
                String currentLine;
                // Stop at end of stream so the literal "null" is never appended
                while ((currentLine = reader.readLine()) != null) {
                    fileContents.append(currentLine).append('\n');
                }
            } catch (Exception e) {
                JOptionPane.showMessageDialog(null, e.getMessage(), "Error Message", JOptionPane.ERROR_MESSAGE);
                e.printStackTrace();
            }
            return fileContents.toString();
        }
    }
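This is as far as I could get. Do I need to improve my regular expression, or should I use a different function that is better suited to what I want?
My current output is this: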
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold 12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold 12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
381 sold
381 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
187 sold
187 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
174 sold
174 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
129 sold
129 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
101 sold
101 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
89 sold
89 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
88 sold
88 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
87 sold
87 sold
The output I want is, for example:
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay || £7.99 || 87 sold || http://link.com
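One likely reason each line appears several times in the current output is that `getElementsMatchingText` tests an element's combined text, descendants included, so every ancestor of the "87 sold" node matches as well. A minimal sketch of the difference, assuming a simplified, hypothetical fragment rather than the real eBay markup (the class name `OwnTextDemo` and the HTML string are illustrative only):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OwnTextDemo {
    // Hypothetical, simplified markup; the real page is more deeply nested.
    static final String HTML =
        "<ul><li><a href=\"http://example.com/item/1\">Sample Figure</a>"
        + " <span>\u00a37.99</span> <span>87 sold</span></li></ul>";

    // Counts elements whose text, including all descendants, ends in "sold".
    static int countMatchingText() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementsMatchingText("sold$").size();
    }

    // Counts elements whose own direct text ends in "sold".
    static int countMatchingOwnText() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementsMatchingOwnText("sold$").size();
    }

    public static void main(String[] args) {
        System.out.println("matching text:     " + countMatchingText());
        System.out.println("matching own text: " + countMatchingOwnText());
    }
}
```

With `getElementsMatchingOwnText` only the innermost element is returned, and the name, price, and link could then be reached by walking up from it with `parent()`.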
Edit:
I just tried something like this, but with no luck.
    for (String categoryURL : categoryLinksArray) {
        Document doc = Jsoup.parse(readURL(categoryURL));
        Elements sold_items = doc.getElementsByClass("b-block-info-container");
        for (Element sold : sold_items) {
            System.out.println("NAME: " + sold.attr("b-block-info-container__title b-block-info-container__title__ListingSummary") + "\n" +
                    "PRICE: " + sold.attr("b-block-info-container__price") + "\n" +
                    "SOLD/week: " + sold.attr("item_quantity__hotness") + "\n" +
                    "URL: " + sold.attr("abs:href"));
            System.out.println("--------------------------------------");
        }
    }
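A probable cause of the empty values here is that `attr` looks up an HTML attribute by name, and a class name is not an attribute name, so each call returns an empty string. The sketch below selects child elements by class instead. Only the class names come from the attempt above; the nesting of the markup, the base URI, and the names `ClassSelectorDemo` and `extract` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ClassSelectorDemo {
    // Hypothetical markup: class names are from the post, the structure is assumed.
    static final String HTML =
        "<div class=\"b-block-info-container\">"
        + "<a href=\"/itm/1\"><span class=\"b-block-info-container__title\">Sample Figure</span></a>"
        + "<span class=\"b-block-info-container__price\">\u00a37.99</span>"
        + "<span class=\"item_quantity__hotness\">87 sold</span>"
        + "</div>";

    // Builds one "name || price || sold || link" row per item container.
    static List<String> extract(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> rows = new ArrayList<>();
        for (Element item : doc.select(".b-block-info-container")) {
            String name = item.select(".b-block-info-container__title").text();
            String price = item.select(".b-block-info-container__price").text();
            String sold = item.select(".item_quantity__hotness").text();
            // The link is assumed to be the first anchor inside the container.
            Element link = item.select("a[href]").first();
            String url = (link != null) ? link.absUrl("href") : "";
            rows.add(name + " || " + price + " || " + sold + " || " + url);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : extract(HTML, "http://www.ebay.co.uk/")) {
            System.out.println(row);
        }
    }
}
```

Passing a base URI to `Jsoup.parse` is what lets `absUrl("href")` resolve relative links; without it the absolute URL comes back empty.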
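I tried to do this for all the categories, but I get a NullPointerException at the Jsoup.connect line. Do you think this is because "w6-2-x-carousel-items" is specific to the toys category? – lucianozo
Yes, the id is unique, so this will not work for the rest of the pages. But if you inspect the page's HTML code, you will see a certain structure. See my second answer and modify it where necessary. – Eritrean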
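A minimal sketch of the point made in this exchange, assuming the failure comes from an id that exists on only one category page: `getElementById` returns `null` when the id is absent, so the lookup is guarded and falls back to the class-based structure. The markup, the class `CategoryScopeDemo`, and the method `findItems` are hypothetical; only the id and the class name appear in the discussion.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CategoryScopeDemo {
    // Returns the item containers, narrowing to the carousel only when it exists.
    static Elements findItems(Document doc) {
        // Null on pages that do not use this category-specific id.
        Element carousel = doc.getElementById("w6-2-x-carousel-items");
        Element scope = (carousel != null) ? carousel : doc;
        return scope.select(".b-block-info-container");
    }

    public static void main(String[] args) {
        // Hypothetical pages: one with the id, one without it.
        String toys = "<div id=\"w6-2-x-carousel-items\">"
            + "<div class=\"b-block-info-container\">A</div></div>";
        String other = "<div class=\"b-block-info-container\">B</div>";

        System.out.println(findItems(Jsoup.parse(toys)).size());
        System.out.println(findItems(Jsoup.parse(other)).size());
    }
}
```

Selecting by class keeps the same code usable across categories, at the cost of depending on class names that the site may change.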