Jsoup: How to get the ID and href from many elements
I need to get the ID and href from every element (as shown by the colored boxes in the picture). I don't know how to find the right path and extract the information I need. How can I do this?
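Walk down by ID and tag name until you reach the relevant elements, then read the values you need from their attributes. See the code snippet below: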
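// "html_file" stands in for the page's HTML source. Jsoup.parse(String) expects markup,
// not a path; to read from disk use Jsoup.parse(new File(path), "UTF-8") instead.
Document doc = Jsoup.parse("html_file");
Element container = doc.getElementById("search_result_container");
// getElementsByTag searches this element and all of its descendants,
// so which <div> sits at index 1 depends on the page's markup
Elements divs = container.getElementsByTag("div");
Element secondDiv = divs.get(1);
Elements hyperLinks = secondDiv.getElementsByTag("a");
for (Element alink : hyperLinks) {
    String href = alink.attr("href");
    String id = alink.attr("id");
}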
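As an alternative to walking the tree one step at a time, jsoup's CSS selectors can express the same path in a single call. This is only a minimal sketch: it assumes the links sit somewhere under the element with ID `search_result_container`, and both the class name `SelectorSketch` and the sample markup are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorSketch {

    // Returns one "id -> href" string per link found under the container.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // "#id" matches by ID, the space means "any descendant",
        // and "a[href]" keeps only <a> elements that carry an href attribute
        for (Element link : doc.select("#search_result_container a[href]")) {
            result.add(link.id() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real page
        String html = "<div id=\"search_result_container\">"
                + "<div><a id=\"first\" href=\"/app/1\">One</a></div>"
                + "<div><a id=\"second\" href=\"/app/2\">Two</a></div>"
                + "</div>";
        for (String line : extract(html)) {
            System.out.println(line);
        }
    }
}
```

With the sample markup this prints `first -> /app/1` and `second -> /app/2`. The selector does not depend on how many nested `div` elements lie in between, so it does not need an index like `get(1)`.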
OK, I got it. It works!! Thanks SUNNYben, you pointed me in the right direction!
Here is my solution:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Steam_GameID_Links
{
    public static void main(String[] args)
    {
        Steam_GameID_Links wc = new Steam_GameID_Links();
        try
        {
            String url = "http://store.steampowered.com/search/?sort_by=_ASC&category1=998&page=1";
            Document document = Jsoup.connect(url).get();
            // work out how many result pages there are:
            // take the fifth space-separated token of the pagination text
            Elements howMuchPages = document.select(".search_pagination_right");
            String[] stuff = howMuchPages.text().split(" ");
            String tmp = stuff[4].replace(" ", "").replace(".", "");
            // keep only the digits of that token
            StringBuilder sb = new StringBuilder();
            for(int i = 0; i < tmp.length(); i++)
            {
                if(Character.isDigit(tmp.charAt(i)))
                {
                    sb.append(tmp.charAt(i));
                }
            }
            String last = sb.toString().trim();
            int lastPages = Integer.parseInt(last);
            int counter = 0;
            // fetch every result page in turn
            for(int i = 1; i < lastPages + 1; i++)
            {
                url = "http://store.steampowered.com/search/?sort_by=_ASC&category1=998&page=" + i;
                document = Jsoup.connect(url).get();
                // first select the parent node: <div id="search_result_container">
                Element parentNode = document.getElementById("search_result_container");
                // then every element below it that has a data-ds-appid attribute
                Elements childNodes = parentNode.getElementsByAttribute("data-ds-appid");
                for(Element alink : childNodes)
                {
                    String href = alink.attr("href");
                    String id = alink.attr("data-ds-appid");
                    String name = alink.getElementsByClass("title").text();
                    System.out.println("Spiel: " + name + ", ID: " + id + ", SpieleLink: " + href);
                    // wc.writeSpielNameIDLink("Spiel: " + name + ", ID: " + id + ", SpieleLink: " + href + "\n");
                }
            }
        }
        catch(IOException e)
        {
            e.printStackTrace();
        }
    }