Java - How do I extract Google News titles and links with Jsoup?

I'm new to jsoup and HTML. I'd like to know how to extract the titles and, if possible, the links of the stories on the Google News home page. Here is my code:

org.jsoup.nodes.Document doc = null;
try {
    doc = Jsoup.connect("https://news.google.com/").get();
} catch (IOException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
Elements titles = doc.select("titletext");

System.out.println("Titles: " + titles.text());

// non existent
for (org.jsoup.nodes.Element e : titles) {
    System.out.println("Title: " + e.text());
    System.out.println("Link: " + e.attr("href"));
}

For some reason, I think my program can't find titletext, because this is the output when the code runs: Titles:

I would really appreciate your help. Thanks.

Source

2016-08-23 Stephane Hatgis-Kessell

Try doc.select("span.titletext"); – tonakai

Is there a reason not to use the [RSS feed](https://news.google.com/news?output=rss), which is easier to parse? –
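As a rough illustration of the RSS suggestion, here is a minimal sketch that reads titles and links from an RSS 2.0 document using only the JDK's XML parser. The class name RssTitles, the helper methods, and the sample feed are all hypothetical; the sketch assumes each story is an <item> with <title> and <link> children, as in standard RSS 2.0, and has not been checked against what the URL above actually returns.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssTitles {
    // Hypothetical sample feed in standard RSS 2.0 shape (not real Google News data).
    static final String SAMPLE_RSS =
        "<rss version=\"2.0\"><channel><title>Sample feed</title>"
        + "<item><title>First story</title><link>http://example.com/a</link></item>"
        + "<item><title>Second story</title><link>http://example.com/b</link></item>"
        + "</channel></rss>";

    // Returns one {title, link} pair per <item> element in the feed.
    static List<String[]> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String[]> stories = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                stories.add(new String[] { firstText(item, "title"), firstText(item, "link") });
            }
            return stories;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RSS document", e);
        }
    }

    // Text of the first descendant with the given tag name, or "" if there is none.
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) {
        for (String[] story : parse(SAMPLE_RSS)) {
            System.out.println("Title: " + story[0]);
            System.out.println("Link: " + story[1]);
        }
    }
}
```

To try this on a live feed, the XML string would be replaced by the response body fetched from the feed URL. A feed avoids depending on CSS class names, which can change whenever the page is redesigned.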

First, get the HTML elements with the h2 tag:

Elements elem = html.select("h2");

Now you have the elements, each with its child elements and attributes (id, href, originalhref, etc.). From these, retrieve the data you need:

for (Element e : elem) {
    System.out.println(e.select("[class=titletext]").text());
    System.out.println(e.select("a").attr("href"));
}

Source

2016-08-29 19:39:30 Attila
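The selectors discussed in this thread can be tried offline with a sketch like the one below, which also shows why the question's code prints nothing. In jsoup, a bare word such as "titletext" is a tag selector (it looks for <titletext> elements), while "span.titletext" matches <span> elements whose class is titletext. The class name TitleSelectorDemo, its helper methods, and the sample markup are hypothetical; the markup only imitates the h2 / a / span.titletext structure mentioned above, and the real Google News page may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TitleSelectorDemo {
    // Hypothetical markup imitating the structure described in this thread.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<h2><a href=\"http://example.com/a\"><span class=\"titletext\">First story</span></a></h2>"
        + "<h2><a href=\"http://example.com/b\"><span class=\"titletext\">Second story</span></a></h2>"
        + "</body></html>";

    // Number of elements in the HTML that match the given CSS selector.
    static int count(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    // Returns one {title, link} pair per <h2> that contains both a title span and a link.
    static List<String[]> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> stories = new ArrayList<>();
        for (Element h2 : doc.select("h2")) {
            Element title = h2.select("span.titletext").first();
            Element link = h2.select("a[href]").first();
            if (title != null && link != null) {
                stories.add(new String[] { title.text(), link.attr("href") });
            }
        }
        return stories;
    }

    public static void main(String[] args) {
        // Tag selector: there is no <titletext> element, so nothing matches.
        System.out.println(count(SAMPLE_HTML, "titletext"));      // prints 0
        // Class selector: matches both title spans.
        System.out.println(count(SAMPLE_HTML, "span.titletext")); // prints 2
        for (String[] story : extract(SAMPLE_HTML)) {
            System.out.println("Title: " + story[0]);
            System.out.println("Link: " + story[1]);
        }
    }
}
```

Note that the href is read from the a element rather than from the title span. In the question's loop, e.attr("href") would return an empty string even with the corrected selector, because the span itself carries no href attribute.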
