2014-02-25 87 views

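Unable to extract content from HTML using jsoup in Java?

I want to use jsoup to extract the <td> tags with the classes css-sched-table-title and css-sched-waypoint from the HTML below. I can't figure out where I'm going wrong. Can someone help?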

Document doc = Jsoup.parse("somelink.html");
Elements row = doc.select(".css-sched-table-title td");
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String value = element.text();
    System.out.println("value : " + value);
}

<tr> 
     <td ALIGN="CENTER" COLSPAN="16" CLASS="css-sched-table-title"><b>Saturday - </b><b>Afternoon</b></td> 
    </tr> 
    <tr VALIGN="BOTTOM"> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Townline and Southern</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and Blueridge</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and South Fraser</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Ar. Bourquin Exchange</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Lv. Bourquin Exchange</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Downtown Abbotsford</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">McMillan and Old Yale</TD> 
     <TD>&nbsp;</TD> 
     <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Sandy Hill and Old Clayburn</TD> 
    </tr> 

Have you tried "td.css-sched-table-title"? – Nishant


Hi Nishant, that didn't work. –

Answers


There is a single td tag with the class css-sched-table-title, and a list of td tags with the class css-sched-waypoints.

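Also, the selector .css-sched-table-title td looks for td elements nested inside an element that has that class, whereas here the class is on the td itself. The correct syntax would be Elements row = doc.select("td.css-sched-waypoints");, see here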
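To make the difference between the two selectors concrete, here is a minimal sketch. It assumes jsoup is on the classpath; the class name SelectorComparison, the helper count and the trimmed-down HTML string are made up for illustration and are not from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class SelectorComparison {
    // Trimmed-down copy of the question's markup (an assumption for this
    // sketch), wrapped in <table> so the rows are parsed as table content.
    static final String HTML =
        "<table>"
        + "<tr><td class=\"css-sched-table-title\"><b>Saturday - </b><b>Afternoon</b></td></tr>"
        + "<tr><td>&nbsp;</td><td class=\"css-sched-waypoints\">Townline and Southern</td></tr>"
        + "</table>";

    // Hypothetical helper: number of elements the selector matches in HTML.
    static int count(String selector) {
        Elements matches = Jsoup.parse(HTML).select(selector);
        return matches.size();
    }

    public static void main(String[] args) {
        // Descendant selector: td elements *inside* an element with the class.
        System.out.println("descendant: " + count(".css-sched-table-title td"));
        // Compound selector: td elements that carry the class themselves.
        System.out.println("compound:   " + count("td.css-sched-table-title"));
    }
}
```

The descendant form finds nothing here because the title cell contains only b elements, not another td.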

Note: when the HTML file is used as is, jsoup does not interpret it as valid table HTML content. I had to enclose the content above in <table></table> tags.
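A small sketch of that wrapping step, assuming jsoup is on the classpath and that the rows are available as a string. The names TableWrapDemo, ROWS and countCells are hypothetical. As far as I understand it, jsoup's HTML parser follows the HTML5 rules, under which tr and td tags outside a table are dropped, which would explain why the bare fragment yields no cells.

```java
import org.jsoup.Jsoup;

public class TableWrapDemo {
    // One row from the question, without any enclosing <table> (assumed input).
    static final String ROWS =
        "<tr><td class=\"css-sched-waypoints\">Townline and Southern</td></tr>";

    // Hypothetical helper: how many waypoint cells jsoup finds in the given HTML.
    static int countCells(String html) {
        return Jsoup.parse(html).select("td.css-sched-waypoints").size();
    }

    public static void main(String[] args) {
        // Bare rows: the parser is in body context, so the row/cell tags are ignored.
        System.out.println("bare rows:    " + countCells(ROWS));
        // Wrapped rows: the same markup is now parsed as table content.
        System.out.println("wrapped rows: " + countCells("<table>" + ROWS + "</table>"));
    }
}
```

If the HTML comes from a file rather than a string, the same wrapping can be applied to the file's text before handing it to Jsoup.parse.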

When I try the code below with the HTML file:

// doc is the Document parsed from the HTML file (content wrapped in <table></table>)
Elements row = doc.select("td.css-sched-waypoints");
Element title = doc.select("td.css-sched-table-title").first();

System.out.println(title.text());
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String id = element.attr("id");          // empty: these cells have no id attribute
    String classes = element.attr("class");
    String value = element.text();
    System.out.println("Id : " + id + ", classes : " + classes
            + ", value : " + value);
}

I get:

Saturday - Afternoon 
Id : , classes : css-sched-waypoints, value : Townline and Southern 
Id : , classes : css-sched-waypoints, value : Clearbrook and Blueridge 
Id : , classes : css-sched-waypoints, value : Clearbrook and South Fraser 
Id : , classes : css-sched-waypoints, value : Ar. Bourquin Exchange 
Id : , classes : css-sched-waypoints, value : Lv. Bourquin Exchange 
Id : , classes : css-sched-waypoints, value : Downtown Abbotsford 
Id : , classes : css-sched-waypoints, value : McMillan and Old Yale 
Id : , classes : css-sched-waypoints, value : Sandy Hill and Old Clayburn 

Hi PopoFibo, thanks for the explanation. I've corrected it and now it works fine. –


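Hi PopoFibo, a quick question: instead of having two separate Elements objects, Elements row = doc.select("td.css-sched-waypoints"); and Elements time = doc.select("td.css-sched-times");, is it possible to get them in a single Elements instance? –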


@dev_marshell08 Yes, you certainly can. To get started, see this question: http://stackoverflow.com/questions/21694216/selecting-elements-that-have-multiple-class-whilst-using-jsoup/21694612#21694612 – PopoFibo
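For the follow-up question in the comments, one possible approach (not necessarily the one the linked answer describes) is a comma-separated selector group, which jsoup supports for combining several selectors into one query. This is only a sketch: the class CombinedSelector, the helper selectBoth and the css-sched-times cell with its "12:05" value are assumptions made up for illustration, since the question's HTML shows no such cell.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CombinedSelector {
    // Hypothetical markup: the css-sched-times cell and its value are assumed.
    static final String HTML =
        "<table><tr>"
        + "<td class=\"css-sched-waypoints\">Townline and Southern</td>"
        + "<td class=\"css-sched-times\">12:05</td>"
        + "</tr></table>";

    // A comma groups selectors: an element matching either one is returned.
    static Elements selectBoth(String html) {
        return Jsoup.parse(html)
                .select("td.css-sched-waypoints, td.css-sched-times");
    }

    public static void main(String[] args) {
        for (Element cell : selectBoth(HTML)) {
            // className() tells the two kinds of cell apart inside the one list.
            System.out.println(cell.className() + " : " + cell.text());
        }
    }
}
```

The cells come back in one Elements instance, so the class name is what distinguishes waypoints from times when iterating.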