2014-01-25 116 views

Jsoup table parsing

I'm new to jsoup and to parsing in general, so if you need more information to be able to answer my question, please let me know!

I have this table that I want to parse with Jsoup in Java. I just want to get the text:

"B S Computer Science, CS (2012-2014)"

from the table:

<h3>Fahran S Kamili (fsk226)</h3> 
     <div> 
      10 Degree Audit Requests Returned. 
     </div> 
     <table> 
      <thead> 
       <tr> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
         <th colspan="8">Degree Audits Requested</th> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 

       </tr> 
       <tr> 
        <th>Rerun</th> 

<!-- *nrfkh - 9/2012: [degaudt-634]* --> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
        <th>Request Created</th> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 

<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
        <th>Audit Type</th> 
        <th>Program</th> 
        <th>Courses Requested</th> 
        <th>Request Status</th> 
        <th>Audit ID</th> 
        <th>Delete Option</th> 
       </tr> 
      </thead> 
        <tbody><tr> 
         <td> 
            <a href="https://utdirect.utexas.edu/apps/degree/audits/requests/student_individual/?form-0-eid=fsk226&form-0-name=Fahran%20S%20Kamili&form-0-begin_ccyy=2012&form-0-degree_plan=ESC%20SS%20CS&form-0-minor=&current=X&future=&planned=&form-TOTAL_FORMS=20&form-INITIAL_FORMS=0&form-MAX_NUM_FORMS=&rerun=" target="_blank">Rerun</a> 
         </td> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
         <td> 
          12/20/2013 
          05:06 PM 
         </td> 
<!-- *nrfkh - 9/2012: [degaudt-634]* --> 
<!-- *end nrfkh - 9/2012: [degaudt-634]* --> 
         <td> 
           Normal 

         </td> 
         <td> 
          B S Computer Science, CS 
          (2012-2014) 
         </td> 
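This is the relevant part of the table.

The table is actually much longer, but the remaining elements are just siblings of the ones shown here (so I assume that if I can get this text, I can easily get the other text as well).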


"so if you need more information..." Yes: for example, what have you tried so far, and how did it not work? And what in particular is confusing you?

Answer


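If I save your HTML fragment to a file and parse it with jsoup, I would start by printing every td element it encounters, since I think those are what you're after: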

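import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String... args) throws IOException {
    File input = new File("C:/users/XYZ/desktop/input.html");
    // Parse the saved HTML file; the empty string is the base URI.
    Document doc = Jsoup.parse(input, "UTF-8", "");
    Elements tds = doc.getElementsByTag("td");
    for (Element td : tds) {
        System.out.println(td.text());
    }
}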

Output:

Rerun 
12/20/2013 05:06 PM 
Normal 
B S Computer Science, CS (2012-2014)
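
If you only want the Program cell rather than every td, a minimal sketch might look like the following. It assumes the Program column is always the fourth td in each body row, which matches the header order in your table but is not something I can confirm for the rows you left out. The class name ProgramColumn, the helper programCells, and the shortened inline HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class ProgramColumn {
    // Assumption: "Program" is the fourth <td> (index 3) in every body row.
    private static final int PROGRAM_COLUMN = 3;

    // Hypothetical helper: returns the Program text of each row in <tbody>.
    static List<String> programCells(String html) {
        Document doc = Jsoup.parse(html);
        List<String> programs = new ArrayList<>();
        for (Element row : doc.select("tbody tr")) {
            Elements cells = row.select("td");
            // Skip rows that have fewer cells than expected.
            if (cells.size() > PROGRAM_COLUMN) {
                // text() collapses line breaks and runs of spaces into single spaces.
                programs.add(cells.get(PROGRAM_COLUMN).text());
            }
        }
        return programs;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the table in the question.
        String html = "<table><tbody><tr>"
                + "<td><a href=\"#\">Rerun</a></td>"
                + "<td> 12/20/2013 05:06 PM </td>"
                + "<td> Normal </td>"
                + "<td> B S Computer Science, CS\n (2012-2014) </td>"
                + "</tr></tbody></table>";
        for (String program : programCells(html)) {
            System.out.println(program);
        }
    }
}
```

Since the other rows are siblings with the same layout, the same loop should pick up the Program cell of each of them; if the real page has nested tables or rows with a different number of cells, the index would need adjusting.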