2016-06-10

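Getting data from the third table in HTML

I'm building a tool and I'm on the last step, but I've run into a small problem, and a hint would be much appreciated. The page has these three tables and I can only get data out of the first two. How can I reach the third one, titled "Upgrade Warranty and Service Information"?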

Here is the HTML of the tables:

<body>
  <div id="ibm-pcon">
    <div id="ibm-content">
      <div id="ibm-leadspace-head" class="ibm-alternate">
        <div id="ibm-leadspace-body">
          <br></br>
          <script type="text/javascript">currentDate();</script>
          <br></br>
          <!--BEGIN OPTIONAL BREADCRUMBING--> <span style="font-size: small;"><a href="/pc/entitle/pg2/Service.wss/display/MachineHome">Machine Lookup</a> &gt; <a href="/pc/entitle/pg2/Service.wss/mts/Lookup">Warranty Information</a> &gt; </span>
          <!--END OPTIONAL BREADCRUMBING-->
          <br></br>
          <h1>PEW | Warranty Information</h1>
        </div>
      </div>
      <!-- CONTENT_BODY -->
      <div id="ibm-content-body">
        <div id="ibm-content-main">
          <!-- LEADSPACE_BEGIN -->
          <!-- This section can be used to test JavaScript and CSS before promoting the data to the template XML. -->
          <table class="ibm-results-table" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody xmlns="http://www.w3.org/TR/xhtml1/">
 
<thead> 
 
<tr> 
 
<th scope="col" class="pg2OutputTableSectionTitle">Results of Machine Type/Serial Number Query</th> 
 
</tr> 
 
</thead> 
 
<tr> 
 
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody> 
 
<thead> 
 
<tr> 
 
<th scope="col" colspan="3" class="pg2TableSectionTitle">General Machine Information:</th> 
 
</tr> 
 
</thead> 
 
<tr> 
 
<td> 
 
        Type: 
 
        <span>1746</span> 
 
</td><td> 
 
        Model: 
 
        <span>C4A</span> 
 
</td><td> 
 
        Serial: 
 
        <span>13D06MK</span> 
 
</td> 
 
</tr> 
 
<tr> 
 
<td> 
 
        Status: 
 
        <span>Proof Of Purchase Rcvd</span> 
 
</td><td> 
 
         Build Date: 
 
         <span>&nbsp;</span> 
 
</td><td> 
 
         Build to Model: 
 
         <span> </span> 
 
</td> 
 
</tr> 
 
<tr> 
 
<td> 
 
         Geography: 
 
         <span>EMEA</span> 
 
</td><td> 
 
         Country: 
 
         <span>GREECE</span> 
 
</td><td> 
 
         Configuration Id: 
 
         <span>&nbsp;</span> 
 
</td> 
 
</tr> 
 
<tr> 
 
<td> 
 
         OES Order Number: 
 
         <span>2076804957</span> 
 
</td><td> 
 
         Customer Number: 
 
         <span>108401</span> 
 
</td><td> 
 
         Delivery Number: 
 
         <span>8519501492</span> 
 
</td> 
 
</tr> 
 
<tr> 
 
<td colspan="2"> 
 
            Service Status: 
 
            <span>This machine is currently out of warranty.</span> 
 
</td><td colspan="1"> 
 
            UAR End Date: 
 
            <span>2012-08-02</span> 
 
</td> 
 
</tr> 
 
</tbody></table></td> 
 
</tr> 
 
<tr> 
 
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody> 
 
<thead> 
 
<tr> 
 
<th scope="col" colspan="3" class="pg2TableSectionTitle">Warranty and Service Information:</th> 
 
</tr> 
 
</thead> 
 
<tr> 
 
<th scope="col">Start Date</th><th scope="col">End Date</th><th scope="col">SDF</th> 
 
</tr> 
 
<tr> 
 
<td>2012-07-04</td><td>2015-07-03</td><td>3XL</td> 
 
</tr> 
 
<tr> 
 
<td colspan="3"> 
 
        SDF Description: 
 
        <span>This product has a 3 year limited warranty and is entitled to CRU (customer replaceable unit) and On-site service. Tier 1 CRUs are customer responsibility, see announcement for details. On-site Service is available Monday - Friday, except holidays, with a next business day response objective.</span> 
 
</td> 
 
</tr> 
 
</tbody></table></td> 
 
</tr> 
 
<tr> 
 
<td><table class="ibm-data-table ibm-alternating" summary="output table" cellpadding="0" cellspacing="0" border="0"><tbody> 
 
<thead> 
 
<tr> 
 
<th scope="col" colspan="3" class="pg2TableSectionTitle">Upgrade Warranty and Service Information:</th> 
 
</tr> 
 
</thead> 
 
<tr> 
 
<th scope="col">Start Date</th><th scope="col">End Date</th><th scope="col">SDF</th> 
 
</tr> 
 
<tr> 
 
<td>2012-07-04</td><td>2015-07-03</td><td>SP4</td> 
 
</tr> 
 
<tr> 
 
<td colspan="3"> 
 
        SDF Description: 
 
        <span>This product has a three year limited warranty which includes a warranty upgrade. This product is entitled to parts and labor and includes on-site repair service. Service is available 7X24 with an 4 hour response objective.</span> 
 
</td> 
 
</tr> 
 
</tbody></table></td> 
 
</tr> 
 
<tr> 
 
<td><table class="ibm-data-table" cellpadding="0" cellspacing="0" border="0"><thead> 
 
<tr> 
 
<th scope="col" class="pg2MessageHead">Messages</th> 
 
</tr> 
 
</thead> 
 
<tbody> 
 
<tr> 
 
<td class="pg2MessagePanel" align="left">&nbsp;</td> 
 
</tr> 
 
</tbody></table></td> 
 
</tr> 
 
</tbody></table> 
 
        </div>

Here is my code so far:

  public void actionPerformed(ActionEvent e) {     
       try { 
        String getTextArea; 
        getTextArea = textArea.getText(); 
        String[] arr = getTextArea.split("\\n"); 
        String type = null; 
        String serial = null; 
        int line = 0; 
        for(String s : arr) { 

         line++; 
         if(s.isEmpty()) { 
          textArea_1.append("Empty Line" + '\n'); 
          continue; 
         } 

         type = s.substring(0, 4); 
         serial = s.substring(5, 12); 
         String html = "bla bla bla" + type + serial; // lookup URL elided in the original post

         Document doc = Jsoup.connect(html).get(); 
         Elements tableElements = doc.select("table"); 
         java.util.Iterator<Element> ite = tableElements.select("tr").iterator(); 
         Elements tableElement = doc.select("tr"); 
         java.util.Iterator<Element> ite1 = tableElement.select("table").iterator(); 
         ite.next(); 
         ite1.next(); 

         String result,result1,result2; 
         result = ite.next().text(); 
         result1 = ite1.next().text(); 

         Scanner sr = new Scanner(result); 
         Scanner sr1 = new Scanner(result1); 

//      System.out.println(result); 
//      System.out.println(result1); 

         // result of first table 
         while(sr.hasNext()) { 
          result = result; 
          ite.next().text(); 
          String lineOfType; 
          lineOfType = ite.next().text(); 
          type = lineOfType.substring(6, 10); 
          String model; 
          model = lineOfType.substring(18, 21); 
          serial = lineOfType.substring(30, 37); 
          ite.next().text(); 
          String country = ite.next().text(); 
          country = country.substring(24, 31); 
          textArea_1.append(line + "-" + type + '\t' + model + '\t' + serial + " " + country + " "); 
         } 

         sr.close(); 

         // result of second table

         while(sr1.hasNext()) { 
          result1 = result1; 
          String startDate = result1.substring(58, 68); 
          String endDate = result1.substring(69, 79); 
          textArea_1.append(startDate + " " + endDate + " "); 
          break; 
         } 

         sr1.close(); 

         // getting the elements for the 3rd table; this doesn't work as expected, it gets the second table's data.

         Elements tableElement2 = doc.select("tr"); 
         java.util.Iterator<Element> ite2 = tableElement2.select("table").iterator(); 
         ite2.next(); 
         result2 = ite2.next().text(); 
         Scanner sr2 = new Scanner(result2); 


         // this loop prints the same result as the second loop!
         while(sr2.hasNext()) { 
          sr2.next(); 
          result2 = result2; 
          System.out.println(result2); 
          String srvPkStart = result2.substring(58, 68); 
          if(srvPkStart.equals(result1.substring(58, 68))) { 
           srvPkStart = "Not found"; 
          } 
          String srvPkEnd = result2.substring(69, 79); 
          if(srvPkEnd.equals(result1.substring(69, 79))) { 
           srvPkEnd = ""; 
          } 
          System.out.println(srvPkStart + '\t' + srvPkEnd); 
          textArea_1.append("ServicePack Dates: " + srvPkStart + '\t' + srvPkEnd + '\n'); 
          break; 
         } 



        } // end of for loop  
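       } catch (Exception e2) {
        // Don't swallow errors silently: a wrong substring offset or a failed
        // request would otherwise just make the output stop.
        e2.printStackTrace();
       }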
      } 
     }); 
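Why the last loop prints the Warranty table again: `tableElement2.select("table")` returns the nested tables in document order (General, Warranty, Upgrade, Messages), and every call to `.iterator()` starts a new iterator at the first of them. So `ite2.next(); ite2.next();` lands on the second table exactly as `ite1` did; a third `next()` would reach the upgrade table. A clearer option is to run one query and index into it. The sketch below assumes the live page keeps the layout shown in the question and that the data tables carry both the `ibm-data-table` and `ibm-alternating` classes; the class `ThirdTableByIndex`, the helper `firstDataRow` and the trimmed `SAMPLE` markup are illustrative, not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ThirdTableByIndex {

    // Trimmed copy of the three data tables from the question (SDF descriptions omitted).
    static final String SAMPLE =
        "<table class=\"ibm-results-table\"><tbody>"
        + "<tr><td><table class=\"ibm-data-table ibm-alternating\"><tbody>"
        + "<tr><th class=\"pg2TableSectionTitle\">General Machine Information:</th></tr>"
        + "<tr><td>Type: <span>1746</span></td><td>Model: <span>C4A</span></td>"
        + "<td>Serial: <span>13D06MK</span></td></tr>"
        + "</tbody></table></td></tr>"
        + "<tr><td><table class=\"ibm-data-table ibm-alternating\"><tbody>"
        + "<tr><th class=\"pg2TableSectionTitle\">Warranty and Service Information:</th></tr>"
        + "<tr><th>Start Date</th><th>End Date</th><th>SDF</th></tr>"
        + "<tr><td>2012-07-04</td><td>2015-07-03</td><td>3XL</td></tr>"
        + "</tbody></table></td></tr>"
        + "<tr><td><table class=\"ibm-data-table ibm-alternating\"><tbody>"
        + "<tr><th class=\"pg2TableSectionTitle\">Upgrade Warranty and Service Information:</th></tr>"
        + "<tr><th>Start Date</th><th>End Date</th><th>SDF</th></tr>"
        + "<tr><td>2012-07-04</td><td>2015-07-03</td><td>SP4</td></tr>"
        + "</tbody></table></td></tr>"
        + "</tbody></table>";

    // Returns the cell texts of the first <td> row of the n-th data table
    // (0 = General, 1 = Warranty, 2 = Upgrade), or null if there is no such row.
    static String[] firstDataRow(Document doc, int n) {
        // One query, one list: get(n) always refers to the same table, unlike a
        // fresh iterator, which starts again at the first table.
        Elements tables = doc.select("table.ibm-data-table.ibm-alternating");
        if (n < 0 || n >= tables.size()) {
            return null;
        }
        // Title and header rows contain only <th> cells, so this skips them.
        Element row = tables.get(n).select("tr:has(td)").first();
        if (row == null) {
            return null;
        }
        Elements cells = row.select("td");
        String[] values = new String[cells.size()];
        for (int i = 0; i < cells.size(); i++) {
            values[i] = cells.get(i).text();
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        String[] upgrade = firstDataRow(doc, 2);
        System.out.println(String.join(" ", upgrade)); // 2012-07-04 2015-07-03 SP4
    }
}
```

If a machine has no upgrade warranty, the page may not contain the third table at all; in that case this sketch returns `null` instead of throwing. Whether the site actually omits the table is an assumption worth checking against a real response.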

Answer


Let's switch to a simpler way of getting these tables. I suggest using org.jsoup.nodes.Element.select() to pick the tables by their CSS classes.

Check out this link to see how to select elements with the jsoup selector syntax.
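jsoup's selector syntax also lets you pick a table by its title rather than by its position, using the `:has(...)` and `:containsOwn(...)` pseudo-selectors. A minimal sketch, assuming the title `<th>` elements keep the `pg2TableSectionTitle` class from the question; `TableByTitle` and `tableByTitle` are hypothetical names. `:containsOwn` is a case-insensitive substring match, so "Warranty and Service" also matches the upgrade title and `first()` then returns whichever matching table comes first in the document. Title text containing parentheses would also need escaping before it goes into the selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableByTitle {

    // Two of the tables from the question, trimmed to the title and data rows.
    static final String SAMPLE =
        "<table class=\"ibm-data-table ibm-alternating\"><tbody>"
        + "<tr><th class=\"pg2TableSectionTitle\">Warranty and Service Information:</th></tr>"
        + "<tr><td>2012-07-04</td><td>2015-07-03</td><td>3XL</td></tr>"
        + "</tbody></table>"
        + "<table class=\"ibm-data-table ibm-alternating\"><tbody>"
        + "<tr><th class=\"pg2TableSectionTitle\">Upgrade Warranty and Service Information:</th></tr>"
        + "<tr><td>2012-07-04</td><td>2015-07-03</td><td>SP4</td></tr>"
        + "</tbody></table>";

    // First data table whose title <th> contains titleText (case-insensitive),
    // or null if none does.
    static Element tableByTitle(Document doc, String titleText) {
        return doc.select("table.ibm-data-table:has(th.pg2TableSectionTitle:containsOwn("
                + titleText + "))").first();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        Element upgrade = tableByTitle(doc, "Upgrade Warranty");
        // The SDF code is the last cell of the data row.
        System.out.println(upgrade.select("td").last().text()); // SP4
    }
}
```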

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

    // The HTML from the question, inlined as a test string
    String html = "<body><div id=\"ibm-pcon\"><div id=\"ibm-content\"><div id=\"ibm-leadspace-head\" class=\"ibm-alternate\"><div id=\"ibm-leadspace-body\">"
        + "<br></br><script type=\"text/javascript\">currentDate();</script><br></br>"
        + "<!--BEGIN OPTIONAL BREADCRUMBING--> <span style=\"font-size: small;\"><a href=\"/pc/entitle/pg2/Service.wss/display/MachineHome\">Machine Lookup</a> &gt; <a href=\"/pc/entitle/pg2/Service.wss/mts/Lookup\">Warranty Information</a> &gt; </span><!--END OPTIONAL BREADCRUMBING-->"
        + "<br></br><h1>PEW | Warranty Information</h1> </div></div>"
        + "<!-- CONTENT_BODY --><div id=\"ibm-content-body\"><div id=\"ibm-content-main\">"
        + "<table class=\"ibm-results-table\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"><tbody xmlns=\"www.w3.org/TR/xhtml1/\">"
        + "<thead> <tr><th scope=\"col\" class=\"pg2OutputTableSectionTitle\">Results of Machine Type/Serial Number Query</th> </tr></thead>"
        // General Machine Information
        + "<tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">General Machine Information:</th></tr> </thead>"
        + " <tr><td> Type: <span>1746</span></td><td> Model: <span>C4A</span></td><td> Serial: <span>13D06MK</span></td> </tr>"
        + " <tr><td> Status: <span>Proof Of Purchase Rcvd</span></td><td> Build Date: <span>&nbsp;</span></td><td> Build to Model: <span> </span></td> </tr>"
        + " <tr><td> Geography: <span>EMEA</span></td><td> Country: <span>GREECE</span></td><td> Configuration Id: <span>&nbsp;</span></td> </tr>"
        + " <tr><td> OES Order Number: <span>2076804957</span></td><td> Customer Number: <span>108401</span></td><td> Delivery Number: <span>8519501492</span></td> </tr>"
        + " <tr><td colspan=\"2\"> Service Status: <span>This machine is currently out of warranty.</span></td><td colspan=\"1\"> UAR End Date: <span>2012-08-02</span></td> </tr>"
        + " </tbody></table> </td></tr>"
        // Warranty and Service Information
        + "<tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">Warranty and Service Information:</th></tr> </thead>"
        + " <tr><th scope=\"col\">Start Date</th><th scope=\"col\">End Date</th><th scope=\"col\">SDF</th> </tr> <tr><td>2012-07-04</td><td>2015-07-03</td><td>3XL</td> </tr>"
        + " <tr><td colspan=\"3\"> SDF Description: <span>This product has a 3 year limited warranty and is entitled to CRU (customer replaceable unit) and On-site service. Tier 1 CRUs are customer responsibility, see announcement for details. On-site Service is available Monday - Friday, except holidays, with a next business day response objective.</span></td> </tr>"
        + " </tbody></table> </td></tr>"
        // Upgrade Warranty and Service Information
        + "<tr> <td><table class=\"ibm-data-table ibm-alternating\" summary=\"output table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <tbody> <thead><tr> <th scope=\"col\" colspan=\"3\" class=\"pg2TableSectionTitle\">Upgrade Warranty and Service Information:</th></tr> </thead>"
        + " <tr><th scope=\"col\">Start Date</th><th scope=\"col\">End Date</th><th scope=\"col\">SDF</th> </tr> <tr><td>2012-07-04</td><td>2015-07-03</td><td>SP4</td> </tr>"
        + " <tr><td colspan=\"3\"> SDF Description: <span>This product has a three year limited warranty which includes a warranty upgrade. This product is entitled to parts and labor and includes on-site repair service. Service is available 7X24 with an 4 hour response objective.</span></td> </tr>"
        + " </tbody></table> </td></tr>"
        // Messages
        + "<tr> <td><table class=\"ibm-data-table\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\"> <thead><tr> <th scope=\"col\" class=\"pg2MessageHead\">Messages</th></tr> </thead> <tbody><tr> <td class=\"pg2MessagePanel\" align=\"left\">&nbsp;</td></tr> </tbody></table> </td></tr>"
        + "</tbody> </table></div> </body>";
    Document doc = Jsoup.parse(html, "", Parser.xmlParser());
    Elements tables = doc.select("table.ibm-data-table.ibm-alternating"); // tables that have both classes ibm-data-table and ibm-alternating

    System.out.println(tables.size()); // prints 3

    for (Element ele : tables) {
     // Get the table's section title (th with class pg2TableSectionTitle)
     Elements thElements = ele.select("tr > th.pg2TableSectionTitle");

     if (!thElements.isEmpty()) {
      String tableTitle = thElements.get(0).text();
      System.out.println(tableTitle);

      // Test the "Upgrade ..." title before "Warranty and Service Information:",
      // because the upgrade title contains that text too and would otherwise
      // fall into the wrong branch.
      if (tableTitle.contains("General Machine Information:")) {
       // Apply your logic for the General Machine Information table
      }
      else if (tableTitle.contains("Upgrade Warranty and Service Information:")) {
       // Apply your logic for the Upgrade Warranty and Service Information table
      }
      else if (tableTitle.contains("Warranty and Service Information:")) {
       // Apply your logic for the Warranty and Service Information table
      }
     }
    }
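The branches above leave the per-table logic open. One way to fill them in, instead of the fixed `substring` offsets used in the question (which break as soon as a value changes length), is to read each "Label: <span>value</span>" cell by name. A sketch under the assumption that every value sits in the first `<span>` of its cell, as in the posted HTML; `LabelValueCells` and `labelValues` are illustrative names. Call it on one inner table (such as `ele` inside the loop), not on the whole document, because the outer table's cells also contain spans through their nested tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class LabelValueCells {

    // Maps each "Label: <span>value</span>" cell of a table to label -> value,
    // so values are looked up by name instead of by character offsets.
    static Map<String, String> labelValues(Element table) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element td : table.select("td:has(span)")) {
            String label = td.ownText().trim(); // e.g. "Country:"
            if (label.endsWith(":")) {
                label = label.substring(0, label.length() - 1).trim();
            }
            Element span = td.select("span").first();
            result.put(label, span.text());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table class=\"ibm-data-table ibm-alternating\"><tbody>"
            + "<tr><td> Type: <span>1746</span></td><td> Model: <span>C4A</span></td>"
            + "<td> Serial: <span>13D06MK</span></td></tr>"
            + "<tr><td> Geography: <span>EMEA</span></td><td> Country: <span>GREECE</span></td></tr>"
            + "</tbody></table>";
        Element table = Jsoup.parse(html).select("table").first();
        Map<String, String> info = labelValues(table);
        // prints: 1746 C4A 13D06MK GREECE
        System.out.println(info.get("Type") + " " + info.get("Model") + " "
            + info.get("Serial") + " " + info.get("Country"));
    }
}
```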

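This is a nice idea, but unfortunately it doesn't work, because my HTML isn't inline as in your example: I first have to put the data into my tool's text area and then press Run, so that each entry gives me the expected result. When I try your code now, it prints the table count and then nothing else! The site I'm using is similar to http://w3-01.ibm.com/pc/entitle/pg2/Service.wss/mts/Lookup?type=12345&serial=123456789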


I changed how the document is created and it works now :D Document doc = Jsoup.connect(html).get();


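@AboelmagdSaad The HTML I used is exactly the one you provided. I assume you are using Jsoup.connect() to fetch the HTML source. As long as the returned source is similar to the code in your question, my code (and jsoup) will work. To check the HTML source you actually get, call Document.html().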
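For the live site, the same selectors work on a document fetched with `Jsoup.connect()`, and printing `Document.html()`, as suggested above, shows what the server actually returned. A sketch assuming the lookup URL has the shape quoted in the comments (`.../mts/Lookup?type=...&serial=...`, which the asker only described as "similar") and that each text-area line holds a 4-character type, one separator and a 7-character serial, as the question's `substring(0, 4)` / `substring(5, 12)` calls imply; `LookupFetcher`, `parseLine` and `fetch` are hypothetical names. `Connection.data(key, value)` adds the values as query parameters on a GET request.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LookupFetcher {

    // Assumed from the URL quoted in the comments; not confirmed as the real endpoint.
    static final String LOOKUP_URL =
        "http://w3-01.ibm.com/pc/entitle/pg2/Service.wss/mts/Lookup";

    // Splits a text-area line into {type, serial} using the same offsets as the
    // question's code (4-character type, one separator, 7-character serial).
    // Returns null instead of throwing StringIndexOutOfBoundsException.
    static String[] parseLine(String line) {
        String s = line.trim();
        if (s.length() < 12) {
            return null;
        }
        return new String[] { s.substring(0, 4), s.substring(5, 12) };
    }

    // Fetches one lookup page; type and serial become query parameters of the GET.
    static Document fetch(String type, String serial) throws IOException {
        return Jsoup.connect(LOOKUP_URL)
                .data("type", type)
                .data("serial", serial)
                .get();
    }

    public static void main(String[] args) throws IOException {
        String[] parts = parseLine("1746 13D06MK");
        Document doc = fetch(parts[0], parts[1]);
        // Inspect what the server actually returned before tuning selectors.
        System.out.println(doc.html());
        System.out.println(doc.select("table.ibm-data-table.ibm-alternating").size());
    }
}
```

The host is an internal IBM address, so `main` will only work from a network that can reach it; `parseLine` can be checked offline.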