2013-04-16

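Extracting HTML data using Jsoup

I am trying to use Jsoup to extract mobile specification data from the Nokia Developer site, http://www.developer.nokia.com/Devices/Device_specifications/Nokia_Asha_308/. How can I get the data for each subcategory, such as 'Camera Features', 'Graphic Formats', etc., separately?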

import java.io.IOException;
import java.sql.SQLException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Nokiareviews {
    public static void main(String[] args) throws IOException, SQLException, InterruptedException {
        Document doc = Jsoup.connect("http://www.developer.nokia.com/Devices/Device_specifications/Nokia_Asha_308/").timeout(1000 * 1000).get();
        Elements content = doc.select("div#accordeonContainer");
        for (Element spec : content) {
            System.out.println(spec.text());
        }
    }
}

Answer


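If you look closely, you will see that each category is a <div> with class=accordeonContainer. Its title is in the h2 directly beneath it, and the list of subcategories sits under a <dl> with the "clearfix" CSS class: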

<div class="accordeonContainer accordeonExpanded"> 
    <h2 class=" accordeonTitle "><span>Multimedia</span></h2> 
    <div class="accordeonContent" id="Multimedia" style="display: block;"> 
     <dl class="clearfix"> 
      <dt>Camera Resolution</dt> 
      <dd>1600 x 1200 pixels </dd> 
       ...  
      <dt>Graphic Formats</dt> 
      <dd>BMP, DCF, EXIF, GIF87a, GIF89a, JPEG, PNG, WBMP </dd> 
      ... 
     </dl> 
    </div> 
</div> 

You can select the list of elements of a given type (say, elm) that carry a given CSS class (say, clazz) like this:

Elements elms = doc.select("elm.clazz"); 
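
As a small offline illustration of that element.class selector form, here is a sketch that parses an inline HTML string instead of fetching the Nokia page. The class name SelectorDemo, the helper count, and the sample markup are made up for this example; Jsoup.parse and select are the regular jsoup calls. It also contrasts the class selector (.) with the id selector (#) used in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // Hypothetical sample markup, loosely modelled on the structure quoted above.
    // Both divs carry the class "accordeonContainer"; neither has that id.
    static final String HTML =
        "<div class=\"accordeonContainer\"><h2 class=\"accordeonTitle\">Multimedia</h2></div>"
      + "<div class=\"accordeonContainer\"><h2 class=\"accordeonTitle\">Display</h2></div>";

    // Counts the elements in the sample markup matched by a CSS selector.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        Elements matches = doc.select(cssQuery);
        return matches.size();
    }

    public static void main(String[] args) {
        // "div.name" matches by CSS class, "div#name" matches by id.
        System.out.println("by class: " + count("div.accordeonContainer"));
        System.out.println("by id:    " + count("div#accordeonContainer"));
        System.out.println("titles:   " + count("h2.accordeonTitle"));
    }
}
```

In this sample the id selector finds nothing, because the name appears only as a class.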

Putting it all together, code to extract the information you mentioned could look something like this:

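import java.io.IOException;
import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Nokiareviews {
    public static void main(String[] args) throws IOException {
        // The timeout is given in milliseconds.
        Document doc = Jsoup.connect("http://www.developer.nokia.com/Devices/Device_specifications/Nokia_Asha_308/")
                .timeout(1000 * 1000).get();
        // Each category is a div with the CSS class "accordeonContainer".
        Elements content = doc.select("div.accordeonContainer");
        for (Element spec : content) {
            // Category title, e.g. "Multimedia".
            Elements h2 = spec.select("h2.accordeonTitle");
            System.out.println(h2.text());

            // Subcategory names are in <dt>, their values in <dd>.
            Elements dl = spec.select("dl.clearfix");
            Elements dts = dl.select("dt");
            Elements dds = dl.select("dd");

            // Walk both lists in step, pairing each name with its value.
            Iterator<Element> dtsIterator = dts.iterator();
            Iterator<Element> ddsIterator = dds.iterator();
            while (dtsIterator.hasNext() && ddsIterator.hasNext()) {
                Element dt = dtsIterator.next();
                Element dd = ddsIterator.next();
                System.out.println("\t\t" + dt.text() + "\t\t" + dd.text());
            }
        }
    }
}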
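
If you want each subcategory as separate data rather than printed lines, one possible variation is to collect the pairs into a map. The sketch below is only an illustration: the class SpecParser, the method parse, and the SAMPLE string are hypothetical names, and the markup is a trimmed copy of the structure quoted earlier so it runs without network access. It pairs each dt with its next sibling element rather than iterating two lists in step, which is a swapped-in technique and not what the code above does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SpecParser {
    // Sample markup modelled on the page structure quoted in this answer.
    static final String SAMPLE =
        "<div class=\"accordeonContainer accordeonExpanded\">"
      + "<h2 class=\"accordeonTitle\"><span>Multimedia</span></h2>"
      + "<div class=\"accordeonContent\" id=\"Multimedia\">"
      + "<dl class=\"clearfix\">"
      + "<dt>Camera Resolution</dt><dd>1600 x 1200 pixels </dd>"
      + "<dt>Graphic Formats</dt><dd>BMP, DCF, EXIF, GIF87a, GIF89a, JPEG, PNG, WBMP </dd>"
      + "</dl></div></div>";

    // Returns category title -> (subcategory name -> value), in document order.
    static Map<String, Map<String, String>> parse(String html) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        Document doc = Jsoup.parse(html);
        for (Element category : doc.select("div.accordeonContainer")) {
            String title = category.select("h2.accordeonTitle").text();
            Map<String, String> specs = new LinkedHashMap<>();
            for (Element dt : category.select("dl.clearfix > dt")) {
                // Only accept the value if a <dd> directly follows this <dt>.
                Element dd = dt.nextElementSibling();
                if (dd != null && dd.tagName().equals("dd")) {
                    specs.put(dt.text(), dd.text());
                }
            }
            result.put(title, specs);
        }
        return result;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((title, specs) -> {
            System.out.println(title);
            specs.forEach((name, value) -> System.out.println("\t" + name + "\t" + value));
        });
    }
}
```

To use this against the live page, you would pass the HTML of the fetched document to parse instead of SAMPLE. Checking the sibling's tag means a dt without a matching dd is skipped instead of shifting every later pair by one.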

If you use Maven, make sure you add this to your pom.xml:

<dependency> 
    <groupId>org.jsoup</groupId> 
    <artifactId>jsoup</artifactId> 
    <version>1.7.2</version> 
</dependency>