2015-08-20

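How do I iterate over the various elements in jsoup?

I have to parse a page with jsoup. The page has a div with a class, containing various element tags such as p, h1, h2, h3, and so on. I want to walk through them and process each one. The page looks like this: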

<div class="pf-content"> 
     <p>For centuries, Spain shone and progressed under Muslim rule. Unfortunately, the city of Seville fell prey to the barbaric onslaught of the Kingdom of Castile in the year 1248. Several innocent Spaniards were killed, many were forced to leave their homeland and seek refuge elsewhere, whereas many others were captured and taken as slaves. The rulers of Castile further destroyed remnants of Islamic life and culture, <a href="https://muslimmemo.com/masjids-spain/">including masjids</a>.</p> 
     <h3>Original Arabic Text</h3> 
     <h4>Original Arabic Text</h4> 
    </div> 

The order in which p, h3, h4, etc. appear really matters, because I have to turn the content into a sequence of Android TextViews.

What I can do so far is:

Document document = Jsoup.connect("page link here").get(); 

Elements pTag = document.select("div.pf-content"); 

But how should I proceed from here? Please help.

What I have in mind is:

Elements elements = document.select("div.pf-content");

for (Element element : elements) {
    Log.d("FullContent", "elements are: " + element);
    if (element.select("p").first() != null) {
        Log.d("FullContent", "a p tag");
        if (element.select("p").first().select("img").first() != null) {
            Log.d("FullContent", "the tag " + "has src");
        }
    } else if (element.select("h1").first() != null) {
        Log.d("FullContent", "a h1 tag");
    } else if (element.select("h2").first() != null) {
        Log.d("FullContent", "a h2 tag");
    } else if (element.select("h3").first() != null) {
        Log.d("FullContent", "a h3 tag");
    } else if (element.select("h4").first() != null) {
        Log.d("FullContent", "a h4 tag");
    } else {
        Log.d("FullContent", "other tag");
    }
}

Answers

1

Once you have the Elements found by Elements pTag = document.select("div.pf-content");, you can do the following:

Elements elements = pTag.first().children();
for (Element e : elements) {
    // Do something with each element
}
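
A fuller sketch of this approach, assuming jsoup is on the classpath. The class name ContentWalker, the helper describeChildren, and the sample HTML are made up for illustration. It relies on children() returning only the direct child elements in document order, and on tagName() to tell which kind of element each child is. By contrast, element.select("p") searches all descendants, which is why an if/else chain run once on the whole div takes the "p" branch whenever the div contains any p at all.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ContentWalker {

    // Hypothetical helper: returns one "tag: text" line per direct child
    // of the first div.pf-content, in document order.
    static List<String> describeChildren(String html) {
        List<String> lines = new ArrayList<>();
        Document document = Jsoup.parse(html);
        Element container = document.select("div.pf-content").first();
        if (container == null) {
            return lines; // no matching div on the page
        }
        for (Element child : container.children()) {
            // tagName() identifies the child itself; no nested select() needed.
            switch (child.tagName()) {
                case "p":
                case "h1":
                case "h2":
                case "h3":
                case "h4":
                    lines.add(child.tagName() + ": " + child.text());
                    break;
                default:
                    lines.add("other: " + child.tagName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample markup for illustration only.
        String html = "<div class=\"pf-content\">"
                + "<p>Intro with a <a href=\"#\">link</a>.</p>"
                + "<h3>First heading</h3>"
                + "<h4>Second heading</h4>"
                + "</div>";
        for (String line : describeChildren(html)) {
            System.out.println(line);
        }
    }
}
```

On Android, each branch could build a TextView instead of a string, and the document would come from Jsoup.connect(...).get() on a background thread, since network calls are not allowed on the main thread.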


Please take a look at the edit. The above is not working. Is there a tutorial for this? http://jsoup.org/apidocs/org/jsoup/select/Elements.html is not working properly. – learner


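Try 'Elements elements = document.getElementsByClass("pf-content");'. Note that this gives you a list of the elements with that class. You have to take one of those elements (e.g. the first) and call 'children()' to get the elements inside the div tag. – GSala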
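
The suggestion in this comment could be sketched as follows, assuming jsoup is on the classpath. The class ByClassExample and the helper childTagsOfFirst are hypothetical names used only for illustration. The empty check matters because first() returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class ByClassExample {

    // Hypothetical helper: tag names of the direct children of the first
    // element carrying the given class, in document order.
    static List<String> childTagsOfFirst(String html, String className) {
        List<String> tags = new ArrayList<>();
        Document document = Jsoup.parse(html);
        // getElementsByClass returns every element with that class, not just one.
        Elements matches = document.getElementsByClass(className);
        if (matches.isEmpty()) {
            return tags; // nothing carries this class
        }
        // Take the first match and walk its direct children only.
        for (Element child : matches.first().children()) {
            tags.add(child.tagName());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Sample markup for illustration only.
        String html = "<div class=\"pf-content\"><p>Text <b>bold</b></p><h3>Title</h3></div>";
        System.out.println(childTagsOfFirst(html, "pf-content"));
    }
}
```

The nested b element is not reported, because children() stops at the first level below the div.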


Thank you for your help. Can you give me a link to a jsoup tutorial? – learner