Java JDOM XML parsing

This is my first time using Java. I'm trying to build a small XML parser for my website so that I can get a clean view of my sitemaps.xml. The code I'm using is this:

import java.io.IOException; 
import java.io.InputStream; 
import java.io.StringReader; 
import java.net.URL; 
import java.util.List; 


import org.jdom2.Element; 
import org.jdom2.JDOMException; 
import org.jdom2.input.SAXBuilder; 

class downloadxml { 
    public static void main(String[] args) throws IOException { 

     String str = "http://www.someurl.info/sitemap.xml"; 
     URL url = new URL(str); 
     InputStream is = url.openStream(); 
     int ptr = 0; 
     StringBuilder builder = new StringBuilder(); 
     while ((ptr = is.read()) != -1) { 
      builder.append((char) ptr); 
     } 
     String xml = builder.toString(); 

     org.jdom2.input.SAXBuilder saxBuilder = new SAXBuilder(); 
     try { 
      org.jdom2.Document doc = saxBuilder.build(new StringReader(xml)); 
      System.out.println(xml); 
      Element xmlfile = doc.getRootElement(); 
      System.out.println("ROOT -->"+xmlfile); 
      List<Element> list = xmlfile.getChildren("url"); 
      System.out.println("LIST -->"+list); 
     } catch (JDOMException e) { 
      // handle JDOMException 
     } catch (IOException e) { 
      // handle IOException 
     } 

     System.out.println("==========================="); 

    } 
}

When the code reaches

System.out.println(xml);

I get a clean printout of the XML sitemap. When it comes to:

System.out.println("ROOT -->"+xmlfile);

Output:

ROOT -->[Element: <urlset [Namespace: http://www.sitemaps.org/schemas/sitemap/0.9]/>]

it also finds the root element. But for some reason, when the script should get the children, it prints an empty list:

System.out.println("LIST -->"+list);

Output:

LIST -->[]

Should I be doing this another way? Any pointers on how to get the children?

The XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.image.url</loc>
    <image:image>
      <image:loc>http://www.image.url/image.jpg</image:loc>
    </image:image>
    <changefreq>daily</changefreq>
  </url>
  <url>
</urlset>


2013-05-31 Johnny000

You've come a long way in one day.

In short, you are ignoring the XML document's namespace. Change the line:

List list = xmlfile.getChildren("url");

to:

// requires: import org.jdom2.Namespace;
Namespace ns = Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
List<Element> list = xmlfile.getChildren("url", ns);
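The rule behind this fix can be tried without JDOM on the classpath. The sketch below is a stand-in that uses the JDK's built-in DOM and XPath APIs (javax.xml) instead of JDOM. XPath 1.0 follows the same rule as JDOM's getChildren(name): an unprefixed name matches only elements in no namespace, so `/urlset/url` finds nothing in a sitemap whose default namespace is the sitemaps.org URI. The class name `SitemapNamespaceDemo`, the trimmed `SAMPLE` document and the prefix `sm` are illustrative assumptions, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo class: JDK stand-in for JDOM's getChildren("url") vs getChildren("url", ns).
class SitemapNamespaceDemo {
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Trimmed sitemap shaped like the question's XML, with two <url> entries (assumed data).
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<urlset xmlns=\"" + SITEMAP_NS + "\">"
        + "<url><loc>http://www.image.url</loc></url>"
        + "<url><loc>http://www.image.url/page2</loc></url>"
        + "</urlset>";

    // Counts <url> children of the root, querying with or without the sitemap namespace.
    static int countUrls(String xml, boolean useNamespace) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default; namespaces are only tracked when enabled
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    // "sm" is an arbitrary prefix chosen here; only the URI has to match the document.
                    return "sm".equals(prefix) ? SITEMAP_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return SITEMAP_NS.equals(uri) ? "sm" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return SITEMAP_NS.equals(uri)
                        ? Collections.singletonList("sm").iterator()
                        : Collections.<String>emptyIterator();
                }
            });

            // Unprefixed names mean "no namespace", just like getChildren("url") in JDOM.
            String expr = useNamespace ? "/sm:urlset/sm:url" : "/urlset/url";
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without namespace: " + countUrls(SAMPLE, false)); // 0
        System.out.println("with namespace:    " + countUrls(SAMPLE, true));  // 2
    }
}
```

The empty result without the namespace is the same symptom as `LIST -->[]` in the question; the fix in both APIs is to name the namespace URI explicitly.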

For convenience, you may also want to simplify the whole build process and let SAXBuilder fetch the URL itself:

org.jdom2.Document doc = saxBuilder.build("http://www.someurl.info/sitemap.xml");
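One concrete reason this simplification helps: the question's loop appends each byte as a `char`, which is only correct for single-byte text. Handing the raw bytes (or the URL) to the parser lets it decode using the XML declaration, so non-ASCII characters survive. The sketch below shows the difference with the JDK's DOM parser as a stand-in for SAXBuilder; the class `DirectParseDemo` and the `http://example.com/caf\u00e9` sample URL are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: byte-by-byte char cast (as in the question) vs. letting the parser decode.
class DirectParseDemo {
    // One-entry sitemap whose <loc> ends in a non-ASCII character (e-acute), encoded as UTF-8.
    static final byte[] SAMPLE = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
        + "<url><loc>http://example.com/caf\u00e9</loc></url></urlset>")
        .getBytes(StandardCharsets.UTF_8);

    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder();
    }

    // Mirrors the question: one byte becomes one char, then the String is parsed.
    static String firstLocViaCharCast(byte[] bytes) {
        try (InputStream is = new ByteArrayInputStream(bytes)) {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = is.read()) != -1) {
                sb.append((char) b); // splits multi-byte UTF-8 sequences into separate chars
            }
            Document doc = newBuilder().parse(new InputSource(new StringReader(sb.toString())));
            return doc.getElementsByTagNameNS("*", "loc").item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Gives the raw stream to the parser, which decodes it according to the XML declaration.
    static String firstLocDirect(byte[] bytes) {
        try (InputStream is = new ByteArrayInputStream(bytes)) {
            Document doc = newBuilder().parse(is);
            return doc.getElementsByTagNameNS("*", "loc").item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLocViaCharCast(SAMPLE)); // mangled: the e-acute becomes two chars
        System.out.println(firstLocDirect(SAMPLE));      // decoded correctly
    }
}
```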


2013-05-31 01:18:25 rolfl

Thanks, it works now; the second suggestion is great! – Johnny000

You're welcome. You should read up on namespaces and how to handle them in JDOM: http://www.jdom.org/docs/faq.html#a0260 – rolfl
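The sitemap in the question actually uses two namespaces: the default sitemaps.org one and the `image` one from Google. Reaching `<image:loc>` therefore means qualifying each step with the correct URI (in JDOM, by passing a second `Namespace` for the image URI to `getChild`/`getChildren`). The sketch below shows the same idea with the JDK's XPath as a stand-in for JDOM. The class `ImageLocDemo` and the prefixes `s` and `img` are hypothetical; the prefixes deliberately differ from the document's to show that only the namespace URI has to match.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo: selecting elements from two different namespaces.
class ImageLocDemo {
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";
    static final String IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1";

    // Same structure as the question's sitemap (one complete <url> entry).
    static final String SAMPLE = "<urlset xmlns=\"" + SITEMAP_NS + "\" xmlns:image=\"" + IMAGE_NS + "\">"
        + "<url><loc>http://www.image.url</loc>"
        + "<image:image><image:loc>http://www.image.url/image.jpg</image:loc></image:image>"
        + "<changefreq>daily</changefreq></url></urlset>";

    // Query-side prefixes; they need not match the prefixes used inside the document.
    static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("s", SITEMAP_NS);
        PREFIXES.put("img", IMAGE_NS);
    }

    static List<String> imageLocs(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return PREFIXES.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }

                @Override
                public String getPrefix(String uri) {
                    for (Map.Entry<String, String> e : PREFIXES.entrySet()) {
                        if (e.getValue().equals(uri)) return e.getKey();
                    }
                    return null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    String p = getPrefix(uri);
                    return p == null ? Collections.<String>emptyIterator()
                                     : Collections.singletonList(p).iterator();
                }
            });

            // Every step is qualified: urlset/url are sitemap elements, image/loc are image elements.
            NodeList nodes = (NodeList) xpath.evaluate(
                "/s:urlset/s:url/img:image/img:loc", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(imageLocs(SAMPLE));
    }
}
```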

My approach is similar to the above, but it uses catch clauses that print a helpful message when the input XML is not well-formed. Here the input is an XML file.

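import java.io.File;
import java.io.IOException;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

class CheckXml {
    public static void main(String[] args) {
        File file = new File("adr781.xml");
        SAXBuilder builder = new SAXBuilder(); // non-validating: checks well-formedness only
        try {
            Document doc = builder.build(file);
            Element root = doc.getRootElement(); // well-formed: continue processing from root
        } catch (JDOMException e) {
            System.out.println(file.getName() + " is not well-formed.");
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println("Could not check " + file.getAbsolutePath());
            System.out.println(" because " + e.getMessage());
        }
    }
}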
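The same well-formedness check can be sketched with only the JDK, as a stand-in for the JDOM version above: a non-validating `DocumentBuilder` throws a `SAXException` for malformed input and an `IOException` when the input cannot be read. The class `WellFormedCheck`, its `check` helper and the file names `good.xml`/`bad.xml` are hypothetical, and the input is taken from a string so the example runs without files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports whether an XML string is well-formed.
class WellFormedCheck {
    // Returns null if the document parses, otherwise a short diagnostic message.
    static String check(String name, String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // non-validating by default
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // Replace the default handler (which prints "[Fatal Error]" to stderr)
            // with one that rethrows, so the catch block below reports the problem.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXException e) {
            return name + " is not well-formed: " + e.getMessage();
        } catch (IOException | ParserConfigurationException e) {
            return "Could not check " + name + " because " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("good.xml", "<urlset><url/></urlset>")); // prints null
        System.out.println(check("bad.xml", "<urlset><url></urlset>"));   // unclosed <url>
    }
}
```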


2013-07-12 18:55:24
