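Java reading XML - stops at the '<' special character

I am building a practice application whose goal is to read data from an RSS feed.

So far it works well, except that my application has trouble with special characters. It reads the first special character in a node and then moves on to the next node.

Any help would be greatly appreciated, and apologies for the large code blocks that follow.

RSS feed - www.usu.co.nz/usu-news/rss.xml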

<title>Unitec hosts American film students</title> 
<link>http://www.usu.co.nz/node/4640</link> 
<description>&lt;p&gt;If you’ve been hearing American accents around the Mt Albert campus over the past week.</description>

Display code

String xml = XMLFunctions.getXML(); 
Document doc = XMLFunctions.XMLfromString(xml); 

NodeList nodes = doc.getElementsByTagName("item"); 

for (int i = 0; i < nodes.getLength(); i++) 
{       
    Element e = (Element)nodes.item(i); 
    Log.v("XMLTest", XMLFunctions.getValue(e, "title")); 
    Log.v("XMLTest", XMLFunctions.getValue(e, "link")); 
    Log.v("XMLTest", XMLFunctions.getValue(e, "description")); 
    Log.v("XMLTest", XMLFunctions.getValue(e, "pubDate")); 
    Log.v("XMLTest", XMLFunctions.getValue(e, "dc:creator")); 
}

Reader code

public class XMLFunctions 
{ 

public final static Document XMLfromString(String xml) 
{ 

    Document doc = null; 

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); 
    try { 

     DocumentBuilder db = dbf.newDocumentBuilder(); 

     InputSource is = new InputSource(); 
     is.setCharacterStream(new StringReader(xml)); 
     doc = db.parse(is); 

    } catch (ParserConfigurationException e) { 
     System.out.println("XML parse error: " + e.getMessage()); 
     return null; 
    } catch (SAXException e) { 
     System.out.println("Wrong XML file structure: " + e.getMessage()); 
     return null; 
    } catch (IOException e) { 
     System.out.println("I/O exception: " + e.getMessage()); 
     return null; 
    } 

    return doc; 

} 

/** Returns element value 
    * @param elem element (it is XML tag) 
    * @return Element value otherwise empty String 
    */ 
public final static String getElementValue(Node elem) { 
    Node kid; 
    if(elem != null) 
    { 
     if (elem.hasChildNodes()) 
     { 
      for(kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) 
      { 
       if(kid.getNodeType() == Node.TEXT_NODE ) 
       { 
        return kid.getNodeValue(); 
       } 
      } 
     } 
    } 
    return ""; 
} 

public static String getXML(){ 
     String line = null; 

     try { 

      DefaultHttpClient httpClient = new DefaultHttpClient(); 
      HttpPost httpPost = new HttpPost("http://www.usu.co.nz/usu-news/rss.xml"); 

      HttpResponse httpResponse = httpClient.execute(httpPost); 
      HttpEntity httpEntity = httpResponse.getEntity(); 
      line = EntityUtils.toString(httpEntity); 

     } catch (UnsupportedEncodingException e) { 
      line = "<results status=\"error\"><msg>Can't connect to server</msg></results>"; 
     } catch (MalformedURLException e) { 
      line = "<results status=\"error\"><msg>Can't connect to server</msg></results>"; 
     } catch (IOException e) { 
      line = "<results status=\"error\"><msg>Can't connect to server</msg></results>"; 
     } 

     return line; 

} 

public static int numResults(Document doc){  
    Node results = doc.getDocumentElement(); 
    int res = -1; 

    try{ 
     res = Integer.valueOf(results.getAttributes().getNamedItem("count").getNodeValue()); 
    }catch(Exception e){ 
     res = -1; 
    } 

    return res; 
} 

public static String getValue(Element item, String str) {  
    NodeList n = item.getElementsByTagName(str);   
    return XMLFunctions.getElementValue(n.item(0)); 
} 
}

Output

Unitec hosts American film students 
http://www.usu.co.nz/node/4640 
< 
Wed, 01 Aug 2012 05:43:22 +0000 
Phillipa

Source

2012-08-06 Aelexe

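This isn't an answer, but have you considered using a higher-level XML API to read these items? Libraries such as Apache XMLBeans (http://xmlbeans.apache.org/) make it very easy to parse XML into convenient Java objects. They are also well tested when it comes to "interesting" characters and other oddities. – 2012-08-06 09:29:55

Your function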

public final static String getElementValue(Node elem) { 
    Node kid; 
    if(elem != null) 
    { 
     if (elem.hasChildNodes()) 
     { 
      for(kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) 
      { 
       if(kid.getNodeType() == Node.TEXT_NODE ) 
       { 
        return kid.getNodeValue(); 
       } 
      } 
     } 
    } 
    return ""; 
}

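returns only the first text node under the given element. A long run of text inside a single tag can be split into several text nodes, and this tends to happen where special characters occur.

You should append all of the text nodes to the string you return.

Something along these lines might work: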

public final static String getElementValue(Node elem) { 
    if ((elem == null) || (!(elem.hasChildNodes()))) 
     return ""; 

    Node kid; 
    StringBuilder builder = new StringBuilder(); 
    for(kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) 
    { 
     if(kid.getNodeType() == Node.TEXT_NODE ) 
     { 
      builder.append(kid.getNodeValue()); 
     } 
    } 
    return builder.toString(); 
}

Source

2012-08-06 09:52:47

Please don't use StringBuffer when you can use StringBuilder. – 2012-08-13 08:12:04

Quite right. I have amended it to use StringBuilder. – 2012-08-13 09:37:40

<?xml version="1.0" encoding="UTF-8"?> seems to be missing. There is also no root element.

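Source

2012-08-06 09:26:58 Chris

I assume we are looking at a fragment of the XML here. Note that it does not contain the contributor 'Phillipa', yet that name appears in the output. – 2012-08-06 09:27:52

Sorry, I should have clarified. I was only trying to show a small portion of the XML so that you could see the special characters it has trouble with. – Aelexe 2012-08-06 09:30:06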

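Are you sure the XML string is not being transformed by DefaultHttpClient? I tried your code, changing XMLFunctions.getXML() so that it is fed the XML string directly instead of fetching it through DefaultHttpClient, and the output was as expected:

Unitec hosts American film students 
http://www.usu.co.nz/node/4640 
<p>If you’ve been hearing American accents around the Mt Albert campus over the past week.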
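For anyone who wants to repeat that check without the network, here is a rough sketch of the idea. It assumes a hard-coded, shortened sample string in place of the real feed; the class name OfflineParseCheck and its helper methods are made up for illustration and are not part of the original XMLFunctions class. It prints how many text children the parser produced for the description, which may differ between a desktop JVM and Android.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class OfflineParseCheck {

    // Hard-coded stand-in for the feed (an assumption, shortened from the question).
    static final String SAMPLE =
        "<rss><channel><item>"
        + "<title>Unitec hosts American film students</title>"
        + "<description>&lt;p&gt;If you have been hearing American accents</description>"
        + "</item></channel></rss>";

    // Returns the first element with the given tag name, or null if there is none.
    static Node firstElement(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Counts the direct text-node children of the element.
    static int textNodeCount(Node elem) {
        int count = 0;
        for (Node kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == Node.TEXT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Concatenates every direct text-node child of the element.
    static String allText(Node elem) {
        StringBuilder sb = new StringBuilder();
        for (Node kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == Node.TEXT_NODE) {
                sb.append(kid.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node description = firstElement(SAMPLE, "description");
        // The count depends on the parser implementation in use.
        System.out.println("text children: " + textNodeCount(description));
        System.out.println(allText(description));
    }
}
```

If the count is greater than one on the device but the concatenated text is complete, the parser is splitting the text and the HTTP client is probably not at fault.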

Source

2012-08-06 09:44:57 moody

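Your code only extracts the first child text node of the element. The DOM specification allows multiple adjacent text nodes, so I suspect that what is happening here is that your parser represents <, p, > and the rest of the text as (at least) four separate text nodes. You need to either concatenate the nodes into a single string, or call normalize() on the containing element node (which modifies the DOM tree so that adjacent text nodes are merged into one).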
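A small sketch of both options, under the assumption that the parser really does split the text this way. To make the behaviour reproducible, the four text nodes are built by hand rather than parsed; the class name AdjacentTextNodes and its helpers are invented for this illustration, and the sample text is shortened.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class AdjacentTextNodes {

    // Builds a <description> element whose text is deliberately split into
    // four adjacent text nodes, imitating what the parser is suspected of doing.
    static Element buildSplitDescription() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element description = doc.createElement("description");
            doc.appendChild(description);
            String[] pieces = {"<", "p", ">", "If you have been hearing American accents"};
            for (String piece : pieces) {
                description.appendChild(doc.createTextNode(piece));
            }
            return description;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // What the original getElementValue effectively returns: the first text node only.
    static String firstTextNode(Element e) {
        Node first = e.getFirstChild();
        return first == null ? "" : first.getNodeValue();
    }

    // Option 1: concatenate every direct text-node child.
    static String allText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node kid = e.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == Node.TEXT_NODE) {
                sb.append(kid.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element description = buildSplitDescription();
        System.out.println(description.getChildNodes().getLength()); // 4
        System.out.println(firstTextNode(description));              // <
        System.out.println(allText(description));

        // Option 2: normalize() merges adjacent text nodes in place,
        // after which reading the first text node is enough.
        description.normalize();
        System.out.println(description.getChildNodes().getLength()); // 1
        System.out.println(firstTextNode(description));
    }
}
```

In the question's code, normalize() could be called on each item element (or once on the document element) before getValue is used, since it also merges text nodes in the whole subtree.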

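There are various libraries that can help you with this. For example, if your application uses the Spring framework, org.springframework.util.xml.DomUtils has a static getTextValue method that extracts the complete text value from an element.

Source

2012-08-06 09:55:03

+1: These are probably better solutions than the one I posted. – 2012-08-06 10:06:10

Slightly off-topic, but you may want to look at an existing RSS framework such as ROME. That is better than reinventing the wheel.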

Source

2012-08-06 10:37:07 pap
