2012-03-12 73 views

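Android, parsing XML: how to ignore HTML tags?

In my project I need to parse XML. Some items in the XML contain HTML tags. I tried to remove these tags, but without success. The code in the activity is: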

private NewsFeedItemList parseNewsContent() {
    NewsParserHandler newsParserHandler = null;

    Log.i("NewsList", "Starting to parse XML...");

    try {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        XMLReader xr = parser.getXMLReader();
        newsParserHandler = new NewsParserHandler();
        xr.setContentHandler(newsParserHandler);

        ByteArrayInputStream is = new ByteArrayInputStream(strServerResponseMsg.getBytes());
        xr.parse(new InputSource(is));

    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

    NewsFeedItemList itemList = newsParserHandler.getNewsList();
    // checkLog(itemList);

    Log.i("NewsList", "Parsing XML finished. Sending result back to caller...");
    return itemList;
}

"strServerResponseMsg" contains the XML from this feed (http://www.mania.com.my/rss/ManiaTopStoriesFeedFull.aspx?catid=146).

I can parse all of the items, but the ones that contain HTML tags are not parsed completely.

This is my parser handler:

public class NewsParserHandler extends DefaultHandler { 

    private NewsFeedItemList newsFeedItemList; 
    private boolean current = false; 
    private String currentValue = null; 

    /* The feed root also has its own "title", "link" and "pubDate" elements,
    * which must not be stored with the items. They are skipped by counting
    * the elements seen so far in count. */
    private int count = 0; 


    @Override 
    public void characters(char[] ch, int start, int length) throws SAXException { 
     super.characters(ch, start, length); 

     if(current) { 
      currentValue = new String(ch, start, length); 

      if(currentValue==null || currentValue=="" || currentValue==" ") 
       currentValue = "-"; 

      current = false; 
     } 
    } 

    @Override 
    public void startDocument() throws SAXException { 
     super.startDocument(); 

     newsFeedItemList = new NewsFeedItemList(); 
    } 

    @Override 
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException { 
     super.startElement(uri, localName, qName, attributes); 

     current = true; 
    } 

    @Override 
    public void endElement(String uri, String localName, String qName) throws SAXException { 
     super.endElement(uri, localName, qName); 

     current = false; 

     if(localName.equals("title")) { 
      if(count >= 1) 
       newsFeedItemList.setTitle(currentValue); 
     } 
     if(localName.equals("description")) { 
      newsFeedItemList.setDescription(currentValue); 
     } 
     if(localName.equals("fullbody")) { 
      newsFeedItemList.setFullbody(currentValue); 
     } 
     if(localName.equals("link")) { 
      if(count >= 4) 
       newsFeedItemList.setLink(currentValue); 
     } 
     if(localName.equals("pubDate")) { 
      if(count >= 5) 
       newsFeedItemList.setPubDate(currentValue); 
     } 
     if(localName.equals("image")) { 
      newsFeedItemList.setImage(currentValue); 
     } 

     count++; 
    } 

    @Override 
    public void endDocument() throws SAXException { 
     super.endDocument(); 
    } 


    public NewsFeedItemList getNewsList() { 
     return newsFeedItemList; 
    } 

} 

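I tried putting currentValue = Html.fromHtml(currentValue).toString(); in the characters() method, but it had no effect. I also tried converting "strServerResponseMsg" to HTML before passing it to the parser, but then the parser did not parse anything.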

I found these questions, but their solutions did not work for me: "How to strip or escape html tags in Android" and "Display HTML Formatted String".

I would really appreciate it if you could help me. Thank you.

Answers


Use the method below to remove all HTML tags from the currentValue variable.

public static String removeHtmlTag(String htmlString) {
    return htmlString.replaceAll("\\<.*?\\>", "").trim();
}
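
To see what this regex does and does not cover, here is a small sketch that can be run outside Android. The class name HtmlTagStripper and the second method removeHtmlTagMultiline are hypothetical and only for illustration; the first method is the one from the answer. Note that "." does not match line breaks by default, so a tag that spans two lines is left in place by the original pattern, while the "<[^>]*>" variant removes it. Neither version decodes entities such as &amp;, and both will also delete ordinary text that happens to sit between a literal "<" and a later ">".

```java
// Hypothetical helper class for illustration; not part of the question's project.
class HtmlTagStripper {

    // The method from the answer, unchanged: lazily matches from '<' to the next '>'
    // on the same line and removes it.
    static String removeHtmlTag(String htmlString) {
        return htmlString.replaceAll("\\<.*?\\>", "").trim();
    }

    // Assumed variant: "[^>]*" also matches line breaks, so tags that are
    // wrapped across lines are removed as well.
    static String removeHtmlTagMultiline(String htmlString) {
        return htmlString.replaceAll("<[^>]*>", "").trim();
    }
}
```

For example, HtmlTagStripper.removeHtmlTag("<p>Hello <b>world</b></p>") gives "Hello world". The method has to be applied to the complete element text, so it only helps if currentValue actually holds the whole string at that point.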

Thanks Lalit, but unfortunately it doesn't work. I don't know why it behaves like this :( – Hesam 2012-03-13 01:37:43
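
A possible reason the regex alone does not help, offered as an assumption since the feed content is not shown here: a SAX parser may deliver the text of one element in several characters() calls, and every nested tag triggers its own startElement()/endElement() pair. The handler in the question overwrites currentValue on the first chunk and then sets current = false, so the rest of the text is dropped before any tag stripping can run. The sketch below collects all chunks in a StringBuilder and strips tags only once the element ends. DescriptionHandler, nameOf and parse are hypothetical names, and only the description element is handled to keep it short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not the NewsParserHandler from the question.
class DescriptionHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();
    private boolean inDescription = false;

    // localName is empty when the parser is not namespace-aware, so fall back to qName.
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (nameOf(localName, qName).equals("description")) {
            inDescription = true;
            buffer.setLength(0); // start collecting a fresh value
        }
        // Nested elements such as <p> or <b> are ignored; their text is still collected.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append instead of overwriting.
        if (inDescription) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (nameOf(localName, qName).equals("description")) {
            inDescription = false;
            // Escaped or CDATA markup arrives here as literal "<...>" text; strip it now.
            descriptions.add(buffer.toString().replaceAll("<[^>]*>", "").trim());
        }
    }

    List<String> getDescriptions() {
        return descriptions;
    }

    // Convenience wrapper: parses an XML string and returns the cleaned descriptions.
    static List<String> parse(String xml) {
        try {
            DescriptionHandler handler = new DescriptionHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getDescriptions();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The same idea should carry over to title, fullbody and the other fields: reset the buffer in startElement() for the elements of interest, append in characters(), and read the buffer in endElement(). On Android, Html.fromHtml(buffer.toString()).toString() could replace the regex at that point if entity decoding is also wanted.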