2011-12-12 56 views
1

"Content is not allowed in prolog" while parsing an RSS feed with ROME

While using the ROME API to parse a feed, I get this error:

com.sun.syndication.io.ParsingFeedException: Invalid XML 
    at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:210) 

The code is as follows:

public static void main(String[] args) { 
    URL url; 
    XmlReader reader = null; 
    SyndFeed feed; 

    try { 
     url = new URL("https://www.democracynow.org/podcast.xml"); 
     reader = new XmlReader(url); 
     feed = new SyndFeedInput().build(reader); 
     for (Iterator<SyndEntry> i =feed.getEntries().iterator(); i.hasNext();) { 
      SyndEntry entry = i.next(); 
      System.out.println(entry.getPublishedDate()+" Title "+entry.getTitle()); 

     } 
    } 
    catch (Exception e) { 
     e.printStackTrace(); 
    } 
} 

I checked some links, such as:

http://old.nabble.com/Invalid-XML:-Error-on-line-1:-Content-is-not-allowed-in-prolog.-td21258868.html

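where the problem is probably the character set, but I cannot figure out a way to fix it. Any help or guidance would be much appreciated.

Thanks and regards,

Vaibhav Goswami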

+0

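I tried implementing my functionality with the Jakarta Feed Parser and could parse this URL. Compared with ROME, I think the Jakarta Feed Parser can handle more types of feeds. – vaibhav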

Answers

1

I used ROME as well and was able to get the published date and title.

My code is as follows:

URL feedUrl = new URL("http://www.bloomberg.com/tvradio/podcast/cat_markets.xml"); 

SyndFeedInput input = new SyndFeedInput(); 
SyndFeed feed = input.build(new XmlReader(feedUrl)); 

for (Iterator i = feed.getEntries().iterator(); i.hasNext();)
{
    SyndEntry entry = (SyndEntry) i.next();
    // Print each entry's title followed by its publication timestamp
    System.out.println("title | " + entry.getTitle() + " - timeStamp " + entry.getPublishedDate() + "\n");
}

This works. I used the Bloomberg URL only because it gave me XML.

If your question is about something else, do let me know :)

0

You can use SyndFeed and SyndEntry to parse the XML.

You also need to check whether the XML is valid.

URL url = new URL("http://feeds.feedburner.com/javatipsfeed");
XmlReader reader = null;
try {
    reader = new XmlReader(url);
    SyndFeed feeder = new SyndFeedInput().build(reader);
    System.out.println("Feed Title: " + feeder.getTitle());
    for (Iterator i = feeder.getEntries().iterator(); i.hasNext();) {
        SyndEntry syndEntry = (SyndEntry) i.next();
        System.out.println(syndEntry.getTitle());
    }
} finally {
    // Always release the underlying stream, even if parsing fails
    if (reader != null)
        reader.close();
}
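
One way to check whether the XML is at least well-formed, before handing it to ROME, is to run it through the JDK's own parser. The following is only a minimal sketch: `FeedXmlCheck` and `isWellFormed` are hypothetical names, and the check covers well-formedness only, not whether the document is a valid RSS or Atom feed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of ROME.
final class FeedXmlCheck {

    // Returns true if the bytes parse as well-formed XML, false otherwise.
    static boolean isWellFormed(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException | ParserConfigurationException e) {
            // Parse failures such as "Content is not allowed in prolog" end up here
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream
            throw new UncheckedIOException(e);
        }
    }
}
```

If this returns false for the downloaded bytes, the feed itself is probably broken; if it returns true but ROME still fails, the problem is more likely in how the content is being read.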
0

This is due to the Byte Order Mark problem. Here is a JUnit test case that demonstrates the problem and the fix:

package rss; 

import org.xml.sax.InputSource; 

import java.io.*; 
import java.net.*; 

import com.sun.syndication.io.*; 

import org.apache.commons.io.IOUtils; 
import org.apache.commons.io.input.BOMInputStream; 
import org.junit.Test; 

public class RssEncodingTest { 

    String url = "http://www.moneydj.com/KMDJ/RssCenter.aspx?svc=NH&fno=1&arg=X0000000"; 

    // This works because we build the InputSource directly from the URLConnection's InputStream

    @Test 
    public void test01() throws MalformedURLException, IOException, 
      IllegalArgumentException, FeedException { 
     try (InputStream is = new URL(url).openConnection().getInputStream()) { 
      InputSource source = new InputSource(is); 
      System.out.println("description: " 
        + new SyndFeedInput().build(source).getDescription()); 
     } 
    } 

    // But a String input fails because of the byte order mark problem

    @Test 
    public void test02() throws MalformedURLException, IOException, 
      IllegalArgumentException, FeedException { 
     String html = IOUtils.toString(new URL(url).openConnection() 
       .getInputStream()); 
     Reader reader = new StringReader(html); 
     System.out.println("description: " 
       + new SyndFeedInput().build(reader).getDescription()); 
    } 

    // We can use Apache Commons IO to fix the byte order mark 

    @Test 
    public void test03() throws MalformedURLException, IOException, 
      IllegalArgumentException, FeedException { 
     String html = IOUtils.toString(new URL(url).openConnection() 
       .getInputStream()); 
     try (BOMInputStream bomIn = new BOMInputStream(
       IOUtils.toInputStream(html))) { 
      String f = IOUtils.toString(bomIn); 
      Reader reader = new StringReader(f); 
      System.out.println("description: " 
        + new SyndFeedInput().build(reader).getDescription()); 
     } 
    } 

}
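
If adding Apache Commons IO is not an option, the same idea can be sketched with the JDK alone. This assumes the feed has already been decoded into a `String`, where a UTF-8 byte order mark shows up as a leading `\uFEFF` character; `BomStripper` and `stripBom` are hypothetical names used only for illustration.

```java
// Hypothetical helper, not part of ROME or Commons IO.
final class BomStripper {

    // The byte order mark as it appears after decoding to a Java String
    private static final char BOM = '\uFEFF';

    // Returns the text without a leading byte order mark, if one is present.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }
}
```

The cleaned text could then be passed on as in test02, for example `new SyndFeedInput().build(new StringReader(BomStripper.stripBom(html)))`.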