2013-12-08

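I'm trying to read and parse the RSS feed of NASA's Image of the Day. The code is below. When I try to parse the XML, I get this exception about an invalid byte: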

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. 
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(Unknown Source) 
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) 
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) 
at Start.processFeed(Start.java:30) 
at Loader.main(Loader.java:12) 

What am I doing wrong?

P.S. Of course I also have another class with the main method :)

Thanks in advance.

import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;


public class Start extends DefaultHandler {

    private String url = "http://www.nasa.gov/rss/dyn/image_of_the_day.rss";
    private boolean inUrl = false;
    private boolean inTitle = false;
    private boolean inDescription = false;
    private boolean inItem = false;
    private boolean inDate = false;

    public void processFeed() {
        // try-with-resources closes the network stream once parsing is done
        try (InputStream inputStream = new URL(url).openStream()) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            XMLReader reader = parser.getXMLReader();
            reader.setContentHandler(this);
            reader.parse(new InputSource(inputStream));
        } catch (Exception e) {
            e.printStackTrace();
        }
    } // processFeed


    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // The factory is not namespace-aware by default, so SAX passes an
        // empty localName; match on qName instead.
        if (qName.equals("item")) {
            inItem = true;
        } else if (inItem) {
            inTitle = qName.equals("title");
            inDescription = qName.equals("description");
            inDate = qName.equals("pubDate");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // Reset the flags so whitespace between elements is not printed
        if (qName.equals("item")) {
            inItem = false;
        }
        inTitle = false;
        inDescription = false;
        inDate = false;
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // Copy only the slice [start, start + length) of the parser's buffer
        String chars = new String(ch, start, length);

        if (inTitle || inDescription || inDate) {
            System.out.println(chars);
        }
    }

}
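The handler above prints each `characters()` chunk as it arrives. The SAX `ContentHandler` contract allows a parser to split a single text node across several `characters()` calls, so a more robust pattern is to buffer text in a `StringBuilder` and store one record per `<item>` in `endElement`. This is a hedged sketch: `ItemCollector`, its `Item` holder and `parse` are hypothetical names, and the sample feed in `main` is made up and parsed from a string so it runs offline (reading the real URL would still need the gzip handling discussed in the answer).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ItemCollector extends DefaultHandler {

    // One parsed <item>; a field stays null if the element is absent.
    static class Item {
        String title;
        String description;
        String pubDate;
    }

    final List<Item> items = new ArrayList<>();
    private Item current;                               // non-null only inside <item>
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("item")) {
            current = new Item();
        }
        text.setLength(0);                              // start collecting this element's text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                 // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;                                     // ignore channel-level elements
        }
        String value = text.toString().trim();
        switch (qName) {
            case "title":       current.title = value; break;
            case "description": current.description = value; break;
            case "pubDate":     current.pubDate = value; break;
            case "item":        items.add(current); current = null; break;
            default:            break;
        }
        text.setLength(0);
    }

    // Parses an RSS document held in a string and returns its items.
    static List<Item> parse(String xml) {
        try {
            ItemCollector handler = new ItemCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><title>Feed</title><item><title>A</title>"
                + "<description>First</description><pubDate>Sun, 08 Dec 2013</pubDate>"
                + "</item></channel></rss>";
        for (Item item : parse(xml)) {
            System.out.println(item.title + " | " + item.description + " | " + item.pubDate);
        }
    }
}
```

Because the channel-level `<title>` arrives while `current` is null, only item titles are collected.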

Answer


The response entity is gzip-encoded (i.e. compressed)! You can wrap the input stream in a GZIPInputStream:

InputStream inputStream = new GZIPInputStream(new URL(url).openStream()); 
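To see why wrapping helps, the following sketch reproduces the failure offline: it gzip-compresses a small, made-up RSS document and hands the bytes to a SAX parser once as-is and once wrapped in a `GZIPInputStream`. The class name `GzipFeedDemo` and the sample feed are illustrative assumptions. Gzip data starts with the bytes 0x1f 0x8b, and 0x8b cannot begin a UTF-8 sequence, which is consistent with the "Invalid byte 1 of 1-byte UTF-8 sequence" message in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class GzipFeedDemo {

    // Made-up stand-in for the real feed
    static final String FEED = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<rss><channel><item><title>Test</title></item></channel></rss>";

    // Compresses data in memory, as a server applying gzip encoding would
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if SAX parsing succeeds, false if anything throws
    static boolean parses(byte[] body, boolean unwrapGzip) {
        try (InputStream raw = new ByteArrayInputStream(body);
             InputStream in = unwrapGzip ? new GZIPInputStream(raw) : raw) {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(in), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] compressed = gzip(FEED.getBytes(StandardCharsets.UTF_8));
        System.out.println("raw gzip bytes parse: " + parses(compressed, false));
        System.out.println("GZIPInputStream parse: " + parses(compressed, true));
    }
}
```

Running `main` should report `false` for the raw compressed bytes and `true` once they are wrapped.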

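You should use the "long form" of reading from a URL. It gives you more control over the connection, so you can test whether the content is compressed: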

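// Needs: java.io.InputStream, java.net.HttpURLConnection, java.net.URL,
// java.util.zip.GZIPInputStream, org.xml.sax.InputSource.
// urlString and reader are the feed URL and the XMLReader from processFeed().
URL url = new URL(urlString);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
// we're not really connected now. Just the connection object has been created
// here you can set additional request properties (e.g. request headers)
con.connect();
// now we are connected!
try {
    if (con.getResponseCode() == HttpURLConnection.HTTP_OK) {
        try (InputStream entityStream = con.getInputStream()) {
            InputStream is;
            // Content-Encoding tells us whether the body is compressed
            if ("gzip".equalsIgnoreCase(con.getContentEncoding())) {
                is = new GZIPInputStream(entityStream); // wrap
            } else {
                is = entityStream;
            }

            reader.parse(new InputSource(is));
        }
    } else {
        // handle HTTP response code != OK
    }
} finally {
    con.disconnect(); // runs even if parsing throws
}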
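The check above relies on the server reporting `Content-Encoding` correctly. As an alternative technique (not part of this answer), a client can sniff the gzip magic number from RFC 1952 (ID1 = 0x1f, ID2 = 0x8b) in the first two bytes and wrap only when it is present. This is a hedged sketch: `GzipSniffer`, `maybeGunzip`, `gzip` and `readUtf8` are hypothetical names, and it assumes an uncompressed XML body never begins with those two bytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSniffer {

    // Wraps the stream in a GZIPInputStream only if it starts with the
    // gzip magic number 0x1f 0x8b; otherwise returns it unchanged.
    static InputStream maybeGunzip(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(2);                  // remember the start of the stream
        int id1 = buffered.read();
        int id2 = buffered.read();
        buffered.reset();                  // rewind so no bytes are lost
        if (id1 == 0x1f && id2 == 0x8b) {
            return new GZIPInputStream(buffered);
        }
        return buffered;
    }

    // Demo helper: gzip-compresses a byte array in memory
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Demo helper: reads a body (compressed or not) fully as UTF-8 text
    static String readUtf8(byte[] body) {
        try (InputStream in = maybeGunzip(new ByteArrayInputStream(body))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "<rss/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(plain));        // plain bytes pass through
        System.out.println(readUtf8(gzip(plain)));  // gzip bytes are decompressed
    }
}
```

Inside `processFeed()` it could be used as `reader.parse(new InputSource(GzipSniffer.maybeGunzip(new URL(url).openStream())));`, since that method already catches `Exception`.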

I never thought about compression, thanks – SuperManEver


Could you give me a link to an article or something where I can read more about it? – SuperManEver


Here is some general information about URLConnections: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html. For HTTP specifically, you can search for HTTP request and response headers, and for response codes. – isnot2bad