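Resolving entities for XHTML lacking a DOCTYPE declaration with Java SAX

I want to parse the XHTML output of a Tika server. The XML InputStream (which I obtain through an Apache HttpClient) declares a namespace but does not declare a DTD. The root element looks like this: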
<html xmlns="http://www.w3.org/1999/xhtml">
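If I parse the input stream with SAX, I run into an error whenever the XML stream contains an entity. I tried to force the parser to use a local copy of the XHTML 1.1 DTD: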
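import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Answers every entity lookup with the local copy of the XHTML 1.1 DTD
class XhtmlResolver implements EntityResolver {
    public InputSource resolveEntity(String publicId, String systemId) {
        InputStream in = getClass().getResourceAsStream("src/main/java/com/w3c/xhtml/xhtml11.dtd");
        return new InputSource(in);
    }
}

// inputStream is the Tika response body obtained via HttpClient
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
XMLReader reader = saxParser.getXMLReader();
reader.setEntityResolver(new XhtmlResolver());
reader.parse(new InputSource(inputStream));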
But the entities are still not resolved. I still get a SAXParseException on any XHTML stream that contains an entity:
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 47; columnNumber: 37; The entity "rsquo" was referenced, but not declared.
Can someone help me out here?
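In case it helps, here is a minimal sketch that should reproduce the problem without Tika or HttpClient. The class name EntityRepro, the helper parseFails, and the resolverCalls counter are hypothetical names made up for this example, and the resolver is an empty stand-in for the one above. It parses a document that has the same namespaced root and no DOCTYPE, once with a named entity and once with the equivalent numeric character reference. The counter is there on the assumption that a parser only consults the EntityResolver when a document actually references an external DTD or external entity.

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class EntityRepro {
    // Number of times the resolver was consulted during the last parse (hypothetical helper state).
    static final AtomicInteger resolverCalls = new AtomicInteger();

    // Returns true if parsing the given markup raises a SAXParseException.
    static boolean parseFails(String xhtml) throws Exception {
        resolverCalls.set(0);
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Stand-in resolver: records each call and hands back an empty source.
        reader.setEntityResolver((publicId, systemId) -> {
            resolverCalls.incrementAndGet();
            return new InputSource(new StringReader(""));
        });
        try {
            reader.parse(new InputSource(new StringReader(xhtml)));
            return false;
        } catch (SAXParseException e) {
            System.out.println(e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String root = "<html xmlns=\"http://www.w3.org/1999/xhtml\">";
        // Named entity, no DOCTYPE: the case that fails for me.
        System.out.println("named entity fails: " + parseFails(root + "<p>it&rsquo;s</p></html>"));
        System.out.println("resolver calls: " + resolverCalls.get());
        // Numeric character reference: needs no declaration.
        System.out.println("numeric reference fails: " + parseFails(root + "<p>it&#8217;s</p></html>"));
    }
}
```

Numeric character references need no declaration, so the second document serves as a control: if it parses while the first one fails, the problem is limited to named entities in a stream that has no DOCTYPE.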
Thanks!