NekoHTML SAX fragment parsing

I'm trying to parse a simple HTML fragment with NekoHTML:

<h1>This is a basic test</h1>

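To do this, I set a specific Neko feature so that the startElement(..) callback is not invoked for any HTML, HEAD or BODY tags.

Unfortunately, it doesn't work for me. I've surely missed something, but I can't figure out what.

Here is very simple code to reproduce my problem: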

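import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Extends DefaultHandler (which implements ContentHandler) so that only
// the three callbacks of interest need to be overridden.
public static class MyContentHandler extends DefaultHandler {

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String text = String.valueOf(ch, start, length);
        System.out.println(text);
    }

    @Override
    public void startElement(String nameSpaceURI, String localName, String rawName, Attributes attributes) throws SAXException {
        System.out.println(rawName);
    }

    @Override
    public void endElement(String nameSpaceURI, String localName, String rawName) throws SAXException {
        System.out.println("end " + localName);
    }
}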

And the main() to launch the test:

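import java.io.IOException;
import java.io.StringReader;
import org.cyberneko.html.parsers.SAXParser;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static void main(String[] args) throws SAXException, IOException {
    SAXParser saxReader = new SAXParser();
    // set the feature as explained in the documentation: http://nekohtml.sourceforge.net/faq.html#fragments
    saxReader.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
    saxReader.setContentHandler(new MyContentHandler());
    // InputSource accepts a Reader, so a JDK StringReader is enough to feed the fragment
    saxReader.parse(new InputSource(new StringReader("<h1>This is a basic test</h1>")));
}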

The corresponding output:

HTML 
HEAD 
end HEAD 
BODY 
H1 
This is a basic test 
end H1 
end BODY 
end HTML

Whereas I was expecting:

H1 
This is a basic test 
end H1

Any ideas?

2011-09-03 Gael

If you set this feature to false, do you get exactly the same output?

Yes, exactly the same :-( – Gael

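I finally figured it out!

I was actually parsing my HTML string in a GWT application, where I had added the gwt-dev.jar dependency. This jar bundles many external libraries, such as xercesImpl. But the version of the embedded Xerces classes does not match the one NekoHTML requires.

As a (strange) result, it looks like the NekoHTML SAX parser does not honor any custom feature when it runs against the Xerces version embedded in gwt-dev.

So I had to rewrite some code to remove the gwt-dev dependency, which, by the way, should not be added to any standard GWT project.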
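
A classpath conflict like this one can be checked by asking the JVM where a class was actually loaded from. The following is a minimal sketch, not part of the original code: the class name WhichJar and the method whereIs are hypothetical, and the two class names probed in main() are only examples that assume NekoHTML and Xerces are on the classpath.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the location a class was loaded from, or a short note if it cannot be determined. */
    public static String whereIs(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself have no code source
            if (src == null || src.getLocation() == null) {
                return "JDK runtime (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.cyberneko.html.parsers.SAXParser",
            "org.apache.xerces.parsers.AbstractSAXParser"
        };
        for (String name : names) {
            System.out.println(name + " -> " + whereIs(name));
        }
    }
}
```

If either line points at gwt-dev.jar rather than the standalone NekoHTML or Xerces jar, the parser is probably running against the bundled copy.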

2011-09-07 08:27:17 Gael

More specifically, gwt-dev.jar contains NekoHTML version 1.9.13, which has a problem with fragment parsing; fragment parsing works in 1.9.11 and 1.9.14. No luck :-( – Gael
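
If the conflicting jar cannot be removed, one possible workaround is to drop the synthesized wrapper tags on the SAX side instead of relying on the parser feature. The sketch below is an assumption-laden illustration, not the fix used above: WrapperTagFilter and trace are hypothetical names, and the demo feeds well-formed markup to the JDK's own XML parser only so that it runs without NekoHTML. It suppresses only the start and end events of html, head and body; anything nested inside them still passes through.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class WrapperTagFilter extends XMLFilterImpl {

    public WrapperTagFilter(XMLReader parent) {
        super(parent);
    }

    // Case-insensitive because NekoHTML reports upper-case names by default
    private static boolean isWrapper(String rawName) {
        return "html".equalsIgnoreCase(rawName)
            || "head".equalsIgnoreCase(rawName)
            || "body".equalsIgnoreCase(rawName);
    }

    @Override
    public void startElement(String uri, String localName, String rawName, Attributes atts) throws SAXException {
        if (!isWrapper(rawName)) {
            super.startElement(uri, localName, rawName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String rawName) throws SAXException {
        if (!isWrapper(rawName)) {
            super.endElement(uri, localName, rawName);
        }
    }

    /** Demo helper: parses well-formed markup through the filter and returns the event log. */
    public static String trace(String markup) {
        final StringBuilder log = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            WrapperTagFilter filter = new WrapperTagFilter(reader);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String rawName, Attributes atts) {
                    log.append(rawName).append('|');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    log.append(ch, start, length).append('|');
                }
                @Override
                public void endElement(String uri, String localName, String rawName) {
                    log.append("end ").append(rawName).append('|');
                }
            });
            filter.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace("<html><head/><body><h1>This is a basic test</h1></body></html>"));
    }
}
```

With NekoHTML, the same filter would presumably be constructed around the NekoHTML SAXParser instance (which is itself an XMLReader) instead of the JDK reader used in trace.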
