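Saxon parsing is slow

I want to parse some XML with Saxon and run a few XPath queries against it, but I have run into two problems. The first is that Saxon takes a very long time to build a very short XHTML document. The code looks like this: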

    Processor processorInstance = new Processor(false); 
    processorInstance.setConfigurationProperty(FeatureKeys.DTD_VALIDATION, false); 


    XPathCompiler XPathCompilerInstance = processorInstance.newXPathCompiler(); 
    XPathCompilerInstance.setBackwardsCompatible(false); 

    String expressionTitre = "//div[@class='score_global']/preceding-sibling::img[1]"; 

    XPathExecutable XPathExecutableInstance = XPathCompilerInstance.compile(expressionTitre); 
    XPathSelector selector = XPathExecutableInstance.load(); 
    logger.info("Xpath compiled."); 

    // Phase 2, load xml document. 
    DocumentBuilder documentBuilderInstance = processorInstance.newDocumentBuilder(); 
    documentBuilderInstance.setSchemaValidator(null); 
    documentBuilderInstance.setLineNumbering(false); 
    documentBuilderInstance.setRetainPSVI(false); 


    XdmNode context = documentBuilderInstance.build(new File("sample/sample.xml")); // This line takes ages to return.

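What I don't understand is that if I do the same thing with SAX, the document loads at normal speed :( Am I forgetting some option that Saxon provides?

Java 1.6, Saxon 9.1.0.8

The second problem is that Saxon cannot handle accented characters. My XML starts like this: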

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

So I removed the xml:lang="en" and lang="en" attributes, but had no better luck :(

Do you have any ideas? Thanks!

Source

2012-02-11 charly's

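Well, after a lot of reading, it turns out that I need to define a CatalogResolver and keep local copies of the XHTML DTDs. I gave up on Saxon and used plain JAXP/SAXReader instead.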

This page proved very interesting: http://xml.apache.org/commons/components/resolver/resolver-article.html
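The local-copy idea can be sketched without any extra library. This is not the Apache CatalogResolver from the article, only a hand-written `EntityResolver` that plays the same role for a single DTD, using nothing but the JDK's JAXP classes. The class name `LocalDtdResolver`, the helper `demo()`, and the one-line stand-in DTD are all hypothetical; a real setup would register downloaded copies of the XHTML DTD files instead.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: maps DTD public identifiers to local files.
final class LocalDtdResolver implements EntityResolver {
    private final Map<String, File> localCopies = new HashMap<String, File>();

    void register(String publicId, File localDtd) {
        localCopies.put(publicId, localDtd);
    }

    public InputSource resolveEntity(String publicId, String systemId) {
        File local = localCopies.get(publicId);
        if (local == null) {
            return null; // unknown identifier: the parser falls back to its default lookup
        }
        // Point the parser at the local copy instead of the remote system identifier.
        InputSource source = new InputSource(local.toURI().toString());
        source.setPublicId(publicId);
        return source;
    }

    // Parses a small XHTML string and returns the text of its first <p> element.
    static String demo() throws Exception {
        // Stand-in for a downloaded XHTML DTD: it only declares the eacute entity.
        File dtd = File.createTempFile("xhtml1-strict", ".dtd");
        dtd.deleteOnExit();
        FileWriter out = new FileWriter(dtd);
        out.write("<!ENTITY eacute \"&#233;\">");
        out.close();

        LocalDtdResolver resolver = new LocalDtdResolver();
        resolver.register("-//W3C//DTD XHTML 1.0 Strict//EN", dtd);

        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>caf&eacute;</p></body></html>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagNameNS("*", "p").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Because the DTD is read from disk, named entities such as `&eacute;` still resolve, which may matter for documents with accented characters.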

I hope these notes prove useful to someone :)

Source

2012-02-23 14:34:42

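OK, I found that although I configured Saxon not to validate, it still tries to resolve the DTD URI. It could not find the DTD locally, so it went online and fetched it from the W3C, which takes a very long time to return. I removed the DTD declaration from my XML and it works. My next step is to make the parser stop trying to resolve it. I am currently reading the Saxon documentation and experimenting with entity resolvers, so it should be fine.

Source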
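One way to stop the lookup without editing the XML is an entity resolver that answers every external entity request with an empty stream. The sketch below shows this with plain JAXP from the JDK rather than Saxon; the class name `NoNetworkDtd` and the helper `rootName` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: parses XML without ever fetching an external DTD.
final class NoNetworkDtd {
    // Returns an empty document for any external entity, so no connection is opened.
    static final EntityResolver EMPTY_DTD = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    };

    // Parses the given XML string and returns the local name of its root element.
    static String rootName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY_DTD);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getLocalName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">"
                + "<body><p>caf\u00e9</p></body></html>";
        System.out.println(rootName(xml));
    }
}
```

The trade-off is that an empty DTD declares no entities, so a document that uses named entities such as `&eacute;` would then fail to parse. As far as I know, the same resolver can be set on an `XMLReader` that is wrapped in a `SAXSource` and passed to Saxon's `DocumentBuilder.build(Source)`, but I have not verified that against 9.1.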

2012-02-13 22:06:00

Please merge this with your other "answer". There's no reason to have both of them. Thanks. – 2014-11-17 14:08:48
