2014-04-18 120 views

How can I avoid reading the DTD when parsing an XML file in Java?

I need to parse an XML document that starts with the following lines:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd"> 

<pdf2xml producer="poppler" version="0.22.0"> 
<page number="1" position="absolute" top="0" left="0" height="1263" width="892"> 
    <fontspec id="0" size="12" family="Times" color="#000000"/> 

I read it with the following code:

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
final DocumentBuilder builder = builderFactory.newDocumentBuilder();

Document document = builder.parse(new FileInputStream(aXmlFileName));

The last call fails with the following exception:

Exception in thread "main" java.io.FileNotFoundException: D:\dev\ro-2014-04-13-01\pdf2xml.dtd 
    at java.io.FileInputStream.open(Native Method) 
    at java.io.FileInputStream.<init>(FileInputStream.java:146) 
    at java.io.FileInputStream.<init>(FileInputStream.java:101) 
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90) 
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188) 
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:613) 

The file pdf2xml.dtd really does not exist in that directory.

How can I modify the code so that the document can be parsed even when pdf2xml.dtd is missing?


You need to implement an EntityResolver. See here: http://stackoverflow.com/questions/155101/make-documentbuilder-parse-ignore-dtd-references – Wintermute

Answers


You need to use an EntityResolver:

myBuilder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId.contains("pdf2xml.dtd")) {
            // Hand the parser an empty stand-in instead of the missing DTD file
            return new InputSource(new ByteArrayInputStream(
                    "<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
        }
        // null tells the parser to resolve the entity in its default way
        return null;
    }
});

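When the parser reaches the reference to "pdf2xml.dtd", it calls the entity resolver, which returns an empty stand-in (nothing but an XML declaration) in place of the missing file, so the parser never tries to open the DTD on disk.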
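
For reference, here is a minimal self-contained sketch of the same idea that can be compiled and run on its own. The class name DtdSkippingParser and its parse helper are made up for illustration and are not part of the original question or answer. The sketch also assumes that an empty character stream is an acceptable replacement for the DTD, which holds as long as the document does not rely on entities or default attribute values declared there.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named here only for the example
class DtdSkippingParser {

    /** Parses the XML stream, replacing any reference to pdf2xml.dtd with empty content. */
    static Document parse(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // EntityResolver has a single method, so a lambda can implement it
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("pdf2xml.dtd")) {
                // Empty content stands in for the DTD that is not on disk
                return new InputSource(new StringReader(""));
            }
            // null means: use the parser's default resolution
            return null;
        });
        return builder.parse(xml);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE pdf2xml SYSTEM \"pdf2xml.dtd\">\n"
                + "<pdf2xml producer=\"poppler\" version=\"0.22.0\"/>";
        Document doc = parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Prints the name of the root element
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With the question's code, the same helper would be called as DtdSkippingParser.parse(new FileInputStream(aXmlFileName)). Matching on the file name rather than swallowing every external entity keeps other references resolving normally.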