How can I avoid reading the DTD when parsing an XML file in Java?
I need to parse an XML document that starts with the following lines:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml producer="poppler" version="0.22.0">
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<fontspec id="0" size="12" family="Times" color="#000000"/>
I read it with the following code:
final DocumentBuilder builder;
DocumentBuilderFactory builderFactory =
DocumentBuilderFactory.newInstance();
builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(
new FileInputStream(aXmlFileName));
The last call fails with the following exception:
Exception in thread "main" java.io.FileNotFoundException: D:\dev\ro-2014-04-13-01\pdf2xml.dtd
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:613)
The file pdf2xml.dtd does not actually exist in that directory.
How can I modify the code so that the document can be parsed even without pdf2xml.dtd?
You need to implement an EntityResolver. See here: http://stackoverflow.com/questions/155101/make-documentbuilder-parse-ignore-dtd-references – Wintermute
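To make the EntityResolver suggestion concrete, here is a minimal sketch of one way it could look. The class name IgnoreDtdExample and the helper parseIgnoringDtd are made up for illustration and are not from the question. The resolver answers every external entity request with an empty input, so the parser never tries to open pdf2xml.dtd. This assumes the document does not rely on anything the DTD would supply, such as entity declarations or default attribute values.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreDtdExample {

    // Hypothetical helper (not from the question): parses XML without
    // loading the external DTD named in the DOCTYPE declaration.
    public static Document parseIgnoringDtd(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // EntityResolver has a single method, so a lambda works on Java 8+.
        // Returning an empty InputSource stands in for the missing DTD file.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Same prolog as in the question; pdf2xml.dtd does not need to exist.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE pdf2xml SYSTEM \"pdf2xml.dtd\">\n"
                + "<pdf2xml producer=\"poppler\" version=\"0.22.0\"/>";
        Document doc = parseIgnoringDtd(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

In the code from the question, the same effect should come from calling builder.setEntityResolver(...) right after newDocumentBuilder() and before builder.parse(...). Note that this resolver swallows every external entity, not only the DTD; if that is too broad, it could check systemId and return null for anything it does not want to intercept, which tells the parser to fall back to its default lookup.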