2015-02-11 188 views
2

How do I stop the SAX parser from connecting to the Internet?

I want to parse 11384 XML files into an SQLite database. Here is one of them:

<?xml version="1.0" encoding="UTF-8"?> 
<!-- 
Copyright (C) 2009/2010/2011 Ulrich Apel. 
This work is distributed under the conditions of the Creative Commons 
Attribution-Share Alike 3.0 Licence. This means you are free: 
* to Share - to copy, distribute and transmit the work 
* to Remix - to adapt the work 

Under the following conditions: 
* Attribution. You must attribute the work by stating your use of KanjiVG in 
    your own copyright header and linking to KanjiVG's website 
    (http://kanjivg.tagaini.net) 
* Share Alike. If you alter, transform, or build upon this work, you may 
    distribute the resulting work only under the same or similar license to this 
    one. 

See http://creativecommons.org/licenses/by-sa/3.0/ for more details. 
--> 
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" [ 
<!ATTLIST g 
xmlns:kvg CDATA #FIXED "http://kanjivg.tagaini.net" 
kvg:element CDATA #IMPLIED 
kvg:variant CDATA #IMPLIED 
kvg:partial CDATA #IMPLIED 
kvg:original CDATA #IMPLIED 
kvg:part CDATA #IMPLIED 
kvg:number CDATA #IMPLIED 
kvg:tradForm CDATA #IMPLIED 
kvg:radicalForm CDATA #IMPLIED 
kvg:position CDATA #IMPLIED 
kvg:radical CDATA #IMPLIED 
kvg:phon CDATA #IMPLIED > 
<!ATTLIST path 
xmlns:kvg CDATA #FIXED "http://kanjivg.tagaini.net" 
kvg:type CDATA #IMPLIED > 
]> 
<svg xmlns="http://www.w3.org/2000/svg" width="109" height="109" viewBox="0 0 109 109"> 
<g id="kvg:StrokePaths_0ff01" style="fill:none;stroke:#000000;stroke-width:3;stroke-linecap:round;stroke-linejoin:round;"> 
<g id="kvg:0ff01"> 
    <path id="kvg:0ff01-s1" d="M54.5,15.79c0,6.07-0.29,55.49-0.29,60.55"/> 
    <path id="kvg:0ff01-s2" d="M54.5,88 c -0.83,0 -1.5,0.67 -1.5,1.5 0,0.83 0.67,1.5 1.5,1.5 0.83,0 1.5,-0.67 1.5,-1.5 0,-0.83 -0.67,-1.5 -1.5,-1.5"/> 
</g> 
</g> 
<g id="kvg:StrokeNumbers_0ff01" style="font-size:8;fill:#808080"> 
    <text transform="matrix(1 0 0 1 45 16)">1</text> 
    <text transform="matrix(1 0 0 1 45 88)">2</text> 
</g> 
</svg> 

I am using a SAX parser:

public class SaxKanjivgHandler extends DefaultHandler {
.....
    File folder = new File(KANJIVG_DIRECTORY);
    if (folder.isDirectory()) {
        File[] listOfFiles = folder.listFiles();

        for (File file : listOfFiles) {
            if (file.isFile()) {
                currentFileName = file.getName();
                readXmlFromFile(file);
            }
        }
    }
.....
    public void readXmlFromFile(File file) throws ParserConfigurationException,
            SAXException, IOException {

        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        parser.parse(file, this);

    }

When I parse the files, I get this error:

Exception in thread "main" java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.MeteredStream.read(Unknown Source)
    at java.io.FilterInputStream.read(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipSpaces(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanEntityDecl(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanDecls(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanDTDExternalSubset(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at SaxKanjivgHandler.readXmlFromFile(SaxKanjivgHandler.java:63)
    at SaxKanjivgHandler.<init>(SaxKanjivgHandler.java:44)
    at Main.main(Main.java:28)

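At first I thought the error was caused by one particular file, but it occurs with different files at different times. How can I make the SAX parser stop connecting to the Internet?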

+3

I'm only guessing, but it may be that you haven't [turned off DTD validation](https://stackoverflow.com/questions/1185519/how-to-read-well-formed-xml-in-java-but-skip-the-schema) – hd1 2015-02-11 17:46:08
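A minimal sketch of the suggestion in this comment, assuming the parser is the Xerces-based one bundled with the JDK (which is what the stack trace in the question shows). Turning off validation alone may not be enough, because a non-validating parser can still fetch the external DTD; the sketch therefore also switches off the Xerces-specific `load-external-dtd` feature. The class name `NoExternalDtdDemo`, the method `countElements`, and the `example.invalid` URL are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NoExternalDtdDemo {

    // Xerces-specific feature name; assumed to be recognized by the parser in use.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the XML without fetching the external DTD and returns the element count. */
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);                 // no DTD validation
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // do not even load the external DTD
        SAXParser parser = factory.newSAXParser();

        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // The DTD URL is deliberately unreachable; it should never be requested.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.0//EN\" "
                + "\"http://example.invalid/svg10.dtd\">\n"
                + "<svg><g><path/></g></svg>";
        System.out.println(countElements(xml));
    }
}
```

One trade-off to keep in mind: with the external DTD skipped, any attribute defaults or entities declared only in that DTD are not available to the handler. Declarations in the internal subset (the part between `[` and `]` in the DOCTYPE) are still read.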

Answers

1

You can provide your own EntityResolver:

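import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DummyEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicID, String systemID)
            throws SAXException {

        // Hand back an empty document instead of letting the parser fetch the entity.
        return new InputSource(new StringReader(""));
    }
}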

public void readXmlFromFile(File file) throws ParserConfigurationException, 
     SAXException, IOException { 

    SAXParserFactory factory = SAXParserFactory.newInstance(); 
    SAXParser parser = factory.newSAXParser(); 
    parser.getXMLReader().setEntityResolver(new DummyEntityResolver()); 
    parser.parse(file, this); 

} 

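This will prevent external entities from being resolved. If there are external entities that you do want to provide, you can check publicID and systemID.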
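As a rough sketch of checking publicID, assuming you keep local copies of the entities you care about: the resolver below looks up the public ID in a map and falls back to an empty stream for anything unknown. All names here (`SelectiveResolverDemo`, `SelectiveEntityResolver`, `register`, `parseText`, the `-//EXAMPLE//DTD Demo//EN` identifier and the `example.invalid` URL) are hypothetical, and the local copy is an in-memory string only to keep the example self-contained; in practice it would more likely be a file on disk. The resolver is set on the XMLReader and the reader is used for parsing directly.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SelectiveResolverDemo {

    /** Serves registered public IDs from local text; everything else resolves to nothing. */
    static class SelectiveEntityResolver implements EntityResolver {
        private final Map<String, String> localCopies = new HashMap<>();

        void register(String publicId, String content) {
            localCopies.put(publicId, content);
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Unknown (or missing) public IDs get an empty stream, so no network access.
            String content = localCopies.getOrDefault(publicId, "");
            return new InputSource(new StringReader(content));
        }
    }

    /** Parses the XML with the given resolver and returns all character data. */
    static String parseText(String xml, EntityResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        final StringBuilder text = new StringBuilder();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        SelectiveEntityResolver resolver = new SelectiveEntityResolver();
        // Local stand-in for the remote DTD: it declares a single entity.
        resolver.register("-//EXAMPLE//DTD Demo//EN", "<!ENTITY greeting \"hello\">");

        String xml = "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Demo//EN\" "
                + "\"http://example.invalid/demo.dtd\">"
                + "<doc>&greeting;</doc>";
        System.out.println(parseText(xml, resolver));
    }
}
```

Matching on the public ID rather than the system ID keeps the lookup independent of where the DTD happens to be hosted; matching on systemID works the same way if the documents carry no public ID.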

HTH。

+1

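You might also note that Saxon has an EntityResolver, net.sf.saxon.lib.StandardEntityResolver, which knows about the most common W3C DTDs and external entity files and redirects them to local copies held in the Saxon JAR file. – 2015-02-11 22:01:34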

+0

Thanks for the hint. I wasn't aware of that. – mp911de 2015-02-12 07:06:46

+0

The above solution does not work with the parser. Use an XMLReader instead: XMLReader reader = XMLReaderFactory.createXMLReader(); reader.setEntityResolver(new DtdResolver()); – user001 2015-09-01 09:55:53
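A sketch of the XMLReader route from this comment. A likely reason the answer's code has no effect is that SAXParser.parse(File, DefaultHandler) registers the handler itself as the entity resolver, replacing the one set just before; calling parse on the XMLReader directly avoids that. The sketch obtains the reader from SAXParserFactory rather than XMLReaderFactory, which has been deprecated since Java 9. `XmlReaderOfflineDemo`, `CountingHandler` and `parseOffline` are illustrative names, with `CountingHandler` standing in for the question's SaxKanjivgHandler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class XmlReaderOfflineDemo {

    /** Counts elements; a stand-in for the real content handler. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /**
     * Parses the XML through an XMLReader whose entity resolver returns empty
     * streams. resolverCalls[0] is incremented each time the resolver is consulted.
     */
    static int parseOffline(String xml, int[] resolverCalls) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        // The resolver stays in place because reader.parse(...) is called directly.
        reader.setEntityResolver((publicId, systemId) -> {
            resolverCalls[0]++;
            return new InputSource(new StringReader(""));
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.0//EN\" "
                + "\"http://example.invalid/svg10.dtd\">\n"
                + "<svg><g><path/></g></svg>";
        int[] calls = {0};
        int elements = parseOffline(xml, calls);
        System.out.println(elements + " elements, resolver consulted: " + (calls[0] > 0));
    }
}
```

For files on disk, the reader can be given `new InputSource(file.toURI().toASCIIString())` in place of the string-based source. If the handler already extends DefaultHandler, as in the question, overriding resolveEntity in the handler itself should have the same effect while keeping the parser.parse(file, this) call.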