Handling zipped XML documents with dom4j

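Specifically, I am using dom4j to read KML documents and parse out some of the data in the XML. When I just pass the URL to the reader as a string, it is simple and handles both file system URLs and web URLs: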

SAXReader reader = new SAXReader(); 
Document document = reader.read(url);

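The problem is that my code will sometimes need to handle KMZ files, which are basically just zipped XML (KML) documents. Unfortunately, SAXReader offers no convenient way to deal with this. I have found all kinds of funky solutions for determining whether a given file is a ZIP file, but my code quickly becomes bloated and nasty - reading streams, constructing files, checking for the "magic" hex bytes at the beginning, extracting, etc.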

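Is there some quick and clean way to handle this? An easier way to connect to any URL and extract the contents if they are zipped, and otherwise just grab the XML?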


2013-01-15 Bal

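Hmm, it doesn't look like KMZDOMLoader handles kmz files on the web. The kmz may well be generated dynamically, so it won't always have a) a file reference or b) a .kmz extension - it has to be identified from the content type.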

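What I ended up doing was constructing a URL object and then getting its protocol. I have separate logic for local files and for documents on the web. Inside each of those blocks, I then have to determine whether the document is zipped. SAXReader's read() method accepts an input stream, so I found I could use a ZipInputStream for the kmzs.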

Here is the code I ended up with:

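// imports used by the snippet below
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.zip.ZipInputStream;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

// first four bytes of a ZIP local file header ("PK" followed by 0x03 0x04)
private static final long ZIP_MAGIC_NUMBERS = 0x504B0304; 
private static final String KMZ_CONTENT_TYPE = "application/vnd.google-earth.kmz"; 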

private Document getDocument(String urlString) throws IOException, DocumentException, URISyntaxException { 
     InputStream inputStream = null; 
     URL url = new URL(urlString); 
     String protocol = url.getProtocol(); 

     /* 
     * Figure out how to get the XML from the URL -- there are 4 possibilities: 
     * 
     * 1) a KML (uncompressed) doc on the filesystem 
     * 2) a KMZ (compressed) doc on the filesystem 
     * 3) a KML (uncompressed) doc on the web 
     * 4) a KMZ (compressed) doc on the web 
     */ 
     if (protocol.equalsIgnoreCase("file")) { 
      // the provided input URL points to a file on a file system 
      File file = new File(url.toURI()); 
      long n; 
      try (RandomAccessFile raf = new RandomAccessFile(file, "r")) { 
       // read the first four bytes as a big-endian int; closed even if the read fails 
       n = raf.readInt(); 
      } 

      if (n == KmlMetadataExtractorAdaptor.ZIP_MAGIC_NUMBERS) { 
       // the file is a KMZ file 
       inputStream = new ZipInputStream(new FileInputStream(file)); 
       ((ZipInputStream) inputStream).getNextEntry(); 
      } else { 
       // the file is a KML file 
       inputStream = new FileInputStream(file); 
      } 

     } else if (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https")) { 
      // the provided input URL points to a web location 
      HttpURLConnection connection = (HttpURLConnection) url.openConnection(); 
      connection.connect(); 

      String contentType = connection.getContentType(); 

      // getContentType() returns null when the content type is not known 
      if (contentType != null && contentType.contains(KmlMetadataExtractorAdaptor.KMZ_CONTENT_TYPE)) { 
       // the target resource is KMZ 
       inputStream = new ZipInputStream(connection.getInputStream()); 
       ((ZipInputStream) inputStream).getNextEntry(); 
      } else { 
       // the target resource is KML 
       inputStream = connection.getInputStream(); 
      } 

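     } else { 
      // any other protocol (ftp, jar, ...) is not handled by this method 
      throw new IllegalArgumentException("Unsupported protocol: " + protocol); 
     } 

     try { 
      // for a KMZ, the stream is already positioned at the first ZIP entry 
      return new SAXReader().read(inputStream); 
     } finally { 
      // close the stream even when parsing fails 
      inputStream.close(); 
     } 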
    }
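
One limitation of the code above is that it parses whatever entry comes first in the archive and relies on the server sending the right content type. As a possible simplification, here is a minimal sketch that peeks at the first four bytes of any stream and, for a ZIP, skips ahead to the first entry whose name ends in .kml. The class name KmlStreams and the method openKml are hypothetical helpers of my own, not part of dom4j, and the assumption that the KML document can be recognized by its .kml extension is mine as well.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of dom4j.
public class KmlStreams {
    // The four bytes that start a ZIP local file header: 'P' 'K' 0x03 0x04
    private static final int ZIP_MAGIC = 0x504B0304;

    /**
     * Returns a stream positioned at KML content, whether raw holds plain
     * KML or a KMZ (ZIP) archive.
     */
    public static InputStream openKml(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);

        // Peek at up to four bytes, then rewind so nothing is consumed.
        in.mark(4);
        int magic = 0;
        for (int i = 0; i < 4; i++) {
            int b = in.read();
            if (b < 0) {
                break; // fewer than four bytes: cannot match the signature
            }
            magic = (magic << 8) | b;
        }
        in.reset();

        if (magic != ZIP_MAGIC) {
            return in; // not a ZIP: hand back the rewound stream as-is
        }

        // Assumption: the KML document is the first entry named *.kml.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = entry.getName().toLowerCase(Locale.ROOT);
            if (!entry.isDirectory() && name.endsWith(".kml")) {
                return zip; // reads now yield this entry's bytes only
            }
        }
        zip.close();
        throw new IOException("No .kml entry found in KMZ archive");
    }
}
```

With a helper like this, the per-protocol branches could probably collapse to something like new SAXReader().read(KmlStreams.openKml(url.openStream())), since URL.openStream() works for both file and http URLs. The caller is still responsible for closing the returned stream.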


2013-01-16 14:27:20 Bal
