Parsing multiple gzip-compressed JSON files contained in a zipped input stream without saving any files to disk (because of Google App Engine)

I want to parse multiple gzip-compressed files contained in a zip file, which I read through the InputStream of an HTTP connection.

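I have managed to read the first file, but nothing beyond it. Sometimes it fails before the whole (first) file has been read. I have checked the Content-Length header on the connection, and it is the same even when I fail to read the whole file.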

I am using Google App Engine, which does not allow me to save files locally, and most of the examples I have found do exactly that.

For the zip file I am using ZipArchiveInputStream from https://commons.apache.org/proper/commons-compress/.

This is the most closely related question I have been able to find: How to read from file containing multiple GzipStreams

private static ArrayList<RawEvent> parseAmplitudeEventArchiveData(HttpURLConnection connection) 
     throws IOException, ParseException { 
    String name, line; 
    ArrayList<RawEvent> events = new ArrayList<>(); 

    try (ZipArchiveInputStream zipInput = 
       new ZipArchiveInputStream(connection.getInputStream(), null, false, true);) { 

     ZipArchiveEntry zipEntry = zipInput.getNextZipEntry(); 
     if (zipEntry != null) { 

      try(GZIPInputStream gzipInputStream = new GZIPInputStream(connection.getInputStream()); 
      BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) { 

       name = zipEntry.getName(); 
       log.info("Parsing file: " + name); 

       while ((line = reader.readLine()) != null) { 
        events.add(parseJsonLine(line)); 
       } 
       log.info("Events size: " + events.size()); 
      } 
     } 
    } 
    return events; 
}


2016-03-26 Jitan

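I don't see how this can work, because the input stream you pass to GZIPInputStream comes straight from the connection. What you actually want is to read the data from the ZipArchiveInputStream and create a GZIPInputStream from that data.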

@MartinKrüger Yes, I had been wondering about that. If I switch it out as you suggest, I get an "IOException: Truncated ZIP file". – Jitan
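The wiring suggested in the comment above can be sketched as follows. This is a minimal illustration, not the asker's code: it swaps in the JDK's java.util.zip.ZipInputStream for Commons Compress's ZipArchiveInputStream so that it runs with no extra dependencies, and the names NestedGzipInZip, NonClosingStream, readAllLines and buildSampleZip are hypothetical. The JDK class rejects STORED entries that use a data descriptor, so an archive that needs the allowStoredEntriesWithDataDescriptor option would still need the Commons Compress class, and the sketch does not explain the "Truncated ZIP file" error reported above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class NestedGzipInZip {

    /** Hypothetical helper: forwards reads but ignores close(), so closing a gzip reader leaves the zip stream open. */
    static final class NonClosingStream extends FilterInputStream {
        NonClosingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Deliberately a no-op: the zip stream is closed by its own try-with-resources.
        }
    }

    /** Collects the lines of every gzip-compressed entry in the zip stream, without touching the disk. */
    public static List<String> readAllLines(InputStream source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(source)) {
            while (zip.getNextEntry() != null) {
                // The zip stream, not the connection, is the source of the gzip data for this entry.
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        new GZIPInputStream(new NonClosingStream(zip)), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.add(line);
                    }
                }
            }
        }
        return lines;
    }

    /** Builds a zip of gzip members in memory so the sketch can run without any files. */
    public static byte[] buildSampleZip(String... contents) throws IOException {
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(zipBytes)) {
            for (int i = 0; i < contents.length; i++) {
                ByteArrayOutputStream gz = new ByteArrayOutputStream();
                try (GZIPOutputStream gzip = new GZIPOutputStream(gz)) {
                    gzip.write(contents[i].getBytes(StandardCharsets.UTF_8));
                }
                zip.putNextEntry(new ZipEntry("part-" + i + ".json.gz"));
                zip.write(gz.toByteArray());
                zip.closeEntry();
            }
        }
        return zipBytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zipBytes = buildSampleZip("{\"a\":1}\n{\"a\":2}\n", "{\"b\":3}\n");
        System.out.println(readAllLines(new ByteArrayInputStream(zipBytes)));
    }
}
```

The non-closing wrapper matters because closing a GZIPInputStream or BufferedReader also closes the stream underneath it, which would otherwise end the zip stream after the first entry.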

This works for me:

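import java.io.BufferedReader; 
import java.io.ByteArrayInputStream; 
import java.io.FileInputStream; 
import java.io.IOException; 
import java.io.InputStream; 
import java.io.InputStreamReader; 
// Assumption: the original does not say which ParseException is meant; java.text.ParseException is used here. 
import java.text.ParseException; 
import java.util.zip.GZIPInputStream; 

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry; 
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream; 

public class UnzipZippedFiles { 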

    public static void main(String[] args) throws IOException, ParseException { 
     FileInputStream inputStream = new FileInputStream("/home/me/dev/scratchpad/src/main/resources/files.zip"); 
     unzipFile(inputStream); 
    } 

    private static void unzipFile(InputStream inputStream) 
      throws IOException, ParseException { 
     try (ZipArchiveInputStream zipInput = 
        new ZipArchiveInputStream(inputStream, null, false, true);) { 

      ZipArchiveEntry zipEntry; 

      while ((zipEntry = zipInput.getNextZipEntry()) != null) { 
       System.out.println("File: " + zipEntry.getName()); 

       byte[] fileBytes = readDataFromZipStream(zipInput, zipEntry); 

       ByteArrayInputStream byteIn = new ByteArrayInputStream(fileBytes); 
       unzipGzipArchiveAndPrint(byteIn); 
      } 
     } 
    } 

    private static byte[] readDataFromZipStream(ZipArchiveInputStream zipStream, ZipArchiveEntry entry) throws IOException { 
     byte[] data = new byte[(int) entry.getSize()]; 
     zipStream.read(data); 

     return data; 
    } 

    private static void unzipGzipArchiveAndPrint(InputStream inputStream) throws IOException { 
     System.out.println("Content:"); 
     try (GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream); 
      BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) { 

      String line; 
      while ((line = reader.readLine()) != null) { 
       System.out.println(line); 
      } 
     } 
    } 
}


2016-03-26 18:24:06

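The problem I get with this is that entry.getSize returns -1. I guess something about the zip file makes it behave this way, but the 'unzip' command in a terminal can decompress it. – Jitan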

What is the ZipArchiveEntry?
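A getSize() of -1 as reported in the comments above means the size is unknown to the reader. One common reason, which may or may not apply to this archive, is that the size is recorded in a data descriptor written after the entry's data, so a streaming reader has not seen it yet when the entry starts. A single read(byte[]) call is also allowed to return fewer bytes than the array holds. The sketch below sidesteps both by draining each entry until end-of-stream. It is an illustration under assumptions: it uses the JDK's java.util.zip.ZipInputStream rather than ZipArchiveInputStream so it runs without extra dependencies, and ReadEntryUntilEof, drain, gunzipLines, readAll and buildSampleZip are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ReadEntryUntilEof {

    /** Drains the stream until read() reports -1, so the entry's declared size is never needed. */
    public static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Decompresses one gzip payload held in memory and splits it into lines. */
    public static List<String> gunzipLines(byte[] gzipBytes) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipBytes)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Maps each zip entry name to the decompressed lines of its gzip content. */
    public static Map<String, List<String>> readAll(InputStream source) throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // While an entry is current, the zip stream reports end-of-stream at the entry boundary.
                result.put(entry.getName(), gunzipLines(drain(zip)));
            }
        }
        return result;
    }

    /** Builds a zip of gzip members in memory so the sketch can run without any files. */
    public static byte[] buildSampleZip(String... contents) throws IOException {
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(zipBytes)) {
            for (int i = 0; i < contents.length; i++) {
                ByteArrayOutputStream gz = new ByteArrayOutputStream();
                try (GZIPOutputStream gzip = new GZIPOutputStream(gz)) {
                    gzip.write(contents[i].getBytes(StandardCharsets.UTF_8));
                }
                zip.putNextEntry(new ZipEntry("part-" + i + ".json.gz"));
                zip.write(gz.toByteArray());
                zip.closeEntry();
            }
        }
        return zipBytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zipBytes = buildSampleZip("{\"a\":1}\n{\"a\":2}\n", "{\"b\":3}\n");
        System.out.println(readAll(new ByteArrayInputStream(zipBytes)));
    }
}
```

Buffering each entry keeps the gzip step independent of the zip stream, at the cost of holding one compressed entry in memory at a time. The same read-until-EOF loop should carry over to ZipArchiveInputStream, since it is also an InputStream.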
