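Copying the contents of a .tar.gz file into folders

I want to copy the contents of a .tar.gz file into two folders. The archive contains about 20 files, and the total uncompressed size is more than 20 GB.
I used TrueZIP for this.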

import java.io.File;
import de.schlichtherle.truezip.file.TFile;

TFile archive = new TFile(absoluteZipName); // the .tar.gz archive
TFile[] archFiles = archive.listFiles();    // takes too much time
for (TFile t : archFiles) {
    String fileName = t.getName();
    if (fileName.endsWith(".dat")) {
        // .dat files go to the first folder
        t.cp(new File(destination1, fileName));
    } else if (fileName.endsWith(".txt")) {
        // .txt files go to the second folder
        t.cp(new File(destination2, fileName));
    }
}

This takes about three times as long as the tar xzf command (untar on Linux). Is there any way to optimize this code for faster copying? Memory is not an issue.

The following code copies quickly. Thanks to npe for the good advice.
(NB: I do not have the privilege to post an answer yet, which is why I am editing the question itself.)

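import java.io.*;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.ArchiveInputStream;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.utils.IOUtils;

// Stream the archive once: gunzip and untar on the fly, no temporary files.
try (InputStream is = new FileInputStream(absoluteZipName);
     ArchiveInputStream input = new ArchiveStreamFactory()
         .createArchiveInputStream(ArchiveStreamFactory.TAR, new GZIPInputStream(is))) {
    ArchiveEntry entry;
    while ((entry = input.getNextEntry()) != null) {
        File outFile;
        if (entry.getName().endsWith(".dat")) {
            outFile = new File(destination1, entry.getName());
        } else if (entry.getName().endsWith(".txt")) {
            outFile = new File(destination2, entry.getName());
        } else {
            continue; // skip other entries (the first version passed a null stream here)
        }
        // Close each output file as soon as its entry has been copied.
        try (OutputStream out = new FileOutputStream(outFile)) {
            IOUtils.copy(input, out, 10485760); // 10 MiB copy buffer
        }
    }
}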


Would threading the file copy reduce the time? With TrueZIP it did not, since it already uses threads. Anyway, I will try tomorrow and let you know.

2012-06-11 Abdul

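Do it in two steps? tar -xzvf to ./some/tmp/destination plus two 'mv' commands that filter by extension? Or does it have to be done in Java? – hovanessyan

It has to be done in Java. – Abdul

In that case, I don't think you can optimize it much. – hovanessyan

Thanks npe. This is what I finally did; in any case it takes less time than tar xzf. The final code snippet looks like this.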

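import java.io.*;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.ArchiveInputStream;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.utils.IOUtils;

// Stream the archive once: gunzip and untar on the fly, no temporary files.
try (InputStream is = new FileInputStream(absoluteZipName);
     ArchiveInputStream input = new ArchiveStreamFactory()
         .createArchiveInputStream(ArchiveStreamFactory.TAR, new GZIPInputStream(is))) {
    ArchiveEntry entry;
    while ((entry = input.getNextEntry()) != null) {
        File outFile;
        if (entry.getName().endsWith(".dat")) {
            outFile = new File(destination1, entry.getName());
        } else if (entry.getName().endsWith(".txt")) {
            outFile = new File(destination2, entry.getName());
        } else {
            continue; // skip other entries (the first version passed a null stream here)
        }
        // Close each output file as soon as its entry has been copied.
        try (OutputStream out = new FileOutputStream(outFile)) {
            IOUtils.copy(input, out, 10485760); // 10 MiB copy buffer
        }
    }
}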

I hope I can do some more optimization; I will do that later. Thanks a lot.

2012-06-12 03:38:16 Abdul

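It seems that listFiles() decompresses the gzip file so that it can scan through the tar file and collect all the file names, and then cp(File, File) scans it again to position the stream at the given file.

What I would do is use Apache Commons Compress to do an iterator-like scan over the input streams, something like this: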

InputStream is = new FileInputStream("/path/to/my/file"); 
ArchiveInputStream input = new ArchiveStreamFactory() 
    .createArchiveInputStream(ArchiveStreamFactory.TAR, new GZIPInputStream(is)); 

ArchiveEntry entry; 
while ((entry = input.getNextEntry()) != null) { 

    // use ArchiveEntry#getName() to do the conditional stuff... 

}

Read the javadoc for ArchiveInputStream#getNextEntry() and ArchiveEntry for more information.
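The "conditional stuff" on the entry name can be kept apart from the I/O. The following is only a minimal sketch using the JDK alone: EntryRouter and targetFor are hypothetical names that appear nowhere in this thread, and the two directories stand in for destination1 and destination2. It picks a target folder by extension and, as an extra precaution not discussed above, rejects entry names that would resolve outside that folder.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper (not part of Commons Compress or TrueZIP).
public class EntryRouter {
    private final Path datDir;
    private final Path txtDir;

    public EntryRouter(Path datDir, Path txtDir) {
        this.datDir = datDir.toAbsolutePath().normalize();
        this.txtDir = txtDir.toAbsolutePath().normalize();
    }

    /** Returns the output path for an entry name, or empty if the entry should be skipped. */
    public Optional<Path> targetFor(String entryName) {
        Path base;
        if (entryName.endsWith(".dat")) {
            base = datDir;
        } else if (entryName.endsWith(".txt")) {
            base = txtDir;
        } else {
            return Optional.empty(); // neither extension: skip
        }
        Path target = base.resolve(entryName).normalize();
        // Reject names such as "../x.dat" that would escape the destination folder.
        return target.startsWith(base) ? Optional.of(target) : Optional.empty();
    }
}
```

Inside the while loop, one would call targetFor(entry.getName()) and copy the entry only when a path is returned.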

2012-06-11 14:11:08 npe

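I tried this first... I don't want to gunzip first and then untar. That also takes twice as long as usual. – Abdul

Edited the answer to give an example of how to stream through 'GZIPInputStream' and pass that stream directly to 'commons-compress'. That way everything should be a one-step process. You can also use ['GzipCompressorInputStream'](http://commons.apache.org/compress/apidocs/src-html/org/apache/commons/compress/compressors/gzip/GzipCompressorInputStream.html#line.47) instead of the JDK implementation. – npe

Looks good. I created the tar and untarred it, and now it seems to take the same time as tar xzf. Thanks for the input. I will post the modified code in my next post. – Abdul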

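The reason for the performance problem you are seeing is that the TAR file format lacks a central directory. Because TrueZIP is a virtual file system, it cannot predict the access pattern of the client application, so it has to unpack the entire TAR file to a temporary directory on first access. This is what happens on TFile.listFiles(). The entries are then copied from the temporary directory to the destination directory. As a result, every byte of every entry is read or written four times in total.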
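For contrast, here is a minimal sketch of what a central directory buys. It uses the JDK's java.util.zip.ZipFile, not TrueZIP's API, and the class and method names (ZipDirectAccess, writeSample, readEntry) are made up for illustration. ZipFile looks an entry up by name through the central directory, so a single entry can be read without unpacking the rest of the archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; JDK only.
public class ZipDirectAccess {
    /** Writes a small ZIP with the given number of text entries (demo data only). */
    public static void writeSample(Path zip, int entries) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (int i = 0; i < entries; i++) {
                out.putNextEntry(new ZipEntry("file" + i + ".txt"));
                out.write(("content " + i).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
    }

    /** Reads one entry by name; ZipFile locates it through the central directory. */
    public static String readEntry(Path zip, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry entry = zf.getEntry(name);
            if (entry == null) {
                throw new IOException("No such entry: " + name);
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

A TAR stream offers no such lookup: entries can only be discovered by reading the archive from the start.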

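To get the best performance, you have two options:

(a) You can switch to the ZIP file format and stick with the TrueZIP File* API. ZIP files have a central directory, so reading them does not involve creating temporary files.

(b) You can process the TAR.GZ file as a stream, as npe has shown. I would then combine this with java.util.zip.GZIPInputStream, because that implementation is based on fast C code. I would also use TrueZIP's Streams.copy(InputStream, OutputStream) method, because it uses multithreading for really fast bulk copying.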
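To illustrate the general idea behind a multithreaded bulk copy in option (b), here is a rough sketch that overlaps reading and writing with one reader thread and a bounded queue. It is not TrueZIP's implementation of Streams.copy; PipelinedCopy, its copy method, and the queue depth of 4 are assumptions made for this example. Whether it actually beats a plain single-threaded loop depends on the disks and the archive, so measure before relying on it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper; JDK only.
public class PipelinedCopy {
    private static final byte[] EOF = new byte[0]; // end-of-stream marker, compared by identity

    /** Copies in to out, reading on a helper thread and writing on the calling thread. */
    public static long copy(InputStream in, OutputStream out, int bufferSize)
            throws IOException, InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
        IOException[] readError = new IOException[1];

        Thread reader = new Thread(() -> {
            try {
                byte[] buf = new byte[bufferSize];
                int n;
                while ((n = in.read(buf)) != -1) {
                    if (n > 0) {
                        queue.put(Arrays.copyOf(buf, n)); // blocks while the writer is behind
                    }
                }
            } catch (IOException e) {
                readError[0] = e;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                try {
                    queue.put(EOF);
                } catch (InterruptedException ignored) {
                    // the writer has already given up; nothing left to signal
                }
            }
        });
        reader.setDaemon(true);
        reader.start();

        long total = 0;
        try {
            byte[] chunk;
            while ((chunk = queue.take()) != EOF) {
                out.write(chunk);
                total += chunk.length;
            }
        } finally {
            reader.interrupt(); // unblocks the reader if writing failed early
        }
        reader.join();
        if (readError[0] != null) {
            throw readError[0];
        }
        return total;
    }
}
```

When used on an archive stream, the call returns only after the reader thread has finished with the current entry, so the next getNextEntry() call stays on the calling thread.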

2012-06-12 02:15:26
