2013-07-05

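Get the actual size of a zip entry

I have to analyze zip files to check the size of their contents, but ZipEntry.getSize() always returns -1. That is what the specification prescribes when the original size is unknown, yet for some reason 7-zip does seem to know the actual size, because it displays it when I open the zip with it.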

Does anyone know how 7-zip does this? Is it just estimating?


Maybe 7-Zip inflates each entry in RAM to find the inflated size.


That is possible, but the speed at which it opens large zips makes it doubtful. – nablex

Answers


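Perhaps ZipEntry only processes the local file header and not the central directory. The central directory is written at the end of the zip archive once compression has finished, so it should contain the actual file size information.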


This appears to be accurate. I wrote some code to parse the central directory and it yields the correct sizes. Thanks! – nablex
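
The difference between the two headers can be observed with the JDK alone. The following is a minimal sketch, assuming the archive was written in streaming mode (as `ZipOutputStream` does for deflated entries whose size was not set in advance), so that the sizes live in a data descriptor and in the central directory rather than in the local file header. The class and method names (`ZipSizeDemo`, `writeSample`, `sizeViaStream`, `sizeViaZipFile`) are made up for this illustration. `ZipInputStream` reads local headers sequentially, whereas `ZipFile` reads the central directory.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipSizeDemo {
    // Writes a one-entry archive in streaming mode: the sizes are not known
    // when the local file header is written, so they follow the data instead.
    static void writeSample(File target, byte[] payload) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(target))) {
            out.putNextEntry(new ZipEntry("hello.txt"));
            out.write(payload);
            out.closeEntry();
        }
    }

    // Size as seen while streaming: only the local file header has been read.
    static long sizeViaStream(File zip) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(zip))) {
            ZipEntry entry = in.getNextEntry();
            return entry.getSize();
        }
    }

    // Size as seen through ZipFile, which consults the central directory.
    static long sizeViaZipFile(File zip) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip)) {
            return zipFile.getEntry("hello.txt").getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        File zip = File.createTempFile("sizes", ".zip");
        zip.deleteOnExit();
        writeSample(zip, "hello".getBytes("UTF-8"));
        System.out.println("ZipInputStream: " + sizeViaStream(zip));
        System.out.println("ZipFile:        " + sizeViaZipFile(zip));
    }
}
```

If random access to the file is available, `ZipFile` may therefore be enough on its own, without parsing the central directory by hand.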


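For those who are interested, here is the code I used to parse the zip (keep in mind that zip files are little-endian). I used Wikipedia (http://en.wikipedia.org/wiki/ZIP_%28file_format%29) as the reference for the structure.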

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public static List<ZipCentralFileHeader> getCentralDirectory(File file) throws IOException { 
    List<ZipCentralFileHeader> entries = new ArrayList<ZipCentralFileHeader>(); 
    // random access is used because InputStream.skip() and read() may process fewer bytes than requested 
    RandomAccessFile input = new RandomAccessFile(file, "r"); 
    try { 
     // only check the last 10 meg, make sure this is large enough depending on your data 
     long sizeToSkip = Math.max(0, file.length() - (1024 * 1024 * 10)); 
     input.seek(sizeToSkip); 
     byte [] buffer = new byte[(int) (file.length() - sizeToSkip)]; 
     // readFully() throws an EOFException (an IOException) if the tail cannot be read completely 
     input.readFully(buffer); 
     for (int i = 0; i < buffer.length - 4; i++) { 
      if (buffer[i] == 0x50 && buffer[i + 1] == 0x4b && buffer[i + 2] == 0x01 && buffer[i + 3] == 0x02) { 
       Date lastModified = dosToJavaTime(get32(buffer, i + 12)); 
       long compressedSize = get32(buffer, i + 20); 
       long uncompressedSize = get32(buffer, i + 24); 
       int nameLength = get16(buffer, i + 28); 
       int extraFieldLength = get16(buffer, i + 30); 
       int commentLength = get16(buffer, i + 32); 

       String fileName = new String(Arrays.copyOfRange(buffer, i + 46, i + 46 + nameLength), "UTF-8"); 
       String comment = new String(Arrays.copyOfRange(buffer, i + 46 + nameLength + extraFieldLength, i + 46 + nameLength + extraFieldLength + commentLength), "UTF-8"); 

       entries.add(new ZipCentralFileHeader(fileName, lastModified, compressedSize, uncompressedSize, comment)); 
      } 
      // the end of the central directory 
      else if (buffer[i] == 0x50 && buffer[i + 1] == 0x4b && buffer[i + 2] == 0x05 && buffer[i + 3] == 0x06) { //0x06054b50 
       // each header starts the same, there is no general start sequence for the entire central directory 
       // as such you can't really be sure you got them all unless you scan the entire file 
       // the trailing section however contains the necessary information to validate the amount 
       int amountOfFileHeaders = get16(buffer, i + 8); 
       if (amountOfFileHeaders != entries.size()) 
        throw new IOException("Could only read " + entries.size() + "/" + amountOfFileHeaders + " headers for " + file + ", you likely did not read enough of the file"); 
       break; 
      } 
     } 
    } 
    finally { 
     input.close(); 
    } 
    return entries; 
} 

The utility methods get16, get32, get64 and dosToJavaTime are copied from the existing ZipEntry code, based on a snapshot of JDK 7:

private static final int get16(byte b[], int off) { 
    return (b[off] & 0xff) | ((b[off+1] & 0xff) << 8); 
} 

private static final long get32(byte b[], int off) { 
    return (get16(b, off) | ((long)get16(b, off+2) << 16)) & 0xffffffffL; 
} 

private static final long get64(byte b[], int off) { 
    return get32(b, off) | (get32(b, off+4) << 32); 
} 

@SuppressWarnings("deprecation") 
private static Date dosToJavaTime(long dtime) { 
    Date date = new Date((int)(((dtime >> 25) & 0x7f) + 80), 
         (int)(((dtime >> 21) & 0x0f) - 1), 
         (int)((dtime >> 16) & 0x1f), 
         (int)((dtime >> 11) & 0x1f), 
         (int)((dtime >> 5) & 0x3f), 
         (int)((dtime << 1) & 0x3e)); 
    return date; 
} 
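
The `ZipCentralFileHeader` type used by `getCentralDirectory` is not shown in the answer. A minimal sketch of what it could look like follows, assuming it is nothing more than an immutable value holder whose constructor takes the five values in the order used above. The field and getter names are guesses made for illustration, not the author's actual class.

```java
import java.util.Date;

// Hypothetical value holder for one central directory file header.
class ZipCentralFileHeader {
    private final String fileName;
    private final Date lastModified;
    private final long compressedSize;
    private final long uncompressedSize;
    private final String comment;

    ZipCentralFileHeader(String fileName, Date lastModified, long compressedSize,
                         long uncompressedSize, String comment) {
        this.fileName = fileName;
        this.lastModified = lastModified;
        this.compressedSize = compressedSize;
        this.uncompressedSize = uncompressedSize;
        this.comment = comment;
    }

    String getFileName() { return fileName; }
    Date getLastModified() { return lastModified; }
    long getCompressedSize() { return compressedSize; }
    long getUncompressedSize() { return uncompressedSize; }
    String getComment() { return comment; }
}
```

With such a class, the uncompressed size the question asks about would be read from each returned entry via `getUncompressedSize()`.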

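Actually, a zip file contains a CentralDirectoryLocator record, which points to the start of the central directory and holds the number of entries in it. So you should first step back some bytes from the end of the file (up to a kilobyte or so), scan for the signature of the central directory locator, then seek to the start of the central directory and read it.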
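
The approach described in this comment can be sketched as follows. The record it refers to is called the "end of central directory record" in the Wikipedia layout cited above: a 22-byte fixed part with signature 0x06054b50, followed by an optional comment of at most 65535 bytes. The class name `EndOfCentralDirectory` and its members are made up for this illustration, and the sketch assumes a single-disk archive without ZIP64 extensions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class EndOfCentralDirectory {
    final int entryCount;        // total number of central directory records
    final long directorySize;    // size of the central directory in bytes
    final long directoryOffset;  // file offset of the first central file header

    EndOfCentralDirectory(int entryCount, long directorySize, long directoryOffset) {
        this.entryCount = entryCount;
        this.directorySize = directorySize;
        this.directoryOffset = directoryOffset;
    }

    static EndOfCentralDirectory locate(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            // The record can start at most 22 + 65535 bytes before the end of the file.
            int tailLength = (int) Math.min(raf.length(), 22 + 65535);
            byte[] tail = new byte[tailLength];
            raf.seek(raf.length() - tailLength);
            raf.readFully(tail);
            // Scan backwards so the record closest to the end of the file wins.
            for (int i = tail.length - 22; i >= 0; i--) {
                if (tail[i] == 0x50 && tail[i + 1] == 0x4b
                        && tail[i + 2] == 0x05 && tail[i + 3] == 0x06) {
                    return new EndOfCentralDirectory(
                            u16(tail, i + 10), u32(tail, i + 12), u32(tail, i + 16));
                }
            }
            throw new IOException("No end of central directory record found in " + file);
        }
    }

    // Little-endian unsigned 16-bit value.
    private static int u16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    // Little-endian unsigned 32-bit value.
    private static long u32(byte[] b, int off) {
        return (u16(b, off) | ((long) u16(b, off + 2) << 16)) & 0xffffffffL;
    }
}
```

Once `directoryOffset` and `directorySize` are known, only that byte range needs to be read and parsed, which avoids scanning an arbitrary 10 MB tail for header signatures.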
