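Get the actual size of a zip entry
I have to analyze zip files to check the size of their contents, but ZipEntry.getSize() keeps returning -1. According to the specification this is what happens when the original size is unknown, yet for some reason 7-zip seems to know the actual size, because it displays it when I open the zip.
Does anyone know how 7-zip does this? Is it just estimating?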
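Maybe ZipEntry only deals with the local file header, and not with the central directory, which is written at the end of the zip archive after compression has finished and should contain the actual file size information.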
This seems to be accurate. I wrote some code to parse the central directory and it gives the correct sizes. Thanks! – nablex
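To illustrate the difference described in this answer, here is a minimal sketch. It assumes the -1 in the question came from streaming the archive with java.util.zip.ZipInputStream, which reads local file headers, whereas java.util.zip.ZipFile reads the central directory. The class name ZipSizeDemo, its method names and the entry name hello.txt are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original thread.
final class ZipSizeDemo {

    // Writes a one-entry archive. For a DEFLATED entry whose size was not set
    // in advance, ZipOutputStream stores the sizes after the entry data
    // (data descriptor) and in the central directory, not in the local header.
    static Path writeSample(byte[] payload) throws IOException {
        Path zip = Files.createTempFile("size-demo", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("hello.txt"));
            out.write(payload);
            out.closeEntry();
        }
        return zip;
    }

    // Size reported while streaming, before the entry data has been read:
    // only the local file header is available at this point.
    static long sizeFromStream(Path zip) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry = in.getNextEntry();
            return entry.getSize();
        }
    }

    // Size reported by ZipFile, which looks the entry up in the central directory.
    static long sizeFromCentralDirectory(Path zip) throws IOException {
        try (ZipFile file = new ZipFile(zip.toFile())) {
            return file.getEntry("hello.txt").getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSample("hello zip".getBytes(StandardCharsets.UTF_8));
        System.out.println("streaming: " + sizeFromStream(zip));
        System.out.println("central directory: " + sizeFromCentralDirectory(zip));
        Files.delete(zip);
    }
}
```

With an archive written this way, the streaming call should report -1 and the ZipFile call should report 9, the length of the payload. If random access to the file is possible, ZipFile may therefore be enough and no hand-written parser is needed.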
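For those who are interested, here is the code I used to parse the zip (keep in mind that zip files are little endian). I used Wikipedia (http://en.wikipedia.org/wiki/ZIP_%28file_format%29) as the reference for the structure.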
// imports needed by this method and by the helper methods further down
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// ZipCentralFileHeader is a simple holder class that is not shown here
public static List<ZipCentralFileHeader> getCentralDirectory(File file) throws IOException {
    List<ZipCentralFileHeader> entries = new ArrayList<ZipCentralFileHeader>();
    // RandomAccessFile is used because InputStream.skip() and read() may handle fewer bytes than requested
    try (RandomAccessFile input = new RandomAccessFile(file, "r")) {
        // only check the last 10 meg, make sure this is large enough depending on your data
        long sizeToSkip = Math.max(0, file.length() - (1024 * 1024 * 10));
        input.seek(sizeToSkip);
        byte[] buffer = new byte[(int) (file.length() - sizeToSkip)];
        // readFully throws an EOFException if the buffer cannot be filled completely
        input.readFully(buffer);
        for (int i = 0; i + 3 < buffer.length; i++) {
            // central directory file header (0x02014b50); the fixed part is 46 bytes long
            if (i + 46 <= buffer.length && buffer[i] == 0x50 && buffer[i + 1] == 0x4b && buffer[i + 2] == 0x01 && buffer[i + 3] == 0x02) {
                Date lastModified = dosToJavaTime(get32(buffer, i + 12));
                long compressedSize = get32(buffer, i + 20);
                long uncompressedSize = get32(buffer, i + 24);
                int nameLength = get16(buffer, i + 28);
                int extraFieldLength = get16(buffer, i + 30);
                int commentLength = get16(buffer, i + 32);
                String fileName = new String(Arrays.copyOfRange(buffer, i + 46, i + 46 + nameLength), StandardCharsets.UTF_8);
                String comment = new String(Arrays.copyOfRange(buffer, i + 46 + nameLength + extraFieldLength, i + 46 + nameLength + extraFieldLength + commentLength), StandardCharsets.UTF_8);
                entries.add(new ZipCentralFileHeader(fileName, lastModified, compressedSize, uncompressedSize, comment));
            }
            // the end of the central directory
            else if (buffer[i] == 0x50 && buffer[i + 1] == 0x4b && buffer[i + 2] == 0x05 && buffer[i + 3] == 0x06) { // 0x06054b50
                // each header starts the same, there is no general start sequence for the entire central directory
                // as such you can't really be sure you got them all unless you scan the entire file
                // the trailing section however contains the necessary information to validate the amount
                int amountOfFileHeaders = get16(buffer, i + 8);
                if (amountOfFileHeaders != entries.size())
                    throw new IOException("Could only read " + entries.size() + "/" + amountOfFileHeaders + " headers for " + file + ", you likely did not read enough of the file");
                break;
            }
        }
    }
    return entries;
}
The utility methods get16, get32, get64 and dosToJavaTime are copied from the existing ZipEntry code, based on a JDK 7 snapshot:
// reads an unsigned 16-bit little-endian value
private static final int get16(byte[] b, int off) {
    return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
}

// reads an unsigned 32-bit little-endian value
private static final long get32(byte[] b, int off) {
    return (get16(b, off) | ((long) get16(b, off + 2) << 16)) & 0xffffffffL;
}

// reads a 64-bit little-endian value
private static final long get64(byte[] b, int off) {
    return get32(b, off) | (get32(b, off + 4) << 32);
}

// converts a packed DOS date/time (date in the high 16 bits, time in the low 16 bits) to a Date
@SuppressWarnings("deprecation")
private static Date dosToJavaTime(long dtime) {
    Date date = new Date((int) (((dtime >> 25) & 0x7f) + 80),
                         (int) (((dtime >> 21) & 0x0f) - 1),
                         (int) ((dtime >> 16) & 0x1f),
                         (int) ((dtime >> 11) & 0x1f),
                         (int) ((dtime >> 5) & 0x3f),
                         (int) ((dtime << 1) & 0x3e));
    return date;
}
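As an aside to the helper methods above, the same little-endian decoding can also be sketched with java.nio.ByteBuffer instead of manual shifts. This is only an alternative illustration, not the code used in the answer; the class name LittleEndianFields and the method names u16 and u32 are made up here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, shown only to illustrate little-endian reads.
final class LittleEndianFields {

    // Unsigned 16-bit little-endian value at the given offset.
    static int u16(byte[] b, int off) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getShort(off) & 0xffff;
    }

    // Unsigned 32-bit little-endian value at the given offset.
    static long u32(byte[] b, int off) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt(off) & 0xffffffffL;
    }

    public static void main(String[] args) {
        // The bytes 50 4b 01 02 on disk form the central header signature 0x02014b50.
        byte[] signature = {0x50, 0x4b, 0x01, 0x02};
        System.out.println(Long.toHexString(u32(signature, 0))); // prints 2014b50
    }
}
```

The masks are needed because Java has no unsigned short or int types; without them, values with the high bit set would come back negative.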
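Actually, a zip file contains a CentralDirectoryLocator record, which points to the beginning of the central directory and holds the number of entries in it. So you should first skip back some bytes from the end (up to a kilobyte or so), then scan for the central directory locator's signature, then seek to the beginning of the central directory and read it. –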
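The approach described in this comment can be sketched roughly as follows. The record the comment calls CentralDirectoryLocator is the end of central directory record with signature 0x06054b50. This is a sketch under several assumptions: a single-disk archive, no ZIP64 extensions, and UTF-8 entry names. The class name CentralDirectoryReader and the method name uncompressedSizes are invented for the example. Because the record can be followed by an archive comment of up to 65535 bytes, the sketch reads a tail of that size rather than a single kilobyte.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader; assumes single-disk, non-ZIP64 archives with UTF-8 names.
final class CentralDirectoryReader {
    private static final int EOCD_SIGNATURE = 0x06054b50;
    private static final int CENTRAL_HEADER_SIGNATURE = 0x02014b50;
    private static final int EOCD_MIN_SIZE = 22;   // fixed part of the end record
    private static final int MAX_COMMENT = 0xffff; // comment length is a 16-bit field

    /** Returns entry name -> uncompressed size, as stored in the central directory. */
    static Map<String, Long> uncompressedSizes(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long fileLength = raf.length();
            int tailLength = (int) Math.min(fileLength, EOCD_MIN_SIZE + MAX_COMMENT);
            if (tailLength < EOCD_MIN_SIZE) {
                throw new IOException("Too short to be a zip file");
            }
            byte[] tailBytes = new byte[tailLength];
            raf.seek(fileLength - tailLength);
            raf.readFully(tailBytes);
            ByteBuffer tail = ByteBuffer.wrap(tailBytes).order(ByteOrder.LITTLE_ENDIAN);

            // Scan backwards from the last possible position of the end record.
            int eocd = -1;
            for (int i = tailLength - EOCD_MIN_SIZE; i >= 0; i--) {
                if (tail.getInt(i) == EOCD_SIGNATURE) {
                    eocd = i;
                    break;
                }
            }
            if (eocd < 0) {
                throw new IOException("End of central directory record not found");
            }

            int entryCount = tail.getShort(eocd + 10) & 0xffff;        // total entries
            long directorySize = tail.getInt(eocd + 12) & 0xffffffffL; // size in bytes
            long directoryOffset = tail.getInt(eocd + 16) & 0xffffffffL;
            if (directorySize > Integer.MAX_VALUE) {
                throw new IOException("Central directory too large for this sketch");
            }

            // Seek straight to the central directory and read it in one go.
            byte[] dirBytes = new byte[(int) directorySize];
            raf.seek(directoryOffset);
            raf.readFully(dirBytes);
            ByteBuffer dir = ByteBuffer.wrap(dirBytes).order(ByteOrder.LITTLE_ENDIAN);

            Map<String, Long> sizes = new LinkedHashMap<>();
            int pos = 0;
            for (int n = 0; n < entryCount; n++) {
                if (dir.getInt(pos) != CENTRAL_HEADER_SIGNATURE) {
                    throw new IOException("Unexpected data at central directory offset " + pos);
                }
                long uncompressed = dir.getInt(pos + 24) & 0xffffffffL;
                int nameLength = dir.getShort(pos + 28) & 0xffff;
                int extraLength = dir.getShort(pos + 30) & 0xffff;
                int commentLength = dir.getShort(pos + 32) & 0xffff;
                String name = new String(dirBytes, pos + 46, nameLength, StandardCharsets.UTF_8);
                sizes.put(name, uncompressed);
                // The next header follows the variable-length name, extra field and comment.
                pos += 46 + nameLength + extraLength + commentLength;
            }
            return sizes;
        }
    }
}
```

Compared with scanning the last 10 MB for header signatures, this walks the headers by their declared lengths, so bytes inside compressed data that happen to look like a signature are not misread as headers.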
Maybe 7-Zip inflates every entry in RAM to find the inflated size. –
Although possible, the speed at which it opens large zips makes that doubtful. – nablex