Wrong character set in file names after unzipping

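I have the following problem: I extract a zip file with SSZipArchive (in a Swift app), and some of the file names contain "invalid" characters.
I think the reason is that I zipped the files under Windows, so the names are now encoded in ANSI.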

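Is there a way to convert all the "corrupted" folder and file names during unzipping?
Or afterwards? It would be no problem if I had to iterate over the folder tree and rename the files.
But I don't know how to find out which names are encoded in ANSI, and I don't know how to correct the character set either.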

2017-02-13 altralaser

Please provide a sample zip and report it on the GitHub tracker.

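This is probably fixed in the latest SSZipArchive (currently 2.1.1). I implemented support for non-Unicode file names there in a way similar to the code below, so you can reuse it to handle the file names yourself if you want.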

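OK, this is in Objective-C, but since SSZipArchive already contains the fix itself, you should no longer need it. Otherwise, either create a bridging header to include the Objective-C code in your Swift app, or convert it to Swift (should be easy).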

#import <Foundation/Foundation.h>

// Category declarations; in a real project these belong in a header file.
@interface NSData (SSZipArchive)
- (NSString *)hexString;
@end

@interface NSString (SSZipArchive)
+ (NSString *)filenameStringWithCString:(const char *)filename size:(uint16_t)size_filename;
@end

@implementation NSString (SSZipArchive)

+ (NSString *)filenameStringWithCString:(const char *)filename size:(uint16_t)size_filename
{
    // unicode conversion attempt (returns nil if the bytes are not valid UTF-8)
    NSString *strPath = [NSString stringWithUTF8String:filename];
    if (strPath) {
        return strPath;
    }

    // if filename is non-unicode, detect and transform Encoding
    NSData *data = [NSData dataWithBytes:(const void *)filename length:sizeof(unsigned char) * size_filename];
    // supported encodings are in [NSString availableStringEncodings]
    [NSString stringEncodingForData:data encodingOptions:nil convertedString:&strPath usedLossyConversion:nil];
    if (strPath) {
        return strPath;
    }

    // if filename encoding is non-detected, we default to something based on data
    // note: hexString is more readable than base64RFC4648 for debugging unknown encodings
    strPath = [data hexString];
    return strPath;
}
@end

@implementation NSData (SSZipArchive)

// initWithBytesNoCopy from NSProgrammer, Jan 25 '12: https://stackoverflow.com/a/9009321/1033581
// hexChars from Peter, Aug 19 '14: https://stackoverflow.com/a/25378464/1033581
// not implemented as too lengthy: a potential mapping improvement from Moose, Nov 3 '15: https://stackoverflow.com/a/33501154/1033581
- (NSString *)hexString
{
    // all 16 hex digits: each nibble (0-15) indexes directly into this table
    const char *hexChars = "0123456789ABCDEF";
    NSUInteger length = self.length;
    if (length == 0) {
        return @"";
    }
    const unsigned char *bytes = self.bytes;
    char *chars = malloc(length * 2);
    if (chars == NULL) {
        // not enough memory to build the hex representation
        return nil;
    }
    char *s = chars;
    NSUInteger i = length;
    while (i--) {
        *s++ = hexChars[*bytes >> 4];  // high nibble
        *s++ = hexChars[*bytes & 0xF]; // low nibble
        bytes++;
    }
    // the string takes ownership of the buffer and frees it on deallocation
    NSString *str = [[NSString alloc] initWithBytesNoCopy:chars
                                                   length:length * 2
                                                 encoding:NSASCIIStringEncoding
                                             freeWhenDone:YES];
    return str;
}
@end
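
For anyone who prefers to convert the approach to Swift rather than bridge it, here is a minimal sketch of the same three-step fallback: UTF-8 first, then Foundation's encoding detection, then a hex dump. The function name `fileName(fromRawBytes:)` is made up for illustration and is not part of SSZipArchive, and the sketch assumes you already have the raw name bytes as `Data`. It also assumes an Apple platform, since it relies on `NSString.stringEncoding(for:encodingOptions:convertedString:usedLossyConversion:)`. The detection step is a heuristic and can guess the wrong encoding.

```swift
import Foundation

/// Illustrative helper (hypothetical name): turns the raw bytes of a zip
/// entry name into a String using the same fallback order as the
/// Objective-C code above.
func fileName(fromRawBytes raw: Data) -> String {
    // 1. Valid UTF-8 (which includes plain ASCII) is taken as-is.
    if let utf8 = String(data: raw, encoding: .utf8) {
        return utf8
    }

    // 2. Otherwise let Foundation guess the encoding and convert the bytes.
    var converted: NSString?
    let detected = NSString.stringEncoding(for: raw,
                                           encodingOptions: nil,
                                           convertedString: &converted,
                                           usedLossyConversion: nil)
    if detected != 0, let converted = converted {
        return converted as String
    }

    // 3. Last resort: an uppercase hex dump, which is at least a valid,
    //    debuggable file name.
    return raw.map { String(format: "%02X", $0) }.joined()
}
```

Calling `fileName(fromRawBytes: Data("résumé.txt".utf8))` takes the first branch and returns the name unchanged. A name containing a lone `0xE9` byte is not valid UTF-8, so it falls through to detection or, failing that, to the hex dump.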


2017-10-13 05:09:33

The official spec says that paths should be encoded either in code page 437 (MS-DOS Latin US) or in UTF-8 (if bit 11 of the general purpose field is set):

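D.1 The ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437. This limits storing file name characters to only those within the original MS-DOS range of values and does not properly support file names in other character encodings, or languages. To address this limitation, this specification will support the following change.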

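D.2 If general purpose bit 11 is unset, the file name and comment should conform to the original ZIP character encoding. If general purpose bit 11 is set, the filename and comment must support The Unicode Standard, Version 4.1.0 or greater using the character encoding form defined by the UTF-8 storage specification. The Unicode Standard is published by The Unicode Consortium (www.unicode.org). UTF-8 encoded data stored in ZIP files is expected to not include a byte order mark (BOM).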
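
The rule quoted in D.2 can be sketched in Swift roughly as follows. This is only an illustration of the spec text, not the code of any particular library. The names `utf8FlagMask`, `codePage437` and `decodeZipEntryName` are invented for the example, and it assumes the raw name bytes and the general purpose bit flag have already been read from the entry header. The code page 437 encoding is obtained through CoreFoundation, so the sketch assumes an Apple platform.

```swift
import Foundation

/// Bit 11 of the general purpose bit flag: set when name and comment are UTF-8.
let utf8FlagMask: UInt16 = 1 << 11

/// IBM Code Page 437 (MS-DOS Latin US), mapped from CoreFoundation's
/// encoding constant to a Foundation String.Encoding.
let codePage437 = String.Encoding(
    rawValue: CFStringConvertEncodingToNSStringEncoding(
        CFStringEncoding(CFStringEncodings.dosLatinUS.rawValue)))

/// Illustrative helper (hypothetical name): decodes an entry name as the
/// spec prescribes. Returns nil if the bytes are not valid in the
/// selected encoding.
func decodeZipEntryName(_ rawName: Data, generalPurposeFlags: UInt16) -> String? {
    let isUTF8 = (generalPurposeFlags & utf8FlagMask) != 0
    return String(data: rawName, encoding: isUTF8 ? .utf8 : codePage437)
}
```

With the flag unset, the single byte `0x81` decodes to "ü" because that is its meaning in code page 437. With the flag set, the same character has to arrive as the UTF-8 bytes `0xC3 0xBC`. Archives created on Windows do not always follow the spec and may store names in the local ANSI or OEM code page without setting bit 11, in which case this strict decoding yields the wrong characters and a detection step is still needed.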

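I recently released an open-source Swift implementation of the ZIP file format called ZIPFoundation. It conforms to the standard and should be able to detect Windows path names and decode them correctly.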


2017-10-13 17:32:25
