2016-04-07

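Python: remove an entry from a zip file

I'm currently writing an open-source library for a container format, which involves modifying zip archives. For that I use Python's built-in zipfile module. Due to some limitations, I decided to modify the module and ship it with my library. These modifications include a patch from the Python issue tracker for removing entries from a zip file: https://bugs.python.org/issue6818 More specifically, I included zipfile.remove.2.patch by ubershmekel. After some modifications for Python 2.7, the patch works just fine according to the shipped unit tests.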

However, I run into problems when removing, adding, and removing + adding files without closing the zip file in between.

Error
Traceback (most recent call last):
  File "/home/martin/git/pyCombineArchive/tests/test_zipfile.py", line 1590, in test_delete_add_no_close
    self.assertEqual(zf.read(fname), data)
  File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 948, in read
    with self.open(name, "r", pwd) as fp:
  File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 1003, in open
    % (zinfo.orig_filename, fname))
BadZipFile: File name in directory 'foo.txt' and header 'bar.txt' differ.

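So the zip file itself is fine, but somehow the central directory and the local file headers get mixed up. This unit test reproduces the error: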

def test_delete_add_no_close(self):
    fname_list = ["foo.txt", "bar.txt", "blu.bla", "sup.bro", "rollah"]
    # random binary payloads; bytes(bytearray(...)) gives bytes on both
    # Python 2 and 3, so they compare equal to what zf.read() returns
    data_list = [bytes(bytearray(randint(0, 255) for _ in range(100)))
                 for _ in range(len(fname_list))]

    # add some files to the zip
    with zipfile.ZipFile(TESTFN, "w") as zf:
        for fname, data in zip(fname_list, data_list):
            zf.writestr(fname, data)

    for no in range(0, 2):
        with zipfile.ZipFile(TESTFN, "a") as zf:
            zf.remove(fname_list[no])
            zf.writestr(fname_list[no], data_list[no])
            zf.remove(fname_list[no + 1])
            zf.writestr(fname_list[no + 1], data_list[no + 1])

            # try to access the previously deleted/re-added files and the
            # previous last file (which got moved during the delete)
            for fname, data in zip(fname_list, data_list):
                self.assertEqual(zf.read(fname), data)

My modified zipfile module and the complete unit test file can be found in this gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4
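To check that it really is the central directory and the local file headers that disagree (and not the member data), a small stand-alone checker can compare every central-directory entry with the local header stored at its header_offset. This is a minimal sketch, assuming Python 3, an unencrypted archive available as bytes, and no data prepended before the first member; find_header_mismatches is a hypothetical helper, not part of zipfile. The 30-byte local header with the file name length at offset 26 comes from the ZIP format specification.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"
LOCAL_HEADER_SIZE = 30  # fixed part of a local file header, per the ZIP spec


def find_header_mismatches(data):
    """Compare each central-directory name with the name in its local header.

    `data` is the raw archive as bytes (hypothetical helper). Returns a list of
    (central_directory_name, local_header_name) pairs that disagree;
    local_header_name is None when no valid local header is found.
    """
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            start = info.header_offset
            header = data[start:start + LOCAL_HEADER_SIZE]
            if len(header) < LOCAL_HEADER_SIZE or header[:4] != LOCAL_HEADER_SIGNATURE:
                mismatches.append((info.orig_filename, None))
                continue
            # file name length is a little-endian uint16 at offset 26
            (name_len,) = struct.unpack("<H", header[26:28])
            name_start = start + LOCAL_HEADER_SIZE
            raw_name = data[name_start:name_start + name_len]
            # bit 11 (0x800) of the general purpose flags marks a UTF-8 name
            encoding = "utf-8" if info.flag_bits & 0x800 else "cp437"
            local_name = raw_name.decode(encoding)
            if local_name != info.orig_filename:
                mismatches.append((info.orig_filename, local_name))
    return mismatches
```

Run against the bytes of TESTFN after the `with` block has closed, an empty list means the archive on disk is consistent; any pair returned points at the entry whose header was written to the wrong place.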

Answer


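After some intensive debugging, I'm fairly sure something went wrong with moving the remaining chunks (the files stored after the deleted one). So I went ahead and rewrote this part of the code so that it copies these files/chunks one at a time. In addition, I rewrite every local file header (to make sure it is valid) and the central directory at the end of the zip file. My remove function now looks like this: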

def remove(self, member):
    """Remove a file from the archive. Only works if the ZipFile was opened
    with mode 'a'."""

    if "a" not in self.mode:
        raise RuntimeError('remove() requires mode "a"')
    if not self.fp:
        raise RuntimeError(
            "Attempt to modify ZIP archive that was already closed")
    fp = self.fp

    # Make sure we have an info object
    if isinstance(member, ZipInfo):
        # 'member' is already an info object
        zinfo = member
    else:
        # Get info object for member
        zinfo = self.getinfo(member)

    # start at the position of the first member (smallest offset)
    position = min(info.header_offset for info in self.filelist)
    # walk the members in on-disk order, so every member behind the deleted
    # one is moved only after all members in front of it have been counted
    for info in sorted(self.filelist, key=lambda i: i.header_offset):
        fileheader = info.FileHeader()
        # is this member stored after the one being deleted?
        if info.header_offset > zinfo.header_offset and info != zinfo:
            # rewrite the local file header and copy the compressed data
            # First read and validate the existing local file header:
            fp.seek(info.header_offset)
            fheader = fp.read(sizeFileHeader)
            if fheader[0:4] != stringFileHeader:
                raise BadZipFile("Bad magic number for file header")

            fheader = struct.unpack(structFileHeader, fheader)
            fname = fp.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                fp.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            # use the flags of the member being moved, not of the deleted one
            if info.flag_bits & 0x800:
                # UTF-8 filename
                fname_str = fname.decode("utf-8")
            else:
                fname_str = fname.decode("cp437")

            if fname_str != info.orig_filename:
                if not self._filePassed:
                    fp.close()
                raise BadZipFile(
                    'File name in directory %r and header %r differ.'
                    % (info.orig_filename, fname))

            # read the actual data; take the size from the central directory,
            # because the local header stores 0 when a data descriptor is used
            data = fp.read(info.compress_size)

            # update the info object with the new offset
            info.header_offset = position
            # jump to the new position
            fp.seek(info.header_offset, 0)
            # write the regenerated file header and the data
            fp.write(fileheader)
            fp.write(data)
            if info.flag_bits & _FHF_HAS_DATA_DESCRIPTOR:
                # Write CRC and file sizes after the file data
                fp.write(struct.pack("<LLL", info.CRC, info.compress_size,
                                     info.file_size))
            # update position
            fp.flush()
            position = fp.tell()

        elif info != zinfo:
            # member lies in front of the deleted one: it stays, just advance
            position = (position + info.compress_size + len(fileheader)
                        + self._get_data_descriptor_size(info))

    # Fix class members with state
    self.start_dir = position
    self._didModify = True
    self.filelist.remove(zinfo)
    del self.NameToInfo[zinfo.filename]

    # write new central directory (includes truncate)
    fp.seek(position, 0)
    self._write_central_dir()
    # jump to the beginning of the central directory, so it gets overwritten at close()
    fp.seek(self.start_dir, 0)
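The bookkeeping behind remove() boils down to a running position: members in front of the deleted one keep their offsets, every later member slides forward, and the central directory starts wherever the last moved member ends. A minimal sketch of just that offset arithmetic, assuming the members are stored back to back with no gaps; Entry and compact are hypothetical names used only for illustration, not part of zipfile or the patch.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for one member's on-disk footprint."""
    name: str
    offset: int  # where the local file header starts
    size: int    # local header + compressed data + data descriptor, in bytes


def compact(entries: List[Entry], removed: str) -> Tuple[List[Entry], int]:
    """Drop `removed` and slide every later entry forward.

    Returns the surviving entries with updated offsets and the offset at
    which the new central directory would start.
    """
    if removed not in {e.name for e in entries}:
        raise KeyError(removed)
    ordered = sorted(entries, key=lambda e: e.offset)  # on-disk order
    position = ordered[0].offset  # the first member's offset never changes
    survivors = []
    for entry in ordered:
        if entry.name == removed:
            continue  # its bytes get overwritten by the next entry
        survivors.append(entry._replace(offset=position))
        position += entry.size
    return survivors, position
```

For example, removing "bar.txt" from `[Entry("foo.txt", 0, 50), Entry("bar.txt", 50, 40), Entry("blu.bla", 90, 60)]` moves "blu.bla" to offset 50 and puts the central directory at 110. In remove() the size of each moved member is taken from `fp.tell()` after writing rather than from a precomputed value, since the regenerated header may differ in length from the original one.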

The full code of the latest revision can be found here: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4

Or in the repository of the library I'm writing: https://github.com/FreakyBytes/pyCombineArchive
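If shipping a patched zipfile is not a hard requirement, a common alternative is to leave the original archive untouched, write a new archive containing every member except the removed one, and swap it into place. This is a different technique from the in-place compaction in remove(): it costs a full decompress/recompress pass, but it only uses the public zipfile API and never leaves a half-rewritten archive behind. A minimal sketch, assuming Python 3.3+ (for os.replace) and unencrypted members; remove_by_rebuild is a hypothetical name.

```python
import os
import tempfile
import zipfile


def remove_by_rebuild(path, member):
    """Rewrite the archive at `path` without `member` (hypothetical helper)."""
    # create the temporary file next to the original so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(suffix=".zip",
                                    dir=os.path.dirname(os.path.abspath(path)))
    os.close(fd)
    try:
        with zipfile.ZipFile(path, "r") as zin, zipfile.ZipFile(tmp_path, "w") as zout:
            if member not in zin.namelist():
                raise KeyError(member)
            zout.comment = zin.comment
            for info in zin.infolist():
                if info.filename != member:
                    # reuse the ZipInfo so name, timestamp and compression type carry over
                    zout.writestr(info, zin.read(info))
        os.replace(tmp_path, path)  # atomic swap on the same filesystem
    except BaseException:
        os.remove(tmp_path)
        raise
```

Because every call produces a fresh, closed archive, repeated remove/add sequences like the ones in the unit test cannot leave stale header offsets behind; the trade-off is that the archive has to be reopened after each removal.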