2015-09-24 26 views

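Python: continuously check the size of files being added to a list, stop at a size limit, zip the list, continue

I am trying to loop through a directory, check the size of each file, and add the files to a list until they reach a certain size (2040 MB). At that point I want to put the list into a zip archive, then continue looping through the next set of files in the directory and do the same thing again. The other constraint is that files with the same name but different extensions need to be added to the zip together and cannot be separated. I hope that makes sense.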

The problem I am having is that my code basically ignores the size limit I added and zips up every file in the directory anyway.

I suspect there is a logic problem somewhere, but I am not seeing it. Any help would be appreciated. Here is my code:

import os,os.path, zipfile 
from time import * 

#### Function to create zip file #### 
# Add the files from the list to the zip archive 
def zipFunction(zipList): 

    # Specify zip archive output location and file name 
    zipName = r"D:\Documents\ziptest1.zip" 

    # Create the zip file object 
    zipA = zipfile.ZipFile(zipName, "w", allowZip64=True) 

    # Go through the list and add files to the zip archive 
    for w in zipList: 

     # Create the arcname parameter for the .write method. Otherwise the zip file 
     # mirrors the directory structure within the zip archive (annoying). 
     arcname = w[len(root)+1:] 

     # Write the files to a zip 
     zipA.write(w, arcname, zipfile.ZIP_DEFLATED) 

    # Close the zip process 
    zipA.close() 
    return  
################################################# 
################################################# 

sTime = clock() 

# Set the size counter 
totalSize = 0 

# Create an empty list for adding files to count MB and make zip file 
zipList = [] 

tifList = [] 

xmlList = [] 

# Specify the directory to look at 
searchDirectory = r"Y:\test"  # raw string, otherwise "\t" is read as a tab character

# Create a counter to check number of files 
count = 0 

# Set the root, directory, and file name 
for root,direc,f in os.walk(searchDirectory): 

     #Go through the files in directory 
     for name in f: 
      # Set the os.path file root and name 
      full = os.path.join(root,name) 

      # Split the file name from the file extension 
      n, ext = os.path.splitext(name) 

      # Get size of each file in directory, size is obtained in BYTES 
      fileSize = os.path.getsize(full) 

      # Add up the total sizes for all the files in the directory 
      totalSize += fileSize 

      # Convert from bytes to megabytes 
       # 1 kilobyte = 1,024 bytes 
       # 1 megabyte = 1,048,576 bytes 
       # 1 gigabyte = 1,073,741,824 bytes 
      megabytes = float(totalSize)/float(1048576) 

      if ext == ".tif": # should be everything that is not equal to XML (could be TIF, PDF, etc.) need to fix this later 
       tifList.append(n)#, fileSize/1048576]) 
       tifSorted = sorted(tifList) 
      elif ext == ".xml": 
       xmlList.append(n)#, fileSize/1048576]) 
       xmlSorted = sorted(xmlList) 

      if full.endswith(".xml") or full.endswith(".tif"): 
       zipList.append(full) 

      count +=1 

      if megabytes == 2040 and len(tifList) == len(xmlList): 
       zipFunction(zipList) 
      else: 
       continue 

eTime = clock() 
elapsedTime = eTime - sTime 
print "Run time is %s seconds"%(elapsedTime) 

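The only thing I can think of is that there is never a point where my variable megabytes is exactly equal to 2040. I do not know how else to make the code stop at that point. Would using a range work? I also tried: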

if megabytes < 2040: 
    zipList.append(full) 
    continue 
elif megabytes == 2040: 
    zipFunction(zipList) 
Answer


Your main problem is that you need to reset the running file size whenever you archive the current list of files. For example

if megabytes >= 2040: 
    zipFunction(zipList) 
    # Clear the list as well, otherwise the next archive repeats these files 
    zipList = [] 
    totalSize = 0 

BTW, you do not need

else: 
    continue 

there, because it is already at the end of the loop.

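As for the constraint that files with the same base name but different extensions must be kept together, the only really simple way to do that is to sort the file names before processing them.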
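A minimal sketch of that sorting step, in case it helps. The helper name `group_by_stem` and the sample file names are made up for illustration. It sorts on the name with its extension stripped, so companion files end up adjacent, and then lets `itertools.groupby` collect them:

```python
import os
from itertools import groupby

def group_by_stem(paths):
    """Return lists of paths that share the same name once the extension is removed."""
    stem = lambda p: os.path.splitext(p)[0]
    # Sort on (stem, path): equal stems become adjacent, which groupby requires,
    # and the order inside each group is predictable.
    ordered = sorted(paths, key=lambda p: (stem(p), p))
    return [list(group) for _, group in groupby(ordered, key=stem)]

# Hypothetical file names, deliberately out of order.
groups = group_by_stem(["a.xml", "b.tif", "a.tif", "b.xml", "c.tif"])
```

Sorting on the stripped name rather than on the full file name is a deliberate choice here: with a plain `sorted(paths)`, a file such as `a.u.tif` would land between `a.tif` and `a.xml` and split that pair.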

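If you want to guarantee that the total size of the files in each archive stays under the limit, you need to test the size before adding the file(s) to the list. For example,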

# Compare in bytes: floor division would let totals just under 2041 MB through 
if totalSize + fileSize > 2040 * 1048576: 
    zipFunction(zipList) 
    zipList = [] 
    totalSize = 0 

totalSize += fileSize 

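This logic will need to be modified slightly to handle keeping a group of files together: you need to add up the file sizes of every file in the group into a subtotal, and then check whether adding that subtotal to totalSize would exceed the limit.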
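Here is one way the subtotal idea could look, as a rough sketch rather than a drop-in replacement for the loop in the question. The function `make_batches`, the constant `LIMIT_BYTES` and the demo data are all hypothetical. It takes `(path, size_in_bytes)` pairs so the logic can be tried without touching the disk; in the real script the sizes would come from `os.path.getsize`.

```python
import os
from itertools import groupby

LIMIT_BYTES = 2040 * 1048576  # 2040 MB, the limit mentioned in the question

def make_batches(sized_paths, limit=LIMIT_BYTES):
    """Split (path, size_in_bytes) pairs into batches whose total size stays
    within limit, without separating files that share a base name."""
    stem = lambda item: os.path.splitext(item[0])[0]
    batches, current, total = [], [], 0
    for _, group in groupby(sorted(sized_paths, key=stem), key=stem):
        group = list(group)
        subtotal = sum(size for _, size in group)
        # Close the current batch first if this group would push it over the limit.
        if current and total + subtotal > limit:
            batches.append(current)
            current, total = [], 0
        current.extend(path for path, _ in group)
        total += subtotal
    if current:
        batches.append(current)  # the final, partly filled batch
    return batches

# Made-up sizes and a limit of 10 so the behaviour is easy to follow.
demo = [("a.tif", 4), ("a.xml", 1), ("b.tif", 4), ("b.xml", 1), ("c.tif", 6), ("c.xml", 1)]
batches = make_batches(demo, limit=10)
```

Two caveats. A single group that is larger than the limit on its own still ends up in a batch by itself, so it would need separate handling. Also, each batch would need its own archive name, because zipFunction in the question always opens the same path in "w" mode and would overwrite the previous archive.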