Iterating over a directory to tar files with Python

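I need to walk through a folder, find every set of files that share the same name (apart from the extension), and then archive each set (preferably with tarfile) into a single file.

So I have 5 files named "example1", each with a different file extension. I need to tar them up together and output them as "example1.tar" or something similar.

That would be easy with a simple for loop, such as: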

tar = tarfile.open('example1.tar', 'w')
for output in glob('example1*'):
    tar.add(output)
tar.close()

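However, there are 300 "example" files, and I need to iterate over each one and its 5 associated files to make this work. This is over my head. Any suggestions are greatly appreciated.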

2011-05-06 KennyC

The pattern you describe generalizes to MapReduce. I found a simple implementation of MapReduce online; an even simpler version of it is:

def map_reduce(data, mapper, reducer):
    d = {}
    for elem in data:
        key, value = mapper(elem)
        d.setdefault(key, []).append(value)
    for key, grp in d.items():
        d[key] = reducer(key, grp)
    return d
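To see what map_reduce does before any tar files are involved, here is a minimal sketch that only groups names. The file names and the sorting reducer are made up for illustration; they are not part of the question.

```python
import os

def map_reduce(data, mapper, reducer):
    # Same function as above, repeated so this sketch runs on its own.
    d = {}
    for elem in data:
        key, value = mapper(elem)
        d.setdefault(key, []).append(value)
    for key, grp in d.items():
        d[key] = reducer(key, grp)
    return d

# Hypothetical sample data: three file names, two of which share a base name.
names = ['example1.shp', 'example1.dbf', 'example2.shp']

# The mapper keys each name by its base name; the reducer just sorts the group.
groups = map_reduce(names,
                    lambda x: (os.path.splitext(x)[0], x),
                    lambda key, grp: sorted(grp))
```

The mapper decides which files belong together, and the reducer decides what happens to each group, so swapping the sorting reducer for one that writes a tarball is the only change needed.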

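You want to group all the files by their name without the extension, which you can get from os.path.splitext(fname)[0]. Then you want to make a tarball out of each group using the tarfile module. In code, that is: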

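import os
import tarfile

def make_tar(basename, files):
    # Write every file in the group to <basename>.tar
    with tarfile.open(basename + '.tar', 'w') as tar:
        for f in files:
            tar.add(f)

# Key each directory entry by its name without the last extension
map_reduce(os.listdir('.'),
           lambda x: (os.path.splitext(x)[0], x),
           make_tar)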

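Edit: If you want to group the files differently, you only need to change the second argument to map_reduce. The code above groups files that have the same value for the expression os.path.splitext(x)[0]. So, to group by the base file name with all extensions stripped off, you can replace that expression with strip_all_ext(x) and add: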

def strip_all_ext(path): 
    head, tail = os.path.split(path) 
    basename = tail.split(os.extsep)[0] 
    return os.path.join(head, basename)
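A quick comparison, using the 'foobar.aux.xml' name from the comment below as sample input, shows how the two keys differ. This is only a sketch; the 'data' directory is a made-up example.

```python
import os

def strip_all_ext(path):
    # Same helper as above, repeated so this sketch runs on its own.
    head, tail = os.path.split(path)
    basename = tail.split(os.extsep)[0]
    return os.path.join(head, basename)

# os.path.splitext removes only the last extension.
single = os.path.splitext('foobar.aux.xml')[0]

# strip_all_ext removes everything after the first separator in the file name.
stripped = strip_all_ext('foobar.aux.xml')

# The directory part is left alone ('data' is a hypothetical directory).
nested = strip_all_ext(os.path.join('data', 'foobar.aux.xml'))

# Caveat: a name that starts with the separator has an empty base name.
hidden = strip_all_ext('.bashrc')
```

Here single is 'foobar.aux' while stripped is 'foobar'. Note the caveat on the last line: dotfiles would all end up grouped under an empty key.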


2011-05-06 20:04:25 Karmastan

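Is there any way to change this code, or to use os.path.extsep, so that multiple extensions are split off a file name? For example 'foobar.aux.xml' – KennyC 2011-05-08 22:22:25

@KennyC: Updated the answer – Karmastan 2011-05-08 22:56:50

@Karamastan: Perfect! It works. Thanks – KennyC 2011-05-09 14:43:22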

Try using the glob module: http://docs.python.org/library/glob.html
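As a rough sketch of the glob approach for a single base name: the helper name tar_matching and its directory parameter are assumptions for illustration, not something from the question. It matches 'example1.*' rather than 'example1*', so that 'example10.shp' is not swept into the 'example1' archive.

```python
import glob
import os
import tarfile

def tar_matching(base, directory='.'):
    """Archive every file named <base>.<anything> in directory into <base>.tar."""
    archive = os.path.join(directory, base + '.tar')
    # glob.escape guards against '*', '?' or '[' occurring in the names.
    pattern = os.path.join(glob.escape(directory), glob.escape(base) + '.*')
    # Collect the members first, skipping an archive left over from an earlier run.
    members = sorted(p for p in glob.glob(pattern)
                     if os.path.abspath(p) != os.path.abspath(archive))
    with tarfile.open(archive, 'w') as tar:
        for path in members:
            # Store only the file name, not the directory prefix.
            tar.add(path, arcname=os.path.basename(path))
    return archive
```

Calling tar_matching once per base name would cover all 300 groups, but you still need the list of base names, which is what the grouping answers on this page provide.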


2011-05-06 19:28:46 zeekay

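You have two problems. Solve them separately.

Finding matching names. Use collections.defaultdict.
Creating tar files after you find the matching names. You have that pretty well covered.

So, solve problem 1 first. Use glob to get all the names. Use os.path.basename to separate the file name from its path. Use os.path.splitext to split the name and the extension.

A dictionary of lists can be used to hold all the files that have the same name.

Is that what you're doing for part 1?

Part 2 is putting the files into tar archives. For that, you already have most of the code you need.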
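The two parts described above could be sketched roughly as follows. The function names group_by_stem and tar_groups are made up for this example, and the sketch assumes the input list contains only regular files.

```python
import os
import tarfile
from collections import defaultdict

def group_by_stem(paths):
    """Part 1: map each path without its last extension to the paths sharing it."""
    groups = defaultdict(list)
    for path in paths:
        stem, _ext = os.path.splitext(path)
        groups[stem].append(path)
    return groups

def tar_groups(groups):
    """Part 2: write one <stem>.tar per group and return the archive paths."""
    archives = []
    for stem, paths in groups.items():
        archive = stem + '.tar'
        with tarfile.open(archive, 'w') as tar:
            for path in paths:
                # Store only the file name, not the directory prefix.
                tar.add(path, arcname=os.path.basename(path))
        archives.append(archive)
    return archives
```

A call such as tar_groups(group_by_stem(glob.glob('some_dir/*'))) would tie the two together, where 'some_dir' is a placeholder. If the directory may already contain .tar files from an earlier run, filter them out of the list first, otherwise they are grouped with their own members.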


2011-05-06 19:29:55

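You could do this:

List the directory
Create a dictionary where the base name is the key and all its extensions are the values
Then, for each dictionary key, tar all of its files

Something like this: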

import os
import tarfile
from collections import defaultdict

myfiles = os.listdir(".")  # List of all files
totar = defaultdict(list)

# now fill the defaultdict with entries; basename as keys, extensions as values
for name in myfiles:
    base, ext = os.path.splitext(name)
    totar[base].append(ext)

# iterate through all the basenames
for base in totar:
    files = [base + ext for ext in totar[base]]
    # now tar all the files in the list "files"
    with tarfile.open(base + ".tar", "w") as tar:
        for item in files:
            tar.add(item)


2011-05-06 19:36:09

#! /usr/bin/env python 

import os 
import tarfile 

tarfiles = {} 
for f in os.listdir ('files'): 
    prefix = f [:f.rfind ('.') ] 
    if prefix in tarfiles: tarfiles [prefix] += [f] 
    else: tarfiles [prefix] = [f] 

for k, v in tarfiles.items(): 
    tf = tarfile.open ('%s.tar.gz' % k, 'w:gz') 
    # tf.add() records the real file size; a bare TarInfo (f) has size 0, so addfile() would store empty members
    for f in v: tf.add ('files/%s' % f, arcname=f)
    tf.close()


2011-05-06 19:37:32 Hyperboreus

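Here is the complete script. – Hyperboreus 2011-05-06 19:46:34

@Hyperboreus: -1 ... f = 'fubar'; prefix = f[:f.rfind('.')] produces 'fuba' ... use os.path.splitext() – 2011-05-06 21:52:01

@Hyboreus: While you're at it, lose the spaces before '[' in slices and dictionary accesses and before '(' in function calls – 2011-05-06 21:57:02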
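The difference the first comment points out can be checked with a few sample names. The names are only examples; 'fubar' is the one from the comment.

```python
import os

names = ['fubar', 'example1.shp', 'foobar.aux.xml']

# Slicing at rfind('.'): rfind returns -1 when there is no dot,
# so the slice silently drops the last character of the name.
by_rfind = [n[:n.rfind('.')] for n in names]

# os.path.splitext leaves a name without an extension unchanged.
by_splitext = [os.path.splitext(n)[0] for n in names]
```

The two lists agree for names that contain a dot and differ only for 'fubar', which becomes 'fuba' with the slice.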


import os
import tarfile

allfiles = {}

# Group the file names by everything before the last dot
for filename in os.listdir("."):
    basename = '.'.join(filename.split(".")[:-1])
    if basename not in allfiles:
        allfiles[basename] = [filename]
    else:
        allfiles[basename].append(filename)

for basename, filenames in allfiles.items():
    # Skip names that have no sibling files
    if len(filenames) < 2:
        continue
    tardata = tarfile.open(basename + ".tar", "w")
    for filename in filenames:
        tardata.add(filename)
    tardata.close()


2011-05-06 19:42:13 jsbueno

-1 Use os.path.splitext() - '.'.join('fubar'.split(".")[:-1]) produces an empty string. – 2011-05-07 07:24:48
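To illustrate the consequence for the grouping step, here is a small sketch with made-up names. The helper stem_by_join is just a wrapper around the expression from the answer.

```python
import os

def stem_by_join(filename):
    # The expression used in the answer above.
    return '.'.join(filename.split('.')[:-1])

# Hypothetical directory listing with two extensionless files.
names = ['fubar', 'README', 'example1.shp']

joined = {n: stem_by_join(n) for n in names}
split = {n: os.path.splitext(n)[0] for n in names}
```

With the join-based key, both extensionless files collapse onto the empty string, so the answer's loop would bundle unrelated files into an archive named '.tar'. With os.path.splitext each keeps its own name as the key.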
