Fastest way to extract tar files using Python


I have to extract hundreds of tar.bz files, each about 5 GB in size, so I tried the code below:

import glob
import tarfile
from multiprocessing import Pool

files = glob.glob('D:\\*.tar.bz')  # all my files are in D:
for f in files:
    tar = tarfile.open(f, 'r:bz2')
    pool = Pool(processes=5)
    pool.map(tar.extractall('E:\\'))  # I want to extract them to E:
    tar.close()

But the code fails with: TypeError: map() takes at least 3 arguments (2 given)

How can I fix this? And do you have any other ideas for speeding up the extraction?


2014-09-21 Beau

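I bet your problem here is I/O, not the code. The 'map' error is clear: you have to provide a function and a list of arguments for that function. In your case: 'map(extractall, [list, of, files])' – xbello 2014-09-21 15:12:41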

How do I provide the destination directory? map(extractall, [list, of, files]) – Beau 2014-09-21 15:16:45

A different destination for each file? '[(list, dest), (of, dest2), (files, dest3)]'. The same destination? Create a 'functools.partial' for 'extractall'. – xbello 2014-09-21 15:18:30
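A minimal sketch of the two options from this comment thread, assuming Python 3.3 or later (for Pool.starmap and using Pool as a context manager). The helper names extract, extract_same_dest and extract_per_dest are hypothetical, not from the thread:

```python
import glob
import tarfile
from functools import partial
from multiprocessing import Pool


def extract(path, dest):
    """Extract one bz2-compressed tar archive into dest (hypothetical helper)."""
    with tarfile.open(path, 'r:bz2') as tar:
        tar.extractall(dest)


def extract_same_dest(files, dest, processes=5):
    # Same destination for every archive: freeze `dest` with functools.partial
    # so Pool.map only has to supply the varying `path` argument.
    with Pool(processes=processes) as pool:
        pool.map(partial(extract, dest=dest), files)


def extract_per_dest(pairs, processes=5):
    # A different destination per archive: pass (path, dest) tuples and let
    # Pool.starmap unpack each tuple into extract(path, dest).
    with Pool(processes=processes) as pool:
        pool.starmap(extract, pairs)


if __name__ == '__main__':
    files = glob.glob('D:\\*.tar.bz')
    extract_same_dest(files, 'E:\\')
```

The worker function has to live at module top level so it can be pickled and sent to the worker processes; a partial object wrapping such a function pickles fine as well.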

You need to change pool.map(tar.extractall('E:\\') to something like pool.map(tar.extractall(), "list_of_all_files").

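Note that map() takes 2 arguments: the first is a function and the second is an iterable. It applies the function to each item of the iterable and returns a list of the results.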
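A small illustration of those map() semantics with multiprocessing.Pool, assuming Python 3 and using a hypothetical square function in place of the extraction work. The closing comment spells out why the call in the question fails:

```python
from multiprocessing import Pool


def square(x):
    # Any top-level (picklable) function can be the mapped function
    return x * x


if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # pool.map(func, iterable): calls func on each item in worker
        # processes and returns the results as a list, in input order
        print(pool.map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]

    # By contrast, pool.map(tar.extractall('E:\\')) first *calls* extractall
    # serially in the parent process, then passes its return value (None)
    # to map as the function, with no iterable at all, hence the TypeError.
```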

Edit: you need to pass a TarInfo object to another process:

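import glob
import tarfile
from multiprocessing import Pool


def read_files(name):
    # Each worker process opens and extracts one whole archive
    t = tarfile.open(name, 'r:bz2')
    t.extractall('E:\\')
    t.close()


def test_multiproc():
    files = glob.glob('D:\\*.tar.bz2')
    pool = Pool(processes=5)
    result = pool.map(read_files, files)


if __name__ == '__main__':
    test_multiproc()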


2014-09-21 15:16:41 Kasramvd

E is the destination directory for the extracted files. – Beau 2014-09-21 15:17:32

So there is no need to use tar = tarfile.open(f, 'r:bz2')? – Beau 2014-09-21 15:22:26

Yes, I think you can use 'TarFile.getmembers()' in 'tar.extractall'. – Kasramvd 2014-09-21 15:27:29
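A sketch of passing a subset of getmembers() to extractall() through its members parameter; the predicate argument is a hypothetical caller-supplied filter. One caveat: a bz2 stream cannot be read at random, so handing different member subsets of the same .tar.bz archive to different processes would make every process decompress the archive again. Splitting the work per archive, as the other answers do, avoids that.

```python
import tarfile


def extract_members(path, dest, predicate):
    """Extract only the members of one archive for which predicate(TarInfo) is true.

    `predicate` is a hypothetical filter used here for illustration.
    """
    with tarfile.open(path, 'r:bz2') as tar:
        # getmembers() returns one TarInfo per entry; for a compressed
        # stream this reads (decompresses) the whole archive once.
        selected = [m for m in tar.getmembers() if predicate(m)]
        # extractall() accepts a subset of getmembers() via `members`
        tar.extractall(dest, members=selected)
        return [m.name for m in selected]
```

For example, extract_members(path, 'E:\\', lambda m: m.isfile() and m.name.endswith('.csv')) would extract only the regular .csv files from one archive and return their names.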

Define a function that extracts a single tar file, then pass that function and a list of tar files to multiprocessing.Pool.map:

from functools import partial
import glob
from multiprocessing import Pool
import tarfile


def extract(path, dest):
    # Open one bz2-compressed tar archive and extract everything into dest
    with tarfile.open(path, 'r:bz2') as tar:
        tar.extractall(dest)


if __name__ == '__main__':
    files = glob.glob('D:\\*.tar.bz')
    pool = Pool(processes=5)
    # partial() fixes dest, so map() only supplies each archive path
    pool.map(partial(extract, dest='E:\\'), files)


2014-09-21 17:19:35 falsetru

Also, you could take a look at concurrent.futures.ProcessPoolExecutor(): https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor – 2014-09-21 18:01:25
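A minimal sketch of the same per-archive approach with concurrent.futures.ProcessPoolExecutor, assuming Python 3 and the question's D:\\ and E:\\ paths. The names extract and extract_all are hypothetical, and collecting failures per archive is an illustrative choice, not something the comment prescribes:

```python
import glob
import tarfile
from concurrent.futures import ProcessPoolExecutor, as_completed


def extract(path, dest):
    """Extract one bz2-compressed tar archive into dest and return its path."""
    with tarfile.open(path, 'r:bz2') as tar:
        tar.extractall(dest)
    return path


def extract_all(files, dest, max_workers=5):
    # submit() returns one Future per archive; as_completed() yields them as
    # they finish, so one damaged archive is recorded while the rest continue.
    done, failed = [], []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(extract, f, dest): f for f in files}
        for fut in as_completed(futures):
            try:
                done.append(fut.result())
            except Exception as exc:  # e.g. tarfile.ReadError for a bad archive
                failed.append((futures[fut], exc))
    return done, failed


if __name__ == '__main__':
    done, failed = extract_all(glob.glob('D:\\*.tar.bz'), 'E:\\')
    print(f'{len(done)} extracted, {len(failed)} failed')
```

If per-archive error handling is not needed, executor.map(extract, files, itertools.repeat(dest)) is the closer analogue of Pool.map.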


