Find the largest file inside a tarball

I have a gzip-compressed tarball containing 13,000 files. How can I extract only the largest file in it from a Python program?

I have tried reading through the tarball and checking the extracted length of each file, but that takes too long. Is there a better way to do this?

Original code (added for completeness, even though an answer has already been accepted):

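import tarfile

# tarfile.open() detects gzip compression; TarFile(filename) alone does not
archive = tarfile.open(filename)
members = archive.getmembers()
sizes = []
for member in members:
    sizes.append(member.size)
largest = max(sizes)
# index() returns a position in the list, so look the TarInfo up in members
largest_info = members[sizes.index(largest)]
print(largest_info.name)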

2013-12-07 Alphadelta14

How do you expect to find the largest file without looking at all the files in the tarball?

Have you looked at the documentation?

import tarfile

# tarfile.open() handles plain and compressed archives alike
archive = tarfile.open('/path/to/my/tarfile.tar')
max_size = 0
max_name = None
for member in archive.getmembers():
    if member.size > max_size:
        max_size = member.size
        max_name = member.name

print(max_size)
print(max_name)

2013-12-07 21:46:17

Wouldn't the built-in `max` function be better? `max(archive.getmembers(), key=operator.itemgetter('size'))` – mgilson

I get `TypeError: 'TarInfo' object is not subscriptable`.

`max(archive.getmembers(), key=operator.attrgetter('size'))` seems to work fine. – Alphadelta14
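
The `max` approach from these comments can be sketched as a self-contained script. This is only an illustration under assumptions: the in-memory archive and the member names (`small.txt`, `big.txt`) are made up, the helpers `build_sample_archive` and `largest_member` are hypothetical names, and `tarfile.open` with mode `r:gz` is used so that the gzip compression mentioned in the question is handled.

```python
import io
import operator
import tarfile


def build_sample_archive():
    """Build a small gzip-compressed tarball in memory (illustrative data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in [("small.txt", b"a" * 10), ("big.txt", b"b" * 1000)]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def largest_member(archive):
    """Return the TarInfo of the largest regular file in an open archive."""
    files = [m for m in archive.getmembers() if m.isfile()]
    # attrgetter works because size is an attribute, not a mapping key
    return max(files, key=operator.attrgetter("size"))


with tarfile.open(fileobj=build_sample_archive(), mode="r:gz") as archive:
    largest = largest_member(archive)
    data = archive.extractfile(largest).read()  # extract only that member
    print(largest.name, largest.size, len(data))
```

Filtering with `isfile()` skips directories and links, which also appear as members of an archive.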

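The answer is that you have to look through the whole archive to find the largest member. This is because the TAR format was designed for archiving and therefore has no table of contents (TOC):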

The possible reason for not using a centralized location of information is that tar was originally meant for tapes, which are bad at random access anyway: if the Table Of Contents (TOC) were at the start of the archive, creating it would mean to first calculate all the positions of all files, which needs doubled work, a big cache, or rewinding the tape after writing everything to write the TOC
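
The sequential layout described in the quote can be observed from Python. The following is a minimal sketch, assuming a small uncompressed archive built in memory; the member names and sizes are invented, and `build_sample_tar` and `scan_headers` are hypothetical helper names. It relies on the `offset` and `offset_data` attributes of `TarInfo`, which record where each header and its data begin.

```python
import io
import tarfile


def build_sample_tar():
    """Create an uncompressed tar in memory; names and sizes are illustrative."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, size in [("a.bin", 10), ("b.bin", 2000), ("c.bin", 300)]:
            info = tarfile.TarInfo(name=name)
            info.size = size
            archive.addfile(info, io.BytesIO(b"\0" * size))
    buffer.seek(0)
    return buffer


def scan_headers(fileobj):
    """Walk the archive once, recording where each header and payload start."""
    layout = []
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive:  # reads one header at a time, in archive order
            layout.append((member.name, member.offset, member.offset_data, member.size))
    return layout


layout = scan_headers(build_sample_tar())
for name, header_at, data_at, size in layout:
    print(f"{name}: header at {header_at}, data at {data_at}, {size} bytes")

# The largest member is only known once every header has been visited.
largest = max(layout, key=lambda entry: entry[3])
print("largest:", largest[0])
```

Each header sits directly in front of its own data, so the sizes are scattered through the file rather than collected in one place. With a gzip-compressed archive, reaching a given header additionally requires decompressing everything before it.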

Meanwhile, the other answer provides you with working code.

2013-12-07 21:52:49 alko
