2016-08-22
0

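Extracting a file in a Python script using a wildcard

I want to import a tar.gz file from HDFS in a Python script and then extract it. The file is named like 20160822073413-EoRcGvXMDIB5SVenEyD4pOEADPVPhPsg.tar.gz, and it always has the same structure.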

In my Python script, I want to copy it to the local machine and extract the files. I use the following commands to do this:

import subprocess 
import os 
import datetime 
import time 

today = time.strftime("%Y%m%d") 

#Copy tar file from HDFS to local server 
args = ["hadoop","fs","-copyToLocal", "/locationfile/" + today + "*"] 

p=subprocess.Popen(args) 

p.wait() 

#Untar the CSV file 
args = ["tar","-xzvf",today + "*"] 

p=subprocess.Popen(args) 

p.wait() 

The import works perfectly, but I cannot extract the file. I get the following error:

['tar', '-xzvf', '20160822*.tar'] 
tar (child): 20160822*.tar: Cannot open: No such file or directory 
tar (child): Error is not recoverable: exiting now 
tar: Child returned status 2 
tar: Error is not recoverable: exiting now 
put: `reportResults.csv': No such file or directory 

Can anyone help me?

Thanks a lot!

Answers

0

I found a way to do what I need: instead of using OS commands, I used Python's tarfile module, and it works fine!

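import glob
import os
import tarfile

os.chdir("/folder_to_scan/")
for file in glob.glob("*.tar.gz"):
    print(file)
    # Open and extract each matching archive inside the loop,
    # so every file is handled rather than only the last one.
    with tarfile.open(file) as tar:
        tar.extractall()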

Hope this helps.

Majid

3

Try the shell option:

p=subprocess.Popen(args, shell=True) 

From the docs:

If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user’s home directory.

And note:

However, note that Python itself offers implementations of many shell-like features (in particular, glob, fnmatch, os.walk(), os.path.expandvars(), os.path.expanduser(), and shutil).
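The glob route mentioned in that note could look roughly like this. It is only a minimal sketch, not code from the thread: `tar_commands` and `untar_matching` are hypothetical helper names, and it assumes a `tar` binary is on the PATH. Python expands the wildcard itself, so no shell is involved. One `tar` call is built per archive, because tar treats any extra names after the archive as members to extract from it.

```python
import glob
import subprocess


def tar_commands(pattern):
    """Build one `tar -xzvf <archive>` argument list per file matching pattern."""
    archives = sorted(glob.glob(pattern))  # Python expands the wildcard here
    if not archives:
        raise FileNotFoundError("no file matches %r" % pattern)
    return [["tar", "-xzvf", archive] for archive in archives]


def untar_matching(pattern):
    """Run tar once per matching archive, without going through a shell."""
    for args in tar_commands(pattern):
        subprocess.check_call(args)  # raises CalledProcessError if tar fails
```

With the variables from the question, this would be called as `untar_matching(today + "*.tar.gz")`.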

+0

Hi, thanks. I now get a different error: tar: You must specify one of the '-Acdtrux' or '--test-label' options. Try 'tar --help' or 'tar --usage' for more information. Thanks – Majid

+0

@Majid What does the 'today' variable evaluate to when it is passed to 'Popen'? – martriay

+0

It is the current date in the format 20160822. I do this because I receive one file every day and am trying to automate the process – Majid

2

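In addition to @martriay's answer, you also have a typo: you wrote "20160822*.tar", whereas your file's pattern is "20160822*.tar.gz".

When using shell=True, the command should be passed as a single string (see the documentation), like this: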

p=subprocess.Popen('tar -xzvf 20160822*.tar.gz', shell=True) 

If you don't need p, you can simply use subprocess.call:

subprocess.call('tar -xzvf 20160822*.tar.gz', shell=True) 
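The error Majid reports in the comments is consistent with how `Popen` handles a list when `shell=True` on POSIX: only the first item is used as the command string, and the remaining items become arguments to the shell itself, so `tar` runs with no options at all. A small sketch of building the single string from the date prefix follows. `tar_shell_command` and `untar_with_shell` are hypothetical names, and `shlex.quote` is used only as a precaution so that an unusual prefix cannot be misread by the shell, while the wildcard stays unquoted and is still expanded.

```python
import shlex
import subprocess


def tar_shell_command(prefix):
    """Return one shell command string; the wildcard is left for the shell to expand."""
    # Quote only the prefix. Quoting the '*' would stop the shell from expanding it.
    return "tar -xzvf " + shlex.quote(prefix) + "*.tar.gz"


def untar_with_shell(prefix):
    """Run the command through the shell and return tar's exit status."""
    return subprocess.call(tar_shell_command(prefix), shell=True)
```

For the prefix in the question, `tar_shell_command("20160822")` produces the same string as the literal command shown above.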

I would suggest making more use of the standard library, like this:

import glob 
import tarfile 

today = "20160822" # compute your common prefix here 
target_dir = "/tmp" # choose where ever you want to extract the content 

for targz_file in glob.glob('%s*.tar.gz' % today): 
    with tarfile.open(targz_file, 'r:gz') as opened_targz_file: 
        opened_targz_file.extractall(target_dir)
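
Since the question is about a daily job, the same idea can be wrapped in a function that computes the date prefix itself. This is a sketch under stated assumptions rather than a finished implementation: `extract_daily_archives`, `source_dir` and `target_dir` are hypothetical names, and the archives are assumed to have already been copied from HDFS into `source_dir`.

```python
import glob
import os
import tarfile
import time


def extract_daily_archives(source_dir, target_dir, prefix=None):
    """Extract every <prefix>*.tar.gz found in source_dir into target_dir.

    prefix defaults to today's date as YYYYMMDD, as in the question.
    Returns the list of archive paths that were extracted.
    """
    if prefix is None:
        prefix = time.strftime("%Y%m%d")
    pattern = os.path.join(source_dir, prefix + "*.tar.gz")
    archives = sorted(glob.glob(pattern))
    for archive in archives:
        # The context manager closes the archive even if extraction fails.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
    return archives
```

An empty return value means no archive matched for that day, which the calling script can treat as a missing delivery. Note that `extractall` trusts the paths stored inside the archive, so this is only appropriate for archives from a source you control.
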
+0

Yes, that was a typo. I tried to unzip and then unrar, but I hit the same initial problem. – Majid