I am using Hadoop Streaming to run a C++ executable (a bioinformatics program called BLAST) from a Python subprocess. When executed on the command line, blast writes a result file. But when it runs on Hadoop, I cannot find blast's output file anywhere. Where does the output file of a Python subprocess end up under Hadoop Streaming?
My code (map.py) is as follows:
import sys
import os
from os.path import join
from subprocess import Popen, PIPE

# paths used on hadoop
tool = './blastx'
reference_path = 'Reference.fa'
# working directory of the streaming task
current_path = os.getcwd()

# input format example
# >LW1 (contig name)
# ATCGATCGATCG (sequence)
# sample file: https://goo.gl/XTauAx
(name, seq) = (None, None)
for line in sys.stdin:
    # when a ">" sign is detected, record the contig name
    if line[0] == '>':
        name = line.strip()[1:]
    # otherwise, record the sequence
    else:
        seq = line.strip()
    if name and seq:
        # path of the blast output file
        output_file = join(current_path, 'tmp_output', name)
        # blast command (writes its result to the path given by -out)
        command = 'echo -e \">%s\\n%s\" | %s -db %s -out %s -evalue 1e-10 -num_threads 16' % (name, seq, tool, reference_path, output_file)
        # execute the command with a python subprocess
        cmd = Popen(command, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=True)
        # wait for completion and collect stdout/stderr
        cmd_out, cmd_err = cmd.communicate()
        print '%s\t%s' % (name, output_file)
        # reset so each (name, seq) pair is processed only once
        (name, seq) = (None, None)
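To check where the task is actually running, I log the working directory and its contents to stderr (stderr ends up in the task attempt logs rather than in the job output); a minimal debugging sketch:

import os
import sys

# log the streaming task's working directory and its contents to stderr;
# hadoop captures stderr in the task attempt logs
sys.stderr.write('working dir: %s\n' % os.getcwd())
for entry in os.listdir('.'):
    sys.stderr.write('found: %s\n' % entry)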
The command used to invoke blast is:
command = 'echo -e \">%s\\n%s\" | %s -db %s -out %s -evalue 1e-10 -num_threads 16' % (name, seq, tool, reference_path, output_file)
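(As an aside, the same FASTA record could be fed to blastx through the pipe directly instead of shelling out to echo -e; a sketch, assuming blastx reads the query from stdin when -query is not given, with an example contig hard-coded:)

from subprocess import Popen, PIPE

# example record; in the mapper these come from name and seq
record = '>LW1\nATCGATCGATCG\n'
cmd = Popen(['./blastx', '-db', 'Reference.fa',
             '-out', 'tmp_output/LW1',
             '-evalue', '1e-10', '-num_threads', '16'],
            stdin=PIPE, stdout=PIPE, stderr=PIPE)
# feed the FASTA record on stdin instead of using echo -e
cmd_out, cmd_err = cmd.communicate(record)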
Normally the output file is written to the path stored in output_file, but I cannot find it on either the local file system or HDFS. The files seem to be created in a temporary directory and disappear once the task finishes. How can I retrieve them?
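One workaround I am considering (a sketch only; the HDFS destination directory /user/me/blast_output and the example local path are hypothetical) is to copy each result file into HDFS with hadoop fs -put before the task exits:

import sys
from subprocess import call

# hypothetical HDFS destination directory (must exist beforehand)
hdfs_dir = '/user/me/blast_output'
# example local result path, as produced by the mapper above
output_file = 'tmp_output/LW1'

# copy the result out of the transient task directory before the task ends
ret = call('hadoop fs -put %s %s/' % (output_file, hdfs_dir), shell=True)
if ret != 0:
    sys.stderr.write('hadoop fs -put failed for %s\n' % output_file)

Would that be the right approach, or is there a standard way to get side-output files out of a streaming task?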