Where is the output file of a Python subprocess in Hadoop streaming?

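I am using Hadoop streaming to run a Python subprocess that calls a C++ executable (a bioinformatics tool called blast). When run on the command line, blast writes a result file. When run on Hadoop, however, I cannot find blast's output file. Where does it end up?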

My code (map.py) is as follows:

import sys
from os import getcwd
from os.path import join
from subprocess import Popen, PIPE

# path used on hadoop
tool = './blastx'
reference_path = 'Reference.fa'

# current_path is not defined in the original post; the task's working
# directory is assumed here
current_path = getcwd()

# input format example

# >LW1   (contig name)
# ATCGATCGATCG (sequence)

# sample file: https://goo.gl/XTauAx

(name, seq) = (None, None)

for line in sys.stdin:

    # when a ">" sign is detected, assign the contig name
    if line[0] == '>':
        name = line.strip()[1:]

    # otherwise, assign the sequence
    else:
        seq = line.strip()

        if name and seq:

            # assign the path of the output file
            output_file = join(current_path, 'tmp_output', name)

            # blast command example (export out file to a given path)
            command = 'echo -e \">%s\\n%s\" | %s -db %s -out %s -evalue 1e-10 -num_threads 16' % (name, seq, tool, reference_path, output_file)

            # execute command with python subprocess
            cmd = Popen(command, stdin=PIPE, stdout=PIPE, shell=True)

            # retrieve the standard output of command
            cmd_out, cmd_err = cmd.communicate()

            print('%s\t%s' % (name, output_file))

The command used to invoke blast is:

command = 'echo -e \">%s\\n%s\" | %s -db %s -out %s -evalue 1e-10 -num_threads 16' % (name, seq, tool, reference_path, output_file)

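Normally the output file is written to the path in output_file, but I could not find the files on the local file system or on HDFS. It looks as if they are created in a temporary directory and disappear after execution. How can I retrieve them?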
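The command in map.py builds a shell string and relies on `echo -e` to produce the FASTA record. A minimal sketch of the same call without a shell follows, feeding the record through the child's stdin instead. The helper names `fasta_record`, `blast_args`, and `run_blast` are hypothetical, and the sketch assumes, as the original pipeline does, that blastx reads its query from stdin when no `-query` option is given. The flags are the ones from the post.

```python
import subprocess


def fasta_record(name, seq):
    """Format one contig as a two-line FASTA record."""
    return '>%s\n%s\n' % (name, seq)


def blast_args(tool, db, out_path, evalue='1e-10', threads=16):
    """Build the argument list with the flags from map.py (no shell involved)."""
    return [tool, '-db', db, '-out', out_path,
            '-evalue', evalue, '-num_threads', str(threads)]


def run_blast(name, seq, out_path, tool='./blastx', db='Reference.fa'):
    """Run the tool on one record, passing the FASTA text through stdin.

    Returns (exit_code, stderr_text). Assumption: the tool reads its query
    from stdin when -query is omitted, as the original pipeline relies on.
    """
    proc = subprocess.Popen(blast_args(tool, db, out_path),
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    _, err = proc.communicate(fasta_record(name, seq).encode('ascii'))
    return proc.returncode, err.decode('utf-8', 'replace')
```

A relative `out_path` is resolved against the working directory of the task on whichever node runs the mapper, so the result file would be written to that node's local disk and not to HDFS. Checking the returned exit code and stderr text also shows whether the tool wrote anything at all.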

Source: user2583253, 2016-03-02

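I found blast's output files. It seems they stay on the node where blast was executed. After putting them back into HDFS, I can access them under the directory /user/yarn. What I did was add the following code to map.py: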

command = 'hadoop fs -put %s' % output_file 
cmd = Popen(command, stdin=PIPE, stdout=PIPE, shell=True)
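The snippet above starts the upload but does not wait for it or check whether it succeeded. A hedged sketch of a variant that does both is given here. The names `put_command` and `upload` are hypothetical, and the `run` parameter exists only so the command can be exercised without a Hadoop installation.

```python
import subprocess


def put_command(local_path, hdfs_dir=None):
    """Build the `hadoop fs -put` argument list.

    With no destination, the answer reports that the file lands under the
    HDFS directory of the user running the task (/user/yarn in the post).
    """
    args = ['hadoop', 'fs', '-put', local_path]
    if hdfs_dir is not None:
        args.append(hdfs_dir)
    return args


def upload(local_path, hdfs_dir=None, run=subprocess.call):
    """Run the upload, wait for it to finish, and return True on exit code 0."""
    return run(put_command(local_path, hdfs_dir)) == 0
```

In map.py this could be called as `upload(output_file)` right after blast finishes, with a message written to `sys.stderr` when it returns False so that the failure shows up in the task logs.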

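I also changed the output path to

output_file = name

instead of

output_file = join(current_path, 'tmp_output', name)

[updated on 3/3]. However, putting the files under the yarn user's directory is not ideal, because ordinary users do not have permission to edit files in that directory. I suggest putting the files into /tmp/blast_tmp instead, by changing the command to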

command = 'hadoop fs -put %s /tmp/blast_tmp' % output_file

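Before that, the directory /tmp/blast_tmp should be created with

% hadoop fs -mkdir /tmp/blast_tmp

and its permissions changed with

% hadoop fs -chmod 777 /tmp/blast_tmp

In this case, both the yarn user and you can access the directory.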
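The one-time setup above can also be scripted. The following is a minimal sketch that runs the same two commands in order and stops at the first failure. The function names and the `run` parameter are hypothetical; the commands and the mode 777 are taken from the answer.

```python
import subprocess


def setup_commands(hdfs_dir='/tmp/blast_tmp', mode='777'):
    """Return the two setup commands from the answer as argument lists."""
    return [
        ['hadoop', 'fs', '-mkdir', hdfs_dir],
        ['hadoop', 'fs', '-chmod', mode, hdfs_dir],
    ]


def prepare_directory(hdfs_dir='/tmp/blast_tmp', run=subprocess.call):
    """Run each command in order; return False as soon as one fails."""
    for args in setup_commands(hdfs_dir):
        if run(args) != 0:
            return False
    return True
```

This only needs to run once, from an account that is allowed to create directories under /tmp on HDFS, before the streaming job is submitted.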

Source: user2583253, 2016-03-02 09:48:12
