2016-07-25

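Writing several files based on one element of the original file

I need to read a BED-format file containing coordinates for every chromosome (chr) in the genome into separate files, one per chromosome. I tried the approach below, but it doesn't work: it doesn't create any files. Any idea why this happens, or an alternative way to solve this?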

import sys 

def make_out_file(dir_path, chr_name, extension): 

    file_name = dir_path + "/" + chr_name + extension 
    out_file = open(file_name, "w") 
    out_file.close() 
    return file_name 

def append_output_file(line, out_file): 

    with open(out_file, "a") as f: 
     f.write(line) 
    f.close() 

in_name = sys.argv[1] 
dir_path = sys.argv[2] 

with open(in_name, "r") as in_file: 

    file_content = in_file.readlines() 
    chr_dict = {} 
    out_file_dict = {} 
    line_count = 0 
    for line in file_content[:0]: 
     line_count += 1 
     elems = line.split("\t") 
     chr_name = elems[0] 
     chr_dict[chr_name] += 1 
     if chr_dict.get(chr_name) = 1: 
      out_file = make_out_file(dir_path, chr_name, ".bed") 
      out_file_dict[chr_name] = out_file 
      append_output_file(line, out_file) 
     elif chr_dict.get(chr_name) > 1: 
      out_file = out_file_dict.get(chr_name) 
      append_output_file(line, out_file) 
     else: 
      print "There's been an Error" 


in_file.close() 

Answer


This line:

for line in file_content[:0]: 

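iterates over an empty list. The empty list comes from the slice [:0], which means "from the start of the list up to, but not including, the first element". Here's a demonstration: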

>>> l = ['line 1\n', 'line 2\n', 'line 3\n'] 
>>> l[:0] 
[] 
>>> l[:1] 
['line 1\n'] 

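Because the list is empty, no iteration takes place, so the code in the body of your for loop never executes. That is why no files are created.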
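To see this concretely, here is a small sketch that counts how many times a loop body runs over the empty slice versus the whole list. The sample lines and the helper count_iterations are hypothetical, made up for illustration.

```python
# Hypothetical BED lines standing in for file_content.
lines = ["chr1\t10\t20\n", "chr2\t30\t40\n", "chr1\t50\t60\n"]

def count_iterations(seq):
    """Return how many times a for loop over seq executes its body."""
    n = 0
    for _ in seq:
        n += 1
    return n

print(count_iterations(lines[:0]))  # 0: lines[:0] is always an empty list
print(count_iterations(lines))      # 3: every line is visited
```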

To iterate over every line of the file, you don't need the slice at all:

for line in file_content: 

Even better, iterate over the file object itself, because that doesn't require reading the whole file into memory first:

with open(in_name, "r") as in_file:  
    chr_dict = {} 
    out_file_dict = {} 
    line_count = 0 
    for line in in_file: 
     ... 
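As a sketch of that pattern, the function below streams through a file object one line at a time and counts records per chromosome. count_by_chrom is a hypothetical helper, and io.StringIO stands in for the real file so the example runs on its own. Using collections.Counter also sidesteps another problem in the original loop: chr_dict[chr_name] += 1 raises KeyError the first time a chromosome is seen, because that key doesn't exist in the empty dict yet, whereas a Counter treats missing keys as 0.

```python
import io
from collections import Counter

def count_by_chrom(in_file):
    """Count BED records per chromosome, reading one line at a time."""
    counts = Counter()
    for line in in_file:              # the file object yields lines lazily
        if not line.strip():
            continue                  # skip blank lines
        chrom = line.split("\t", 1)[0]  # first column is the chromosome name
        counts[chrom] += 1            # missing keys start at 0, so no KeyError
    return counts

# io.StringIO stands in for open(in_name, "r") so the example runs without a file.
sample = io.StringIO("chr1\t10\t20\nchr2\t30\t40\nchr1\t50\t60\n")
print(count_by_chrom(sample))  # Counter({'chr1': 2, 'chr2': 1})
```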

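After that, there are a number of other problems in the code inside the for loop, including syntax errors, but once the loop actually runs you can start debugging them.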
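For reference, here is one way the whole script might look once those problems are fixed. It is a minimal sketch written for Python 3 (the original uses Python 2's print statement), and the function name split_bed_by_chrom is hypothetical. Besides iterating over the file object, it replaces the comparison chr_dict.get(chr_name) = 1 (a syntax error; comparison needs ==) with a simple membership test, and instead of reopening an output file in append mode for every line, it opens one handle per chromosome the first time that chromosome appears and closes them all at the end. It assumes a tab-separated BED file and skips blank lines and track/browser/# header lines, which would otherwise end up in output files of their own.

```python
import os
import sys

def split_bed_by_chrom(in_name, dir_path, extension=".bed"):
    """Write each record of a BED file to <dir_path>/<chrom><extension>.

    Returns a dict mapping chromosome name to the number of lines written.
    """
    os.makedirs(dir_path, exist_ok=True)
    out_files = {}  # chromosome name -> open output file handle
    counts = {}     # chromosome name -> number of lines written
    try:
        with open(in_name, "r") as in_file:
            for line in in_file:
                # Skip blank lines and BED header/comment lines.
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                chr_name = line.split("\t", 1)[0]
                if chr_name not in out_files:
                    # First time this chromosome is seen: create its file once
                    # ("w" truncates any leftover file from a previous run).
                    out_path = os.path.join(dir_path, chr_name + extension)
                    out_files[chr_name] = open(out_path, "w")
                    counts[chr_name] = 0
                out_files[chr_name].write(line)
                counts[chr_name] += 1
    finally:
        # Close every output handle, even if an error interrupted the loop.
        for f in out_files.values():
            f.close()
    return counts

if __name__ == "__main__" and len(sys.argv) == 3:
    split_bed_by_chrom(sys.argv[1], sys.argv[2])
```

It would be run the same way as the original, with the input file and output directory as arguments. One design caveat: keeping a handle open per chromosome is fine for a typical genome, but an assembly with thousands of contigs could hit the operating system's limit on open files; in that case, appending per line as the original did, or sorting the input by chromosome and writing one file at a time, avoids the problem.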
