2012-08-02 15 views
-4

Python outfile does not write everything

I have code that performs a task that is fairly complicated (at least for me):

import csv 
import os.path 
#open files + readlines 
with open("C:/Users/Ivan Wong/Desktop/Placement/Lists of targets/Mouse/UCSC to Ensembl.csv", "r") as f: 
    reader = csv.reader(f, delimiter = ',') 
    #find files with the name in 1st row 
    for row in reader: 
     graph_filename = os.path.join("C:/Python27/Scripts/My scripts/Top targets",row[0]+"_nt_counts.txt.png") 
     if os.path.exists(graph_filename): 
      y = row[0]+'_nt_counts.txt' 
      r = open('C:/Users/Ivan Wong/Desktop/Placement/fp_mesc_nochx/'+y, 'r') 
      k = r.readlines() 
      r.close 
      del k[:1] 
      k = map(lambda s: s.strip(), k) 
      interger = map(int, k) 
      import itertools 
      #adding the numbers for every 3 rows 
      def grouper(n, iterable, fillvalue=None): 
       "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx" 
       args = [iter(iterable)] * n 
       return itertools.izip_longest(*args, fillvalue=fillvalue) 
      result = map(sum, grouper(3, interger, 0))  
      e = row[0] 
      print e 
      cDNA = open('C:/Users/Ivan Wong/Desktop/Placement/Downloaded seq/Mouse/MOUSE_mRNAs.txt', 'r') 
      seq = cDNA.readlines() 
      # get all lines that have a gene name 
      lineNum = 0; 
      lineGenes = [] 
      for line in seq: 
       lineNum = lineNum +1 
       if '>' in line: 
        lineGenes.append(str(lineNum)) 
       if '>'+e in line: 
        lineBegin = lineNum 

      cDNA.close 

      # which gene is this 
      index1 = lineGenes.index(str(lineBegin)) 
      lineEnd = lineGenes[index1+1]   
# linebegin and lineEnd now give you, where to look for your sequence, all that 
# you have to do is to read the lines between lineBegin and lineEnd in the file 
# and make it into a single string.    
      lineEnd = lineGenes[index1+1] 
      Lastline = int(lineEnd) -1 

# in your code you have already made a list with all the lines (q), first delete 
# \n and other symbols, then combine all lines into a big string of nucleotides (like this)  
      qq = seq[lineBegin:Lastline] 
      qq = map(lambda s: s.strip(), qq) 
      string = '' 
      for i in range(len(qq)): 
       string = string + qq[i] 
# now you want to get a list of triplets, again you can use the for loop: 
# first get the length of the string 
      lenString = len(string); 
# this is your list codons 
      listCodon = [] 
      for i in range(0,lenString/3): 
       listCodon.append(string[0+i*3:3+i*3]) 
      proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in zip(result, listCodon)) 
      with open(e+'.csv','wb') as outfile: 
       outfile.writelines(proper_result) 

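This code reads names from a .csv file and checks whether a file with the same name exists in a folder. If it does, the code processes some data and writes the numbers and codons together into a .csv file. My output files currently look like this: [image: outfile]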

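It looks perfectly fine, but there is one problem: I know from my data (I checked it a different way) that the second column should be longer than what I get. I think this is because the code only writes a row to the file when both result (the numbers) and listCodon (the letters) have a value, so I lose the rest. How can I fix this?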

I tried printing listCodon before writing the file and found that all the triplets are still there, so I guess the problem is in this line:

proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in zip(result, listCodon)) 

You even have a comment in your code about getting a list of triplets, and you're surprised that your output is triplets? :/ – geoffspear 2012-08-02 13:35:58

One thing: use 'r.close()', not 'r.close'. – Matthias 2012-08-02 13:36:16

By "longer" you mean "more rows", right? – inVader 2012-08-02 13:37:41
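On the 'r.close' remark: a bare 'r.close' only looks the method up without calling it, so the file is not actually closed at that point. The sketch below illustrates the difference in Python 3, using a temporary file as a hypothetical stand-in for one of the question's input files (the file contents are made up for the example).

```python
import os
import tempfile

# Hypothetical stand-in for one of the question's *_nt_counts.txt files.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("header\n1\n2\n3\n")

r = open(path, "r")
k = r.readlines()
r.close                        # no parentheses: the method is never called
still_open = not r.closed      # the file is still open here
r.close()                      # this actually closes the file
closed_after_call = r.closed

# A with-block closes the file automatically, even if an exception is raised.
with open(path, "r") as handle:
    lines = handle.readlines()
closed_by_with = handle.closed

os.remove(path)
```

The with-block form is the one the question already uses for the CSV file, so applying it to the other two files would keep the code consistent.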

Answer

3

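zip stops as soon as any of its iterables is exhausted (because otherwise it would not know what to fill the gaps with!). From the documentation: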

The returned list is truncated in length to the length of the shortest argument sequence.
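A small sketch of that truncation, written for Python 3 (where zip returns an iterator, hence the list call). The two lists are made-up stand-ins for the question's result and listCodon, not real data.

```python
# Hypothetical stand-ins for the question's `result` and `listCodon`.
result = [12, 7, 30]                               # three summed counts
list_codon = ["ATG", "GCC", "TTA", "TGA", "CCA"]   # five codons

pairs = list(zip(result, list_codon))

# zip stops at the shortest input, so the last two codons are silently dropped.
print(len(pairs))   # 3
print(pairs[-1])    # (30, 'TTA')
```

Comparing len(result) with len(listCodon) before zipping is a quick way to confirm whether this is what shortens the second column.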

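If you want to pad the shorter iterables to the length of the longest, use itertools.izip_longest (named itertools.zip_longest in Python 3), which takes an optional fillvalue argument to use as the padding value.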
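One way this could look when building the output rows, as a sketch in Python 3. The helper name pair_rows, the empty-string fill value, and the sample lists are assumptions made for illustration; in Python 2 the import would be izip_longest instead.

```python
from itertools import zip_longest  # Python 2: from itertools import izip_longest


def pair_rows(counts, codons, fill=""):
    """Pair counts with codons, padding the shorter input with `fill`."""
    return ["%s, %s" % (nr, codon)
            for nr, codon in zip_longest(counts, codons, fillvalue=fill)]


# Hypothetical stand-ins for the question's `result` and `listCodon`.
rows = pair_rows([12, 7, 30], ["ATG", "GCC", "TTA", "TGA", "CCA"])
proper_result = "\n".join(rows)
print(proper_result)
```

With an empty fill value, codons that have no matching count still get a row, just with a blank first column. Whether blank, 0, or some other marker is the right padding depends on what the missing counts mean in the data.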

Got it, thanks – ivanhoifung 2012-08-02 14:11:54