2015-05-15 54 views

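How to treat stdin like a text file

I have a program that reads and parses a text file and does some analysis on it. I want to modify it so that it can take parameters via the command line, and so that it reads its input from stdin when stdin is designated as the input file.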

The parser looks like this:

class FastAreader : 
    ''' 
    Class to provide reading of a file containing one or more FASTA 
    formatted sequences: 
    object instantiation: 
    FastAreader(<file name>): 

    object attributes: 
    fname: the initial file name 

    methods: 
    readFasta() : returns header and sequence as strings. 
    Author: David Bernick 
    Date: April 19, 2013 
    ''' 
    def __init__ (self, fname):
        '''constructor: saves attribute fname '''
        self.fname = fname

    def readFasta (self):
        '''
        using filename given in init, returns each included FastA record
        as 2 strings - header and sequence.
        whitespace is removed, no adjustment is made to sequence contents.
        The initial '>' is removed from the header.
        '''
        header = ''
        sequence = ''

        with open(self.fname) as fileH:
            # initialize return containers
            header = ''
            sequence = ''

            # skip to first fasta header
            line = fileH.readline()
            while not line.startswith('>') :
                line = fileH.readline()
            header = line[1:].rstrip()

            # header is saved, get the rest of the sequence
            # up until the next header is found
            # then yield the results and wait for the next call.
            # next call will resume at the yield point
            # which is where we have the next header
            for line in fileH:
                if line.startswith ('>'):
                    yield header,sequence
                    header = line[1:].rstrip()
                    sequence = ''
                else :
                    sequence += ''.join(line.rstrip().split()).upper()
        # final header and sequence will be seen with an end of file
        # with clause will terminate, so we do the final yield of the data
        yield header,sequence

# presumed object instantiation and example usage 
# myReader = FastAreader ('testTiny.fa'); 
# for head, seq in myReader.readFasta() : 
#  print (head,seq) 

It parses files that look like this:

>test 
ATGAAATAG 
>test2 
AATGATGTAA 
>test3 
AAATGATGTAA 

>test-1 
TTA CAT CAT 

>test-2 
TTA CAT CAT A 

>test-3 
TTA CAT CAT AA 

>test1A 
ATGATGTAAA 
>test2A 
AATGATGTAAA 
>test3A 
AAATGATGTAAA 

>test-1A 
A TTA CAT CAT 

>test-2A 
AA TTA CAT CAT A 

>test-3A 
AA TTA CAT CAT AA 

My test program looks like this:

import argparse 
import sequenceAnalysis as s 
import sys 

class Test: 
    def __init__(self, infile, longest, min, start):
        self.longest = longest
        self.start = set(start)
        self.infile = infile
        self.data = sys.stdin.read()
        self.fasta = s.FastAreader(self.data)
        for head, seq in self.fasta.readFasta():
            self.head = head
            self.seq = "".join(seq).strip()
        self.test()

    def test(self):
        print("YUP", self.start, self.head)


def main(): 
    parser = argparse.ArgumentParser(description = 'Program prolog', 
                                     epilog = 'Program epilog',
                                     add_help = True, #default is True
                                     prefix_chars = '-',
                                     usage = '%(prog)s [options] -option1[default] <input >output')
    parser.add_argument('-i', '--inFile', action = 'store', help='input file name') 
    parser.add_argument('-o', '--outFile', action = 'store', help='output file name') 
    parser.add_argument('-lG', '--longestGene', action = 'store', nargs='?', const=True, default=True, help='longest Gene in an ORF') 
    parser.add_argument('-mG', '--minGene', type=int, choices= range(0, 2000), action = 'store', help='minimum Gene length') 
    parser.add_argument('-s', '--start', action = 'append', nargs='?', help='start Codon') #allows multiple list options 
    parser.add_argument('-v', '--version', action='version', version='%(prog)s 0.1') 
    args = parser.parse_args() 
    test = Test(args.inFile, args.longestGene, args.minGene, args.start) 


if __name__ == '__main__': 
    main() 

My command-line input looks like this:

python testcommand2.py -s ATG <tass2.fa >out.txt 

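Here tass2.fa is a file that FastAreader can parse. I can pass arguments such as start and have them written to the text file. But when I try to parse the input file, which should come from stdin, the program prints everything instead of parsing it. It also does not write to the designated text file, which should be stdout; the output goes straight to the command line.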


Is your command line something like 'python test.py < in.txt > parsed.txt'? Can you show the command line you use (per [ask])? – boardrider


@boardrider Sorry, I have edited my post accordingly. That is what my command input looks like. – Sam

Answer


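When you use I/O redirection (that is, you have <, |, > or << on the command line), it is handled by the shell before your program even runs. So when Python runs, its standard input is already connected to the file or pipe you are redirecting from, and its standard output is connected to the file or pipe you are redirecting to. The file names are not (directly) visible to Python, because you are dealing with already open()ed file handles, not file names. Your argument parser returns nothing for the input file because there are no file name arguments.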
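
To illustrate, here is a minimal sketch of what argparse receives for the command in the question. The parser is a cut-down stand-in for the asker's, and the argument list is written out by hand to mimic what the shell passes along once it has consumed the redirections.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-i', '--inFile', help='input file name')
parser.add_argument('-s', '--start', action='append', help='start codon')

# The shell removes "<tass2.fa" and ">out.txt" before Python starts,
# so these are the only arguments the program ever receives:
args = parser.parse_args(['-s', 'ATG'])

print(args.inFile)  # None, because no -i option was given
print(args.start)   # ['ATG']
```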

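To cope with this correctly, you should adapt your code to work with file handles directly, either instead of or in addition to explicit file names.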
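
A minimal sketch of that adjustment follows. It assumes a generator named read_fasta, which is a hypothetical name and not part of the original FastAreader class. It takes any open text stream rather than a file name, so it never calls open() itself.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an already-open text stream."""
    header = None
    chunks = []
    for line in handle:
        if line.startswith('>'):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, ''.join(chunks)
            header = line[1:].rstrip()
            chunks = []
        elif header is not None:
            # Remove all whitespace from the sequence line and upper-case it.
            chunks.append(''.join(line.split()).upper())
    # Emit the final record once the stream is exhausted.
    if header is not None:
        yield header, ''.join(chunks)


# Demonstration on an in-memory stream; any file-like object works.
sample = io.StringIO(">test-1\nTTA CAT CAT\n\n>test2\nAATGATGTAA\n")
for head, seq in read_fasta(sample):
    print(head, seq)
```

Because the function only iterates over the handle, calling read_fasta(sys.stdin) (after import sys) or read_fasta(open('tass2.fa')) should behave the same way as the in-memory demonstration.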

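For the latter scenario, a common convention is to treat the file name - as a special case: when it is passed in, use standard input (or standard output, depending on context) instead of opening a file. (You can still have a file named like that by using the simple workaround of the relative path ./- so that the name is not exactly a single dash.)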
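
One way to sketch this convention is shown below. It assumes a hypothetical helper called open_input and reuses the -i/--inFile option from the question. Making '-' the default is my own choice here, so that omitting -i also means "read stdin".

```python
import argparse
import sys


def open_input(name):
    """Return a readable text stream: stdin for None or '-', else the named file."""
    if name is None or name == '-':
        # Already opened by the shell or terminal; the caller should not close it.
        return sys.stdin
    return open(name)


def build_parser():
    parser = argparse.ArgumentParser(description='stdin or file input demo')
    # default='-' means "read standard input when -i is not given".
    parser.add_argument('-i', '--inFile', default='-',
                        help="input file name, or '-' for standard input")
    return parser


# Hand-written argument lists stand in for real command lines here.
args = build_parser().parse_args([])
print(open_input(args.inFile) is sys.stdin)  # True
```

With this in place, both "python testcommand2.py -s ATG <tass2.fa" and "python testcommand2.py -s ATG -i tass2.fa" would hand the reader an open stream. A file literally named - can still be reached as ./- because that string is not equal to a single dash.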
