Concatenating sliced strings based on slice indices in a csv file

Well, my challenge looks simple, but I have run out of options, so any help would be greatly appreciated.

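I have many DNA sequences in fasta format that need to be sliced at specific positions, with the resulting parts then concatenated. So, if my sequence file looks like this: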

~$ cat seq_file 
>Sequence1 
This is now a sequence that must require a bit of slicing and concatenation to be useful 
>Sequence2 
I have many more uncleaned strings like this in the form of sequences

The output I want looks like this:

>Sequence1 
This is useful 
>Sequence2 
I have cleaned sequences

The parts to slice are determined by slice indices stored in a separate csv file. In this case, the slice positions are organised like this:

~$ cat test.csv 
Sequence1,0,9,66,74,, 
Sequence2,0,5,15,22,48,57
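
As a rough illustration of how such a row can be read, the sketch below (Python 3) turns each line of test.csv into a list of (start, end) pairs, skipping the empty padding fields of shorter rows. The helper name row_to_pairs is made up for this example and is not part of the original code; the csv text is held in a string instead of a file so the snippet runs on its own.

```python
import csv
import io


def row_to_pairs(row):
    """Return (seq_id, [(start, end), ...]) for one CSV row.

    Hypothetical helper: fields after the ID are read two at a time,
    and blank fields (padding for shorter rows) are dropped first.
    """
    seq_id = row[0]
    numbers = [int(field) for field in row[1:] if field.strip()]
    # Even positions are starts, odd positions are the matching ends.
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    return seq_id, pairs


# Same content as test.csv in the question, kept in memory for the demo.
sample = "Sequence1,0,9,66,74,,\nSequence2,0,5,15,22,48,57\n"
parsed = [row_to_pairs(row) for row in csv.reader(io.StringIO(sample))]
for seq_id, pairs in parsed:
    print(seq_id, pairs)
```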

My code:

from Bio import SeqIO 
import csv 

seq_dict = {} 
for seq_record in SeqIO.parse('seq_file', 'fasta'): 
    descr = seq_record.description 
    seq_dict[descr] = seq_record.seq 

with open('test.csv', 'rb') as file:
    reader = csv.reader(file)
    for row in reader:
        seq_id = row[0]
        for n in range(1, 7):
            if n % 2 != 0:
                start = row[n]  # start positions occupy the odd-numbered columns
            else:
                end = row[n]

                for key, value in sorted(seq_dict.iteritems()):
                    #print key, value
                    if key == seq_id:  # cross-check matching sequence identities
                        try:
                            slice_seq = value[int(start):int(end)]
                            print key
                            print slice_seq
                        except ValueError:
                            print 'Ignore empty slice indices.. '

Now, this prints:

Sequence1 
Thisisnow 
Sequence1 
useful 
Ignore empty slice indices.. 
Sequence2 
Ihave 
Sequence2 
cleaned 
Sequence2 
sequences

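So far so good, and this is what I expected. But how can I join the sliced parts together, by concatenation or any other operation available in Python, to get the output I want? Thanks.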

Source

2014-02-16 user3014974

You can do this with a couple of modifications:

with open('test.csv', 'rb') as file:
    reader = csv.reader(file)
    for row in reader:
        seq_id = row[0]
        seqs = []
        for n in range(1, 7):
            if n % 2 != 0:
                start = row[n]  # start positions occupy the odd-numbered columns
            else:
                end = row[n]

                for key, value in sorted(seq_dict.iteritems()):
                    #print key, value
                    if key == seq_id:  # cross-check matching sequence identities
                        try:
                            seqs.append(value[int(start):int(end)])
                        except ValueError:
                            print 'Ignore empty slice indices.. '
        print ' '.join(str(x) for x in seqs)
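
For readers on Python 3, the same idea (collect the slices in a list, then join once per sequence) might look roughly like the sketch below. It is not the code above: it skips Biopython and reads the FASTA text with a small hypothetical read_fasta helper, join_slices is likewise an invented name, and in-memory strings stand in for seq_file and test.csv.

```python
import csv
import io

# Stand-ins for seq_file and test.csv from the question.
FASTA_TEXT = """>Sequence1
This is now a sequence that must require a bit of slicing and concatenation to be useful
>Sequence2
I have many more uncleaned strings like this in the form of sequences
"""
CSV_TEXT = "Sequence1,0,9,66,74,,\nSequence2,0,5,15,22,48,57\n"


def read_fasta(handle):
    """Hypothetical minimal reader: map each ID to its sequence, whitespace removed."""
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:]
            records[seq_id] = ""
        elif seq_id is not None:
            records[seq_id] += "".join(line.split())
    return records


def join_slices(sequence, indices, sep=" "):
    """Slice `sequence` at consecutive (start, end) pairs and join the parts."""
    parts = [sequence[start:end] for start, end in zip(indices[0::2], indices[1::2])]
    return sep.join(parts)


sequences = read_fasta(io.StringIO(FASTA_TEXT))
results = {}
for row in csv.reader(io.StringIO(CSV_TEXT)):
    # Drop the empty padding fields before converting to integers.
    indices = [int(field) for field in row[1:] if field.strip()]
    results[row[0]] = join_slices(sequences[row[0]], indices)

for seq_id, joined in results.items():
    print(">" + seq_id)
    print(joined)
```

Looking the ID up in the dictionary directly avoids scanning every record for each pair of indices, and filtering the blank fields up front removes the need for the try/except around int().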

Source

2014-02-16 15:55:37

I used 'str(x)' to create strings from the 'Seq' objects, because 'join' does not work on them directly.

Thank you very much. Really!! – user3014974

Something like this:

import csv
from string import whitespace

with open('seq_file') as f1, open('test.csv') as f2:
    for row in csv.reader(f2):
        it = iter(map(int, filter(None, row[1:])))   # drop empty fields, convert the rest
        slices = [slice(x, next(it)) for x in it]    # pair consecutive values as start:end
        seq = next(f1)                               # header line, e.g. ">Sequence1"
        line = next(f1).translate(None, whitespace)  # sequence line with whitespace removed
        print seq,
        print ' '.join(line[s] for s in slices)

Output:

>Sequence1 
Thisisnow useful 
>Sequence2 
Ihave cleaned sequences
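
The pairing step in this answer relies on a single iterator being consumed two items at a time. A rough Python 3 rendering of just that step is sketched below; pair_up and strip_whitespace are names invented for the illustration, and the whitespace removal is written with str.maketrans because the two-argument translate(None, whitespace) form exists only in Python 2.

```python
from string import whitespace


def pair_up(values):
    """Group a flat sequence into slice objects, two values at a time."""
    it = iter(values)
    # zip pulls from the same iterator twice per tuple, so items pair up in order.
    return [slice(start, end) for start, end in zip(it, it)]


def strip_whitespace(text):
    """Python 3 counterpart of text.translate(None, whitespace) from Python 2."""
    return text.translate(str.maketrans("", "", whitespace))


line = strip_whitespace(
    "I have many more uncleaned strings like this in the form of sequences\n"
)
slices = pair_up([0, 5, 15, 22, 48, 57])
joined = " ".join(line[s] for s in slices)
print(joined)
```

Note that zip stops at the shorter input, so an unpaired trailing index is dropped silently in this version.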

Source

2014-02-16 15:57:41

Great. This place is awesome. – user3014974
