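Concatenating sliced strings based on slice indices in a CSV file
Well, my challenge seems simple, but I have run out of options, so any help would be greatly appreciated.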
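I have many DNA sequences in FASTA format that need to be sliced at specific positions, with the resulting pieces then concatenated. So, if my sequence file looks like this: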
~$ cat seq_file
>Sequence1
Thisisnowasequencethatmustrequireabitofslicingandconcatenationtobeuseful
>Sequence2
Ihavemanymoreuncleanedstringslikethisintheformofsequences
The output I want is this:
>Sequence1
Thisisnowuseful
>Sequence2
Ihavecleanedsequences
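The slice positions come from a separate CSV file. Each row holds a sequence ID followed by alternating start and end indices, with empty cells padding the shorter rows. In this case, the file is organized like this: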
~$ cat test.csv
Sequence1,0,9,66,74,,
Sequence2,0,5,15,22,48,57
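One way to read this layout is as a sequence ID followed by (start, end) pairs. The following is a minimal sketch of that reading, not part of the original code: `parse_slice_row` is a hypothetical helper name, and the CSV text is inlined here instead of being read from test.csv so the block runs on its own.

```python
import csv
import io


def parse_slice_row(row):
    """Split one CSV row into (sequence_id, [(start, end), ...])."""
    seq_id = row[0]
    # Drop the empty padding cells, then convert the rest to integers.
    numbers = [int(cell) for cell in row[1:] if cell.strip()]
    # Pair consecutive values: even positions are starts, odd positions are ends.
    bounds = list(zip(numbers[0::2], numbers[1::2]))
    return seq_id, bounds


# Inlined copy of test.csv (assumption: same content as the file shown above).
csv_text = "Sequence1,0,9,66,74,,\nSequence2,0,5,15,22,48,57\n"
slices = dict(parse_slice_row(row) for row in csv.reader(io.StringIO(csv_text)))
print(slices)
```

Filtering the empty cells up front means the later slicing step never has to handle an empty index.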
My code:
from Bio import SeqIO
import csv

# Map each FASTA description line to its sequence.
seq_dict = {}
for seq_record in SeqIO.parse('seq_file', 'fasta'):
    descr = seq_record.description
    seq_dict[descr] = seq_record.seq

with open('test.csv', newline='') as handle:
    reader = csv.reader(handle)
    for row in reader:
        seq_id = row[0]
        for n in range(1, 7):
            if n % 2 != 0:
                start = row[n]  # start positions occupy the odd-numbered columns
            else:
                end = row[n]  # end positions occupy the even-numbered columns
                for key, value in sorted(seq_dict.items()):
                    if key == seq_id:  # cross-check matching sequence identities
                        try:
                            slice_seq = value[int(start):int(end)]
                            print(key)
                            print(slice_seq)
                        except ValueError:
                            print('Ignore empty slice indices.. ')
Now, this prints:
Sequence1
Thisisnow
Sequence1
useful
Ignore empty slice indices..
Sequence2
Ihave
Sequence2
cleaned
Sequence2
sequences
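Two details of this output can be checked directly: the end index 74 runs past the end of Sequence1 and is clipped by the slice, and the two empty cells in the Sequence1 row are what trigger the ValueError. A small sketch, assuming Sequence1 is stored as a plain string without spaces (as the printed slices suggest):

```python
# Sequence1 as a plain str without spaces (an assumption based on the printed slices).
seq = "Thisisnowasequencethatmustrequireabitofslicingandconcatenationtobeuseful"

# The end index 74 is past the end of the string; slicing clips it instead of raising.
print(len(seq))
print(seq[66:74])

# The empty cells in the Sequence1 row arrive as '' and int('') raises ValueError.
try:
    int('')
except ValueError as err:
    print('ValueError:', err)
```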
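So far so good, this is what I expected. But how do I combine the sliced parts, with join, concatenation, or any other operation available in Python, to get the output I want? Thanks.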
I used 'str(x)' to create strings from the 'Seq' objects, because 'join' does not work with them. –
Thank you very much. Very much!! – user3014974
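Following the comment above, the concatenation can be sketched by converting each slice with `str()` and joining the pieces once per sequence. This is a minimal sketch under stated assumptions, not the original poster's final code: plain `str` values stand in for the `Seq` objects so the block runs without Biopython, `join_slices` is a hypothetical helper name, and the sequences are assumed to be stored without spaces.

```python
def join_slices(sequence, bounds):
    """Concatenate sequence[start:end] for every (start, end) pair in bounds."""
    # str() leaves plain strings unchanged and turns a Bio.Seq.Seq slice into a str,
    # which is what ''.join() requires.
    return ''.join(str(sequence[start:end]) for start, end in bounds)


# Stand-ins for the parsed FASTA file and test.csv (assumed content, inlined here).
sequences = {
    'Sequence1': 'Thisisnowasequencethatmustrequireabitofslicingandconcatenationtobeuseful',
    'Sequence2': 'Ihavemanymoreuncleanedstringslikethisintheformofsequences',
}
slice_bounds = {
    'Sequence1': [(0, 9), (66, 74)],
    'Sequence2': [(0, 5), (15, 22), (48, 57)],
}

for seq_id in sorted(sequences):
    print('>' + seq_id)
    print(join_slices(sequences[seq_id], slice_bounds[seq_id]))
```

With these indices the loop prints Thisisnowuseful under >Sequence1 and Ihavecleanedsequences under >Sequence2.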