In a list of strings, find a phrase within the strings and append the two integers (x..y), in string form, to a list. Python

I want to write a script that reads a file and extracts two values once a certain word is found. In this case, when it encounters the string 'exon', it should save the two integers that follow it.

I started by creating two empty lists:

exon_start = [] 
exon_end = []

Here is an example using simplified data:

for line in data: 
    print line 

>>> 

exon   1..35 
       /gene="CDKN1A" 

CDS    73..567 
       /translation="MSEPAGDVRQNPCGSKACRRLFGPVDSEQLSRDCDALMAGCIQE 
       ARERWNFDFVTETPLEGDFAWERVRGLGLPKLYLPTGPRRGRDELGGGRRPGTSPALL 
       QGTAEEDHVDLSLSCTLVPRSGEQAEGSPGGPGDSQGRKRRQTSMTDFYHSKRRLIFS 
       KRKP" 

misc_feature 76..78 
       /gene="CDKN1A" 


exon   518..2106 
       /gene="CDKN1A"

I tried importing the regular expression module so I could use the re.findall() function:

import re

indx_line = range(0,len(data)) 

# so this relates each line of the data to a specific number in the index

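I have not been able to detect the 'exon' phrase in each line. As a first step I just tried to find out which lines of text contain 'exon', to see whether re.findall() was working. I put: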

for p,line in zip(indx_line,data): 

    if re.findall(r'exon',line) is True: 
     print p

and I get nothing.

When I instead put:

for p,line in zip(indx_line,data): 

    exon_test = re.findall(r'exon',line) 
    print exon_test

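I get a whole bunch of [] for the lines that do not contain 'exon', and ['exon'] for the lines that do. So I know I can use re.findall() to find every occurrence of 'exon' in each string.

I just need to work out how, once 'exon' is found, to scan that line until it reaches the '..' and then append the integers on either side of it to their respective lists; i.e.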

exon_start = [1,518] 
exon_end = [35,2106]

Source

2012-09-23 O.rka

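The problem is the line if re.findall(r'exon',line) is True:. re.findall() does not return True or False; it returns a list of matches. The is operator tests identity with the True object, so even a non-empty (truthy) list fails that test. For example: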

>>> mystr = '123 exon' 
>>> import re 
>>> re.findall(r'exon', mystr) 
['exon'] 
>>> re.findall(r'exon', mystr) is True 
False 
>>> bool(re.findall(r'exon',mystr)) 
True 
>>> if re.findall(r'exon', mystr): 
...  print 'true' 
... 
true

Changing the original code to:

for p,line in zip(indx_line,data): 

    if re.findall(r'exon',line): 
     print p

should make it work.
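
As a minimal sketch of the same truthiness check on a small sample: the data list below is made up to resemble the question's input, enumerate() is swapped in for the zip(indx_line, data) pairing, and the code is written for Python 3 rather than the Python 2 used elsewhere in this thread.

```python
import re

# Hypothetical sample lines resembling the question's input.
data = [
    "exon   1..35",
    '       /gene="CDKN1A"',
    "CDS    73..567",
    "exon   518..2106",
]

# An empty list is falsy and a non-empty list is truthy,
# so the result of re.findall() can be tested directly.
exon_lines = [p for p, line in enumerate(data) if re.findall(r"exon", line)]
print(exon_lines)  # [0, 3]
```

Only the indices of the lines containing 'exon' are collected; the other lines produce an empty list and are skipped.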

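Edit: as @TimPietzcker pointed out, you do not need re at all in this case. As for your second question, getting the numbers on either side of the '..', here is some code that may help: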

>>> line = ' exon   1..35' 
>>> if 'exon' in line: 
...  ranges = line.split()[1].split('..') 
...  print ranges 
... 
['1', '35']
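
Putting this together for the whole input, one possible sketch follows. It assumes Python 3, that data is a list of lines like the sample in the question, that the feature name is the first whitespace-separated token on the line, and that both ends of the range are plain integers. The helper name collect_exon_ranges is made up for illustration. The values are converted with int() to match the lists shown in the question; drop that call to keep them as strings.

```python
def collect_exon_ranges(lines):
    """Return (starts, ends) for every line whose first token is 'exon'."""
    exon_start, exon_end = [], []
    for line in lines:
        fields = line.split()
        # Require "exon" followed by a range token such as "1..35".
        if len(fields) >= 2 and fields[0] == "exon" and ".." in fields[1]:
            start, end = fields[1].split("..", 1)
            exon_start.append(int(start))
            exon_end.append(int(end))
    return exon_start, exon_end


# Hypothetical sample lines resembling the question's input.
data = [
    "exon   1..35",
    '       /gene="CDKN1A"',
    "CDS    73..567",
    "misc_feature 76..78",
    "exon   518..2106",
    '       /gene="CDKN1A"',
]

exon_start, exon_end = collect_exon_ranges(data)
print(exon_start)  # [1, 518]
print(exon_end)    # [35, 2106]
```

Comparing the first token with == rather than using 'exon' in line avoids picking up lines that merely mention the word somewhere else, such as inside a qualifier.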

Source

2012-09-23 22:12:31

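@TimPietzcker Ah yes, you are absolutely right. Unless 'exon' is just his simplified example, there is no need for 're'.

Yeah, it works, but how do I append the values on either side of the '..' in each line?

@draconisthe0ry You can simply use 'split()' to parse it, as I have just shown in the updated answer.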
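
If the real search term turns out to be more complex than the literal 'exon', as the first comment allows, a regular expression with capture groups is one alternative to split(). The sketch below assumes Python 3 and a made-up sample list; the pattern name EXON_RANGE is hypothetical, and the pattern only handles ranges written as two plain integers joined by '..'.

```python
import re

# "exon", some whitespace, then two integers separated by "..".
EXON_RANGE = re.compile(r"\bexon\s+(\d+)\.\.(\d+)")

# Hypothetical sample lines resembling the question's input.
data = [
    "exon   1..35",
    '       /gene="CDKN1A"',
    "CDS    73..567",
    "exon   518..2106",
]

exon_start, exon_end = [], []
for line in data:
    match = EXON_RANGE.search(line)
    if match:  # search() returns None when the line does not match
        exon_start.append(int(match.group(1)))
        exon_end.append(int(match.group(2)))

print(exon_start)  # [1, 518]
print(exon_end)    # [35, 2106]
```

Here re.search() does both jobs at once: it decides whether the line is an exon line and extracts the two numbers, so no separate findall() check is needed.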
