Increment by 1 - 優文庫

I am trying to write the following program so that a slice moves along the sequence in increments of 1:

import numpy as np  # import package for scientific computing

dna1 = str(np.load('dna1.npy'))

def count(dna1, repeat):
    i = 0
    for s in range(len(dna1)):
        if (s == 'repeat'):
            i += 1
            s += dna1[0:1]
        return i

repeat = 'TTTT'
n = count(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=n))

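I want to extract every possible 4-letter window from the sequence and check whether it equals 'TTTT'. But I don't know how to increment so that I move 1 position along the sequence while still reading 4 letters.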


2016-10-22 Hotaru

Can you show an excerpt of your dataset? It's hard to know exactly how to write the loop without knowing the structure. – rumdrums

Sure! dna1 = "TAGCAGAAGTTGTCTCATGGACTGTATAACTCTTGCTACGCTTATTACTTTCAAACCTCCTTTGGAATGTATTTGGGCTCTAAAAATCGCCCTGAGTGACTCCAGTATATCAATTTACTCTGTTTGTCATATCTGCAGACTTGCAATACTATTCAAGCTGATAATAGAAAGTAGGGGCTATAACGACTTTTCTCACCACTGACATTGTACCCTAGTATTCAATACTAATAGGTCCGCTATATTAGATCTAAAATGCATATT ......" and it goes on and on. – Hotaru

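First, there must be some Python matching function for this; google "python regex match function". Second, in pseudocode I would do this: loop over the whole string and, for each letter str[i], compare the substring i..i+3 with "TTTT". – shinzou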

I agree, using a regular expression is probably the simplest first approach:

import numpy as np  # import package for scientific computing
import re

dna1 = str(np.load('dna1.npy'))

def count(dna1, repeat):
    regex = re.compile(repeat)
    # findall returns the non-overlapping matches, scanning left to right
    result = regex.findall(dna1)
    return len(result)

repeat = 'TTTT'
n = count(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=n))
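
One caveat worth illustrating: findall only reports non-overlapping matches, so a run such as 'TTTTT' is counted once. If overlapping windows should count too, a zero-width lookahead is one option. This is only a minimal sketch; the sample string and the helper name count_overlapping are made up for illustration and are not from the original post:

```python
import re

def count_overlapping(dna, repeat):
    """Count occurrences of `repeat` in `dna`, allowing overlaps."""
    # (?=...) is a zero-width lookahead: it matches at a position without
    # consuming characters, so every start position gets tested.
    pattern = re.compile("(?={})".format(re.escape(repeat)))
    return sum(1 for _ in pattern.finditer(dna))

sample = "ATTTTTG"  # made-up sequence with five consecutive Ts
print(len(re.findall("TTTT", sample)))    # non-overlapping count
print(count_overlapping(sample, "TTTT"))  # overlapping count
```

Which of the two counts is the right one depends on what a "repeat" should mean for your data.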

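Edit:

Here is a simple approach that doesn't use the regex module. You could certainly optimize it to skip further ahead based on the result of each iteration: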

def count(dna1, repeat):
    repeat_length = len(repeat)
    total = 0
    idx = 0
    while idx < len(dna1):
        substr = dna1[idx:idx+repeat_length]
        if substr == repeat:
            total += 1
            idx += repeat_length  # skip ahead to avoid repeat counting
        else:
            idx += 1
    return total
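
As for the optimization mentioned above, one possible way to skip ahead is to let str.find jump straight to the next match instead of testing every index. This is a rough sketch, not a tuned implementation; the name count_with_find, the overlapping flag and the sample string are assumptions made for illustration:

```python
def count_with_find(dna, repeat, overlapping=False):
    """Count `repeat` in `dna`, using str.find to jump between matches."""
    if not repeat:
        raise ValueError("repeat must be a non-empty string")
    # After a match, resume either one position later (overlaps allowed)
    # or just past the match (no repeat counting).
    step = 1 if overlapping else len(repeat)
    total = 0
    idx = dna.find(repeat)
    while idx != -1:
        total += 1
        idx = dna.find(repeat, idx + step)
    return total

sample = "TTTTTTTTACGTTTTT"  # made-up sequence, not the real dna1 data
print(count_with_find(sample, "TTTT"))                    # skip-ahead count
print(sample.count("TTTT"))                               # built-in, also non-overlapping
print(count_with_find(sample, "TTTT", overlapping=True))  # every window counted
```

With the default setting this should agree with the built-in str.count, which also ignores overlapping occurrences.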


2016-10-22 18:09:31 rumdrums

Thanks! It works this way, but is there a way to do it by incrementing a slice? Or is that too complicated? – Hotaru

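I've added a method of my own that should also work. As Jonathan points out in his answer, though, doing it this way means you have to think about some special cases that a standard library module might handle for you automatically. – rumdrums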

In my view, the best and most customizable way to do this is:

import numpy as np  # import package for scientific computing

dna1 = str(np.load('dna1.npy'))
repeat = 'TTTT'

def get_num_of_repeats(dna, repeat):
    repeats = 0
    # slide a window of len(repeat) letters along dna, one position at a time
    for i in range(len(dna) - len(repeat) + 1):
        if dna[i:i+len(repeat)] == repeat:
            repeats += 1
    return repeats

repeats = get_num_of_repeats(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=repeats))

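I simply created a function get_num_of_repeats that takes the dna variable and the pattern to look for, and returns the number of repeats. Depending on how you want the algorithm to behave, things can get tricky when you are looking for a pattern such as 'TTTT' and part of the dna contains 'TTTTT': a window that advances by 1 counts that run twice, while a scan that skips past each match counts it once. I can give follow-up help to define the desired behavior.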
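
To make the 'TTTTT' case concrete, it can help to list where the matching windows start rather than only counting them. This is a small sketch under assumptions: the helper name find_repeat_positions and the sample string are made up for illustration and are not part of the answer above:

```python
def find_repeat_positions(dna, repeat):
    """Return every start index i where dna[i:i+len(repeat)] == repeat."""
    width = len(repeat)
    # The slice start moves by 1 each step while the slice width stays fixed.
    return [i for i in range(len(dna) - width + 1)
            if dna[i:i + width] == repeat]

sample = "ACTTTTTGA"  # made-up sequence with five consecutive Ts
positions = find_repeat_positions(sample, "TTTT")
print(positions)             # start indices of the overlapping windows
print(len(positions))        # overlapping count
print(sample.count("TTTT"))  # built-in count, which ignores overlaps
```

Seeing the start indices side by side makes it easier to decide whether adjacent, overlapping windows should count as separate repeats.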


2016-10-22 18:24:15 Jonathan
