Counting x occurrences of substrings of length n (not given) in a string

I can't get the number of occurrences of the substrings of length n in a string. For example, if the string is

CCCATGGTtaGGTaTGCCCGAGGT

and n is 3,

the output must look like this:

'CCC': 2, 'GGT': 3

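The input is a list of lists, so I can get every string in the list, but I can't go any further. The output is a dictionary covering all the strings.

Code: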

def get_all_n_repeats(n, sq_list):
    reps = {}
    for i in sq_list:
        if not i:
            continue
        else:
            for j in i:
                ........#Here the code I want to do#......
    return reps

Source

2016-05-29 Teshtek

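Why is it 'GGT' and not 'GTt'?

You need to at least show what you have already tried. – totoro

Your output doesn't make sense for your input. If you split the input string into three-letter strings, you get ['CCC', 'ATG', 'GTt', 'aGG', 'TaT', 'GCC', 'CGA', 'GGT'], so I don't know where the 'GGT' count in your output comes from.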
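The mismatch raised in the comments appears to come down to overlapping versus non-overlapping substrings. The following is a minimal sketch, assuming the asker wants overlapping (sliding) windows, which is the only reading that yields three occurrences of 'GGT'. The names `chunks` and `windows` are illustrative and not from the original post.

```python
s = "CCCATGGTtaGGTaTGCCCGAGGT"
n = 3

# Non-overlapping chunks: step through the string n characters at a time
chunks = [s[i:i + n] for i in range(0, len(s), n)]

# Overlapping (sliding) windows: advance one character at a time
windows = [s[i:i + n] for i in range(len(s) - n + 1)]

print(chunks.count("GGT"))   # 1
print(windows.count("GGT"))  # 3
```

Note that the comparison is case-sensitive, so 'GTt' and 'aGG' are counted separately from 'GTT' and 'AGG'.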

Use Counter:

from collections import Counter

def count_occurrences(st, n):
    # Collect every overlapping substring of length n
    candidates = [st[i:i + n] for i in range(len(st) - n + 1)]

    # Keep only the substrings that occur more than once
    output = {}
    for k, v in Counter(candidates).items():
        if v > 1:
            output[k] = v
    return output

st = "CCCATGGTtaGGTaTGCCCGAGGT"
n = 3

print(count_occurrences(st, n))
# {'CCC': 2, 'GGT': 3}

Source

2016-05-29 19:59:31 Jivan

Counter(candidates).most_common()

A very explicit solution:

s = 'CCCATGGTtaGGTaTGCCCGAGGT' 
n = 3 
# All possible n-length strings 
l = [s[i:i + n] for i in range(len(s) - (n - 1))] 
# Count their distribution 
d = {} 
for e in l: 
    d[e] = d.get(e, 0) + 1 
print(d)

Source

2016-05-29 20:13:19 totoro

A very simple solution:

from collections import Counter 

st = "CCCATGGTtaGGTaTGCCCGAGGT" 
n = 3 

tokens = Counter(st[i:i+n] for i in range(len(st) - n + 1)) 
print(tokens.most_common(2))

After that, it is up to you to turn it into a helper function.

Source

2016-05-29 20:18:24
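As a rough sketch of the helper function suggested above, the asker's `get_all_n_repeats(n, sq_list)` could be filled in along these lines. Two things here are assumptions rather than anything stated in the thread: that `sq_list` is a list of lists of strings, and that the result pools the counts from every string into a single dictionary, keeping only substrings seen more than once.

```python
from collections import Counter

def get_all_n_repeats(n, sq_list):
    # Assumption: sq_list is a list of lists of strings, and the result
    # pools the counts from every string into one dictionary.
    counts = Counter()
    for group in sq_list:
        for seq in group:  # an empty group simply contributes nothing
            # Sliding windows of length n; range() is empty if len(seq) < n
            counts.update(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Keep only substrings seen more than once, as in the expected output
    return {sub: c for sub, c in counts.items() if c > 1}

print(get_all_n_repeats(3, [["CCCATGGTtaGGTaTGCCCGAGGT"], []]))
# {'CCC': 2, 'GGT': 3}
```

Windows never span two strings, because each string is sliced on its own. If a separate dictionary per string is wanted instead, the same inner loop could write into a Counter created for each `seq`.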
