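How to find the total number of possible combinations of a string?

How can I find the total number of possible subsequences of a given string that start with a particular character, say 'a', and end with a particular character, say 'b'?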

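Example:
Take the string 'aabb'. If a subsequence must start with the character 'a' and end with the character 'b', the valid subsequences are: (ab) from indices (0,2), (0,3), (1,2) and (1,3); (aab) from indices (0,1,2) and (0,1,3); (abb) from indices (0,2,3) and (1,2,3); and aabb itself, so the total is 9. I can solve this for short strings, but how do I solve it for a large string where brute force doesn't work?

Note: we consider two substrings to be different if they start or end at different indices of the given string.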
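To make the count of 9 concrete, here is a minimal sketch that enumerates the index tuples for 'aabb' by brute force. The helper name `list_subsequences` is hypothetical (it does not come from the question), and the approach is only practical for short strings.

```python
from itertools import combinations

def list_subsequences(s, first, last):
    """Return (indices, text) for every subsequence of s with at least two
    characters that starts with `first` and ends with `last`.
    Hypothetical helper for illustration; exponential in len(s)."""
    found = []
    for size in range(2, len(s) + 1):
        # combinations yields index tuples in increasing order,
        # so each tuple picks characters from left to right.
        for idx in combinations(range(len(s)), size):
            if s[idx[0]] == first and s[idx[-1]] == last:
                found.append((idx, ''.join(s[i] for i in idx)))
    return found

for idx, text in list_subsequences('aabb', 'a', 'b'):
    print(idx, text)
```

Entries with the same text, such as the four different index pairs that spell 'ab', are counted separately because their indices differ.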

def count(s, first, last):
    # Brute force: check every (start, end) pair of indices.
    n = len(s)
    total = 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            if s[i] == first and s[j - 1] == last:
                total += 1
    return total

2017-10-28 Demonking28

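What have you tried so far? –

What value do you want at the end? Do you want the total number of substrings, all the indices of all the substrings, or actually all the substrings? – Polymer

@KlausD. I tried brute force, but that takes a lot of time – Demonking28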

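Before I post my main code, I'll try to explain how it works. Let the source string be 'a123b'. The valid subsequences consist of all the subsets of '123', prefixed with 'a' and suffixed with 'b'. The set of all subsets is called the powerset, and the itertools docs have code in the Itertools Recipes section that shows how to produce the powerset using combinations.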

# Print all subsequences of '123', prefixed with 'a' and suffixed with 'b'
from itertools import combinations

src = '123'
for i in range(len(src) + 1):
    for s in combinations(src, i):
        print('a' + ''.join(s) + 'b')

Output

ab 
a1b 
a2b 
a3b 
a12b 
a13b 
a23b 
a123b
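The listing prints eight strings because each of the three inner characters is either kept or dropped, giving 2**3 choices. A small sketch that checks this count for a few sizes follows. Here `powerset` mirrors the recipe in the itertools documentation, and `powerset_size` is a hypothetical helper name used only for illustration.

```python
from itertools import chain, combinations

def powerset(iterable):
    """All subsets of iterable, from the empty tuple up to the full tuple."""
    items = list(iterable)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )

def powerset_size(iterable):
    """Count the subsets by generating them (hypothetical helper)."""
    return sum(1 for _ in powerset(iterable))

# Compare the generated count with 2 ** n for small n.
for n in range(6):
    print(n, powerset_size(range(n)), 2 ** n)
```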

Here's a brute-force solution that uses that recipe.

from itertools import combinations

def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count

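It can easily be shown that the number of subsets of a set of n items is 2**n. So instead of generating the subsets one by one, we can speed up the process by using that formula, which is what my count_fast function does.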

from itertools import combinations

def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count

def count_fast(src, targets):
    c0, c1 = targets
    # Find indices of the target chars
    idx = {c: [] for c in targets}
    for i, c in enumerate(src):
        if c in targets:
            idx[c].append(i)

    idx0, idx1 = idx[c0], idx[c1]
    count = 0
    for u in idx0:
        for v in idx1:
            # The end index must come strictly after the start index.
            if v <= u:
                continue
            # There are n characters strictly between u and v;
            # each subset of them gives one valid subsequence.
            n = v - u - 1
            count += 2 ** n
    return count

# Test

funcs = (
    count_bruteforce,
    count_fast,
)

targets = 'ab'

data = (
    'ab', 'aabb', 'a123b', 'aacbb', 'aabbb',
    'zababcaabb', 'aabbaaabbb',
)

for src in data:
    print(src)
    for f in funcs:
        print(f.__name__, f(src, targets))
    print()

Output

ab 
count_bruteforce 1 
count_fast 1 

aabb 
count_bruteforce 9 
count_fast 9 

a123b 
count_bruteforce 8 
count_fast 8 

aacbb 
count_bruteforce 18 
count_fast 18 

aabbb 
count_bruteforce 21 
count_fast 21 

zababcaabb 
count_bruteforce 255 
count_fast 255 

aabbaaabbb 
count_bruteforce 730 
count_fast 730

It may be possible to make this faster by starting the inner loop at the right place instead of using continue to skip the unwanted indices.
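One way to read that suggestion, sketched under the assumption that the index lists are in increasing order (which they are when collected with enumerate): use bisect to jump straight to the first end index after each start index. The second function is a different technique, not the answer's method: a single pass that keeps a running total and avoids the nested loops entirely. Both names, `count_bisect` and `count_single_pass`, are hypothetical.

```python
from bisect import bisect_right

def count_bisect(src, targets):
    """Like count_fast, but start the inner loop at the first end index > u."""
    c0, c1 = targets
    idx0 = [i for i, c in enumerate(src) if c == c0]
    idx1 = [i for i, c in enumerate(src) if c == c1]  # already sorted
    count = 0
    for u in idx0:
        start = bisect_right(idx1, u)  # first position with idx1[pos] > u
        for v in idx1[start:]:
            count += 2 ** (v - u - 1)
    return count

def count_single_pass(src, targets):
    """One pass over src (an alternative sketch, not the answer's code).

    Before processing index i, acc equals the sum of 2 ** (i - u - 1)
    over all earlier start indices u, so an end char at i adds acc.
    """
    c0, c1 = targets
    count = 0
    acc = 0
    for ch in src:
        if ch == c1:
            count += acc
        acc *= 2          # every earlier start is now one step further away
        if ch == c0:
            acc += 1      # a start at this index contributes 2 ** 0 next step
    return count

for src in ('ab', 'aabb', 'a123b', 'aabbaaabbb'):
    print(src, count_bisect(src, 'ab'), count_single_pass(src, 'ab'))
```

On the test strings from the answer, both functions return the same totals as count_fast, for example 9 for 'aabb' and 730 for 'aabbaaabbb'. The single-pass version still works with very large integers on long inputs, so its arithmetic cost grows with the size of the count.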

2017-10-28 12:34:54

Could you please take a look at this question: https://stackoverflow.com/questions/46987669/cutting-cost-algorithm-optimization – Demonking28

Easy, this should just be the number of letters to the power of two, i.e. n^2.

The Python implementation is likewise just n_substrings = n ** 2

2017-10-28 10:09:46 Polymer

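I think you misunderstood the question: the substring must start with a character 'x' and end with a character 'y', which will be given as input. – Demonking28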
