Reading all possible sequential substrings in Python

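If I have a list of letters, such as:
word = ['W','I','N','E']
and need to get every possible sequence of substrings of length 3 or less, e.g.:
W I N E, WI N E, WI NE, W IN E, WIN E, etc.
What is the most efficient way to go about this?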

Right now, I have:

word = ['W','I','N','E']
for idx, phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx - p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)

This only gives me the output below, rather than the sequences of substrings:

W 
I 
W I 
N 
I N 
W I N 
E 
N E 
I N E

I just can't figure out how to create every possible sequence.

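Source

2014-11-06 Adam_G

Do you need permutations? Or just substrings? – 2014-11-06 23:35:55

Just substrings, since they need to be sequential. – 2014-11-06 23:39:06

Are you just looking for "WINE" with a space in each possible position? – Stuart 2014-11-06 23:43:58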

Try this recursive algorithm:

def segment(word):
    def sub(w):
        # an empty remainder has exactly one segmentation: the empty one
        if len(w) == 0:
            yield []
        # take a first piece of 1 to 3 letters, then segment the rest
        for i in range(1, min(4, len(w) + 1)):
            for s in sub(w[i:]):
                yield [''.join(w[:i])] + s
    return list(sub(word))

# And if you want a list of strings:
def str_segment(word):
    return [' '.join(w) for w in segment(word)]

Output:

>>> segment(word) 
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']] 

>>> str_segment(word) 
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']
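
For reference, the same recursive idea can be written as a Python 3 generator with the length limit exposed as a parameter. This is only a sketch: the name `segmentations` and the `max_len` parameter are assumptions made for illustration and are not part of the answer above.

```python
def segmentations(letters, max_len=3):
    """Yield every way to cut `letters` into consecutive pieces of at most `max_len` items.

    `segmentations` and `max_len` are hypothetical names used for this sketch.
    """
    if not letters:
        # nothing left to cut: one (empty) segmentation
        yield []
        return
    for i in range(1, min(max_len, len(letters)) + 1):
        head = ''.join(letters[:i])
        for rest in segmentations(letters[i:], max_len):
            yield [head] + rest


for pieces in segmentations(['W', 'I', 'N', 'E']):
    print(' '.join(pieces))
```

Because it is a generator, results can be consumed one at a time instead of building the whole list first, which may matter for longer inputs.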

Source

2014-11-06 23:48:28 irrelephant

Thanks! This will do it. – 2014-11-06 23:57:28

@Adam_G I've improved the answer - you should use this version! – irrelephant 2014-11-07 00:16:30

Thanks. How does this improve it? – 2014-11-07 01:18:03

My implementation for this problem.

#!/usr/bin/env python

# this is a problem of fitting partitions in the word
# we'll use itertools to generate these partitions
import itertools

word = 'WINE'

# this loop generates all possible partition COUNTS (up to word length)
for partitions_count in range(1, len(word)+1):
    # this loop generates all possible combinations based on count
    for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):

        # because of the way python splits words, we only care about the
        # difference *between* partitions, and not their distance from the
        # word's beginning
        diffs = list(partitions)
        for i in range(len(partitions)-1):
            diffs[i+1] -= partitions[i]

        # first, the whole word is up for taking by partitions
        splits = [word]

        # partition the word's remainder (what was not already "taken")
        # with each partition
        for p in diffs:
            remainder = splits.pop()
            splits.append(remainder[:p])
            splits.append(remainder[p:])

        # print the result
        print(splits)
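
A possible variant of the same cut-point idea slices the word directly at the chosen positions and also enforces the 3-letter limit from the question, which the loop above does not check. This is a rough sketch under those assumptions; `splits_by_cut_points` and `max_len` are made-up names.

```python
from itertools import combinations


def splits_by_cut_points(word, max_len=3):
    """Return all splits of `word` whose pieces have at most `max_len` letters.

    Hypothetical helper for illustration; not taken from the answer above.
    """
    n = len(word)
    results = []
    # choose 0, 1, ..., n-1 cut positions among the n-1 gaps between letters
    for cut_count in range(n):
        for cuts in combinations(range(1, n), cut_count):
            bounds = (0,) + cuts + (n,)
            pieces = [word[a:b] for a, b in zip(bounds, bounds[1:])]
            if all(len(piece) <= max_len for piece in pieces):
                results.append(pieces)
    return results


for pieces in splits_by_cut_points('WINE'):
    print(pieces)
```

For 'WINE' the unsplit word is dropped because it has 4 letters, leaving 7 splits.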

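Source

2014-11-06 23:33:10

I'm not sure this is correct. This gives permutations, but I'm looking for all the substrings. In other words, every possible way to cut up the word. – 2014-11-06 23:39:59

This does indeed give all the substrings and returns the same results as my answer (except as lists rather than strings). – Stuart 2014-11-07 01:20:32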

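As an alternative answer, you can do this with the itertools module: use the groupby function to group the list, with combinations creating the list of index pairs used in the grouping key (i<=word.index(x)<=j), and finally use set to get a unique list.

Also note that you could make the index pairs unique up front: when you have pairs like (i1,j1) and (i2,j2) with i1==0, j2==3 and j1==i2, such as (0,2) and (2,3), the resulting slices are the same, so you need to remove one of them.

All in one list comprehension: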

from itertools import combinations, groupby

subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
set([' '.join(j) for j in subs]) # set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])

Demo in detail:

>>> cl=list(combinations(range(len(word)),2)) 
>>> cl 
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)] 

>>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl] 
>>> new_l 
[[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]] 
>>> last=[[''.join(i) for i in j] for j in new_l] 
>>> last 
[['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']] 
>>> set([' '.join(j) for j in last]) 
set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE']) 
>>> for i in set([' '.join(j) for j in last]): 
...     print(i)
... 
WIN E 
W IN E 
W INE 
WI NE 
WINE 
>>>

Source

2014-11-07 00:11:37 Kasramvd

This doesn't work - there should be 7 possible output combinations. – Stuart 2014-11-07 00:46:23
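
The figure of 7 quoted in the comment above can be checked without listing anything, by counting how many ways n letters can be cut into pieces of 1 to 3 letters. The sketch below is one way to do that count; `count_segmentations` and `max_len` are hypothetical names, not something from the thread.

```python
def count_segmentations(n, max_len=3):
    """Count the ways to cut n letters into consecutive pieces of 1..max_len letters.

    Hypothetical helper for illustration only.
    """
    # ways[k] = number of segmentations of the first k letters
    ways = [1] + [0] * n
    for k in range(1, n + 1):
        # the last piece can have any size from 1 up to min(max_len, k)
        ways[k] = sum(ways[k - size] for size in range(1, min(max_len, k) + 1))
    return ways[n]


print(count_segmentations(4))  # 'WINE' has 4 letters -> 7
```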

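Since each of the three positions (after W, after I and after N) can either have a space or not, you can think of this as the bits (1 or 0) in the binary representation of a number ranging from 1 to 2^3 - 1.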

input_word = "WINE"
for variation_number in range(1, 2 ** (len(input_word) - 1)):
    output = ''
    for position, letter in enumerate(input_word):
        output += letter
        # bit number `position` decides whether a space follows this letter
        if variation_number >> position & 1:
            output += ' '
    print(output)

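Edit: To include only variations made of sequences of 3 characters or fewer (in the general case, where input_word may be longer than 4 characters), we can exclude the cases where the binary representation contains 3 zeroes in a row. (We also start the range from a higher number, to exclude the cases that would have 000 at the beginning.)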

for variation_number in range(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):
    # three zero bits in a row would mean four letters with no space between them
    if '000' not in bin(variation_number):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print(output)
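
As a cross-check of the bitmask idea, the length limit can also be tested on the pieces themselves rather than on the binary string. The following is a sketch under that assumption; `spaced_variations` and `max_len` are names invented here for illustration.

```python
def spaced_variations(word, max_len=3):
    """Return every spacing of `word` whose pieces have at most `max_len` letters.

    Hypothetical helper; bit `position` of `mask` means "break after this letter".
    """
    n = len(word)
    if n == 0:
        return []
    results = []
    for mask in range(2 ** (n - 1)):
        pieces, current = [], ''
        for position, letter in enumerate(word):
            current += letter
            # close the current piece at a set bit or at the end of the word
            if position == n - 1 or (mask >> position) & 1:
                pieces.append(current)
                current = ''
        if all(len(piece) <= max_len for piece in pieces):
            results.append(' '.join(pieces))
    return results


print(spaced_variations('SWINE'))
```

With 'SWINE' as input, 'SWIN E' should not appear in the result, since its first piece has 4 letters.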

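Source

2014-11-07 00:13:22 Stuart

Unfortunately, this algorithm doesn't generalize to longer input words, because it prints substrings of length 4 or longer. Try it with 'input_word = "SWINE"' to see what I mean. – irrelephant 2014-11-07 00:31:16

@irrelephant It works for me with SWINE. Not sure what you mean about substring length. – Stuart 2014-11-07 00:37:04

When I try it, I get the line 'SWIN E' - SWIN is longer than 3 characters. Part of the OP's question restricts the maximum substring length to 3. – irrelephant 2014-11-07 00:38:12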

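I think it could be like this:

word = "ABCDE"
myList = []

for i in range(1, len(word)+1, 1):
    # prefix ending at position i
    myList.append(word[:i])

    # remaining substrings ending at position i (j runs from 1 to i-1)
    for j in range(len(word[len(word[1:]):]), len(word)-len(word[i:]), 1):
        myList.append(word[j:i])

print(myList)
# the same list with duplicates removed, in first-seen order
print(sorted(set(myList), key=myList.index))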
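
The loop above collects contiguous substrings rather than full ways of cutting up the word. If that is what is wanted, a shorter sketch with the 3-letter cap from the question might look like this; `substrings` and `max_len` are hypothetical names.

```python
def substrings(word, max_len=3):
    """Return every contiguous substring of `word` with 1..max_len letters.

    Hypothetical helper for illustration; ordered by start position, then length.
    """
    return [word[start:start + size]
            for start in range(len(word))
            for size in range(1, min(max_len, len(word) - start) + 1)]


print(substrings('WINE'))
```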

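Source

2017-09-09 17:03:33

Formatting suggestion: I'm guessing 'word' and 'myList' should be in the code block. Also, can you explain what this answer offers that isn't found in the other answers? – stealththeninja 2017-09-09 17:22:30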
