2014-11-06 34 views
4

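If I have a list of letters, such as:
word = ['W','I','N','E']
and need to get every possible sequence of substrings, each of length 3 or less, e.g.:
W I N E, WI N E, WI NE, W IN E, WIN E
What is the most efficient way to go about this, i.e. to read all possible sequential substrings in Python?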

Right now, I have:

word = ['W','I','N','E']
for idx,phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx-p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)

This just gives me the following, rather than the sub-sequences:

W 
I 
W I 
N 
I N 
W I N 
E 
N E 
I N E 

I just can't figure out how to create every possible sequence.

+0

Do you need permutations? Or just substrings? – 2014-11-06 23:35:55

+1

Just substrings, as they need to be sequential. – 2014-11-06 23:39:06

+1

Isn't what you're looking for just "WINE" with a space or no space at each possible position? – Stuart 2014-11-06 23:43:58

Answers

2

Try this recursive algorithm:

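def segment(word):
    def sub(w):
        # An empty remainder has exactly one segmentation: the empty one.
        if len(w) == 0:
            yield []
        # Take a first piece of 1 to 3 letters, then segment the rest.
        for i in range(1, min(4, len(w) + 1)):
            for s in sub(w[i:]):
                yield [''.join(w[:i])] + s
    return list(sub(word))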

# And if you want a list of strings: 
def str_segment(word): 
    return [' '.join(w) for w in segment(word)] 

Output:

>>> segment(word) 
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']] 

>>> str_segment(word) 
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E'] 
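
The recursion above hard-codes the limit of three letters per piece through `min(4, ...)`. A minimal sketch of the same idea with the cap exposed as a parameter might look like the following; the name `segment_max` and the `max_len` argument are illustrative assumptions, not part of the answer. It accepts either a string or a list of single-character strings.

```python
def segment_max(word, max_len=3):
    """Return every way to cut `word` into consecutive pieces of at most `max_len` items."""
    def sub(rest):
        if not rest:
            yield []  # exactly one way to segment nothing
            return
        # Try every allowed length for the first piece, then recurse on the remainder.
        for i in range(1, min(max_len, len(rest)) + 1):
            head = ''.join(rest[:i])
            for tail in sub(rest[i:]):
                yield [head] + tail
    return list(sub(word))


if __name__ == "__main__":
    for pieces in segment_max("SWINE"):
        print(' '.join(pieces))
```

With the default `max_len=3` this should reproduce the seven results listed above for `['W','I','N','E']`; for longer words no piece exceeds three letters.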
+0

Thanks! That will do it. – 2014-11-06 23:57:28

+1

@Adam_G I improved the answer - you should use this version! – irrelephant 2014-11-07 00:16:30

+0

Thanks. How does this improve it? – 2014-11-07 01:18:03

1

My implementation for this problem.

#!/usr/bin/env python 

# this is a problem of fitting partitions in the word 
# we'll use itertools to generate these partitions 
import itertools 

word = 'WINE' 

# this loop generates all possible partitions COUNTS (up to word length) 
for partitions_count in range(1, len(word)+1):
    # this loop generates all possible combinations based on count
    for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):

        # because of the way python splits words, we only care about the
        # difference *between* partitions, and not their distance from the
        # word's beginning
        diffs = list(partitions)
        for i in range(len(partitions)-1):
            diffs[i+1] -= partitions[i]

        # first, the whole word is up for taking by partitions
        splits = [word]

        # partition the word's remainder (what was not already "taken")
        # with each partition
        for p in diffs:
            remainder = splits.pop()
            splits.append(remainder[:p])
            splits.append(remainder[p:])

        # print the result
        print(splits)
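
The same cut-point idea can be written without the running differences by slicing between consecutive cut positions. The sketch below is one possible variant, not the answer's own code: the function name `splits_by_cuts` and the optional `max_len` filter are assumptions added for illustration. Unlike the loop above, it also allows zero cut points, so the uncut word is included.

```python
from itertools import combinations


def splits_by_cuts(word, max_len=None):
    """List every split of `word` (a string), optionally keeping only pieces of at most `max_len` letters."""
    n = len(word)
    results = []
    for r in range(n):  # r = number of cut points; 0 keeps the word whole
        for cuts in combinations(range(1, n), r):
            # Pad the cut positions with both ends, then slice between neighbours.
            bounds = (0,) + cuts + (n,)
            pieces = [word[a:b] for a, b in zip(bounds, bounds[1:])]
            if max_len is None or all(len(p) <= max_len for p in pieces):
                results.append(pieces)
    return results
```

Calling `splits_by_cuts("WINE", max_len=3)` drops the uncut word because it has four letters, leaving the seven splits asked for in the question.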
+1

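I'm not sure this is correct. This gives permutations, but I'm looking for all the substrings. In other words, every possible way to cut up the word. – 2014-11-06 23:39:59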

+0

This does give all the substrings and returns the same results as my answer (except as lists rather than strings). – Stuart 2014-11-07 01:20:32

1

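As an alternative answer, you can do this with the itertools module: use the groupby function to group the list, with combinations supplying the pairs of indices (i, j) used in the grouping key (i<=word.index(x)<=j), and finally use set to get the unique results.

Also note that with this method you could first reduce the index pairs to a unique combination: when you have pairs like (i1,j1) and (i2,j2) with i1==0, j2==3 and j1==i2, such as (0,2) and (2,3), it means that the results of those slices are the same and you need to remove one of them.

All in one list comprehension: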

from itertools import combinations, groupby

subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
set([' '.join(j) for j in subs]) # set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])

Demo in detail:

>>> cl=list(combinations(range(len(word)),2)) 
>>> cl 
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)] 

>>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl] 
>>> new_l 
[[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]] 
>>> last=[[''.join(i) for i in j] for j in new_l] 
>>> last 
[['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']] 
>>> set([' '.join(j) for j in last]) 
set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE']) 
>>> for i in set([' '.join(j) for j in last]):
...     print(i)
... 
WIN E 
W IN E 
W INE 
WI NE 
WINE 
>>> 
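
The set in the demo has five entries. As a cross-check on how many results to expect, the number of ways to cut a word of `n` letters into consecutive pieces of at most three letters can be counted with a short recurrence over the length of the first piece. This is only a counting sketch; the name `count_segmentations` is hypothetical.

```python
def count_segmentations(n, max_len=3):
    """Count the ways to cut n letters into consecutive pieces of 1..max_len letters."""
    ways = [1] + [0] * n  # ways[0] = 1: the empty word has one (empty) segmentation
    for length in range(1, n + 1):
        # The first piece takes k letters; the remaining length - k are segmented freely.
        ways[length] = sum(ways[length - k]
                           for k in range(1, min(max_len, length) + 1))
    return ways[n]


print(count_segmentations(4))  # 7 for a four-letter word such as WINE
```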
+0

This doesn't work - there should be 7 possible output combinations. – Stuart 2014-11-07 00:46:23

2

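Since there can be either a space or no space at each of three positions (after W, after I and after N), you can think of this as similar to the bits being 1 or 0 in the binary representation of a number ranging from 1 to 2^3 - 1.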

input_word = "WINE"
for variation_number in range(1, 2 ** (len(input_word) - 1)):
    output = ''
    for position, letter in enumerate(input_word):
        output += letter
        # bit `position` set means a space goes after this letter
        if variation_number >> position & 1:
            output += ' '
    print(output)

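Edit: To include only variations whose sequences are 3 characters or fewer (in the general case, where input_word may be longer than 4 characters), we can exclude the cases where the binary representation contains 3 zeros in a row. (We also start the range from a higher number, to exclude the cases that would have 000 at the beginning.)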

for variation_number in range(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):
    # three zero bits in a row would mean four letters with no space between them
    if '000' not in bin(variation_number):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print(output)
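
As a rough alternative to testing for `'000'` in the binary string, the length of each piece can be checked directly. The sketch below assumes a hypothetical function name `spaced_variations` and a `max_len` parameter, neither of which appears in the answer. Because it never computes `2 ** (len(input_word) - 4)`, it also runs on words shorter than four letters, where that range start is no longer an integer.

```python
def spaced_variations(word, max_len=3):
    """Return the spaced variants of `word` (a string) whose pieces have at most `max_len` letters."""
    n = len(word)
    results = []
    for mask in range(2 ** max(n - 1, 0)):
        pieces, start = [], 0
        for pos in range(n - 1):
            # Bit `pos` of the mask decides whether to cut after letter `pos`.
            if mask >> pos & 1:
                pieces.append(word[start:pos + 1])
                start = pos + 1
        pieces.append(word[start:])
        if all(len(p) <= max_len for p in pieces):
            results.append(' '.join(pieces))
    return results
```

Mask 0 (no spaces at all) is kept only when the whole word fits within `max_len`, so `spaced_variations("WINE")` returns the seven spaced variants and omits `WINE` itself.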
+0

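Unfortunately, this algorithm doesn't generalize to longer input words, because it prints substrings of length 4 or more. Try it with 'input_word = "SWINE"' to see what I mean. – irrelephant 2014-11-07 00:31:16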

+0

@irrelephant It works for me with SWINE. Not sure what you mean about substring length. – Stuart 2014-11-07 00:37:04

+0

When I try it, I get the line 'SWIN E' - SWIN does not have length 3 or less. Part of the OP's question restricts the maximum substring length to 3. – irrelephant 2014-11-07 00:38:12

0

I think it might be something like this:

word = "ABCDE"
myList = []

for i in range(1, len(word)+1, 1):
    myList.append(word[:i])

    for j in range(len(word[len(word[1:]):]), len(word)-len(word[i:]), 1):
        myList.append(word[j:i])

print(myList)
print(sorted(set(myList), key=myList.index))
+0

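Formatting suggestion: I'm guessing 'word' and 'myList' should be in the code block. Also, can you explain what this answer offers that isn't found in the other answers? – stealththeninja 2017-09-09 17:22:30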