2011-02-05 88 views
3

Words like umbellar = umbrella count as the same word. Find all words whose characters match another word in Python

Input = ["umbellar", "goa", "umbrella", "ago", "aery", "alem", "ayre", "gnu", "eyra", "egma", "game", "leam", "amel", "year", "meal", "yare", "gun", "alme", "ung", "male", "lame", "mela", "mage"]

So the output should be:

Output = [["umbellar", "umbrella"], ["ago", "goa"], ["aery", "ayre", "eyra", "yare", "year"], ["alem", "alme", "amel", "lame", "leam", "male", "meal", "mela"], ["gnu", "gun", "ung"], ["egma", "game", "mage"]]

+2

Is this homework? If so, please tag it as such. –

+1

Assuming that "same" words must have the same length, sort each string in the list and check for matches. –

Answers

4

They are not equal words, they are anagrams.

Anagrams can be detected by sorting the letters of each word:

sorted('umbellar') == sorted('umbrella') 
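
As a minimal sketch of that check, the comparison can be wrapped in a small helper. The name `is_anagram` is hypothetical (it does not appear in the thread), and the length test is only a shortcut that avoids sorting strings which cannot match:

```python
def is_anagram(a, b):
    """Return True if a and b consist of exactly the same letters."""
    # Cheap length check first, so strings of different length are never sorted.
    return len(a) == len(b) and sorted(a) == sorted(b)

print(is_anagram("umbellar", "umbrella"))  # True
print(is_anagram("goa", "gnu"))            # False
```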
7

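from itertools import groupby

def group_words(word_list):
    # groupby only merges adjacent items, so sort by the same key first.
    sorted_words = sorted(word_list, key=sorted)
    grouped_words = groupby(sorted_words, sorted)
    for key, words in grouped_words:
        group = list(words)
        # Skip words that have no anagram in the list.
        if len(group) > 1:
            yield group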

Example:

>>> group_words(["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu","eyra","egma","game","leam","amel","year","meal","yare","gun","alme","ung","male","lame","mela","mage" ]) 
<generator object group_words at 0x0297B5F8> 
>>> list(_) 
[['umbellar', 'umbrella'], ['egma', 'game', 'mage'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['aery', 'ayre', 'eyra', 'year', 'yare'], ['goa', 'ago'], ['gnu', 'gun', 'ung']] 
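
The sort before `groupby` matters, because `itertools.groupby` only merges consecutive items that share a key. The following sketch, using a shortened word list chosen here purely for illustration, shows the difference:

```python
from itertools import groupby

words = ["goa", "gnu", "ago", "gun"]

# Without sorting first, anagrams that are not adjacent end up in separate groups.
unsorted_groups = [list(g) for _, g in groupby(words, key=sorted)]
# With the list sorted by the same key, each anagram class forms one run.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=sorted), key=sorted)]

print(unsorted_groups)  # [['goa'], ['gnu'], ['ago'], ['gun']]
print(sorted_groups)    # [['goa', 'ago'], ['gnu', 'gun']]
```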
+1

[list(g) for k, g in itertools.groupby(sorted(INPUT, key=sorted), sorted)] – Kabie

+0

@Kabie: I used temporary variables to help readability. :) Also, if we go by the example, words without anagrams should not be returned. – shang

0

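As others have pointed out, you are looking for all groups of anagrams in your list. Here is one possible solution. For each word, the algorithm finds the candidates that are anagrams of it, picks one (the first element) as the canonical word, and removes the remaining candidates from further consideration. This works because the anagram relation is transitive: once a word is known to belong to an anagram group, it does not need to be examined again.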

input = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu",
         "eyra","egma","game","leam","amel","year","meal","yare","gun",
         "alme","ung","male","lame","mela","mage"]
res = dict()
for word in input:
    res[word] = [word]
for word in input:
    # The len test just avoids sorting and comparing words of different lengths.
    # A list (not a lazy filter object) is needed for indexing, and it snapshots
    # the keys so entries can be deleted from res inside the loop.
    candidates = [x for x in res
                  if len(x) == len(word) and sorted(x) == sorted(word)]
    if candidates:
        canonical = candidates[0]
        for c in candidates[1:]:
            # Delete every candidate except the canonical one...
            del res[c]
            # ...and add it to the canonical member's group.
            res[canonical].append(c)
print(list(res.values()))

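This algorithm outputs the following (the order of the groups, and of the words within them, depends on the dictionary ordering):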

[['year', 'ayre', 'aery', 'yare', 'eyra'], ['umbellar', 'umbrella'], 
['lame', 'leam', 'mela', 'amel', 'alme', 'alem', 'male', 'meal'], 
['goa', 'ago'], ['game', 'mage', 'egma'], ['gnu', 'gun', 'ung']] 
1

collections.defaultdict comes in handy here:

from collections import defaultdict 

input = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu", 
"eyra","egma","game","leam","amel","year","meal","yare","gun", 
"alme","ung","male","lame","mela","mage" ] 

D = defaultdict(list) 
for i in input: 
    key = ''.join(sorted(i))  # sort the letters of the current word, not the whole list
    D[key].append(i) 

output = list(D.values())

And the output is [['umbellar', 'umbrella'], ['goa', 'ago'], ['gnu', 'gun', 'ung'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['egma', 'game', 'mage'], ['aery', 'ayre', 'eyra', 'year', 'yare']]
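
The dictionary approach also returns words that have no anagram in the list, and the order of its groups depends on the dictionary. A hedged sketch of a variant that drops single-word groups and sorts the result, so that it matches the format requested in the question; the function name `anagram_groups` and the shortened word list are assumptions made for illustration:

```python
from collections import defaultdict

def anagram_groups(words):
    """Group words by sorted letters, keeping only groups with two or more members."""
    groups = defaultdict(list)
    for word in words:
        groups[''.join(sorted(word))].append(word)
    # Sort each group and the list of groups so the result does not depend on dict order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

print(anagram_groups(["goa", "ago", "gnu", "gun", "ung", "solo"]))
# [['ago', 'goa'], ['gnu', 'gun', 'ung']]
```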

0

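The answers above are correct, but I was challenged to do the same thing without using groupby(). Here is the result. Adding print statements will help you debug the code and inspect its output at run time.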

def group_words(word_list):
    # Collect the distinct sorted-letter signatures, e.g. 'aegm' for 'game'.
    signatures = set(''.join(sorted(word)) for word in word_list)
    new_list = []
    for signature in signatures:
        group = []
        for word in word_list:
            # Only words of the same length can share a signature.
            if len(word) == len(signature) and ''.join(sorted(word)) == signature:
                group.append(word)
        new_list.append(group)
    return new_list

And the output:

[['umbellar', 'umbrella'], ['goa', 'ago'], ['gnu', 'gun', 'ung'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['egma', 'game', 'mage'], ['aery', 'ayre', 'eyra', 'year', 'yare']] 