String parsing using Python?

Given a string such as 'helloyellowellow', parse out all the valid words it contains. (For example: [[hell, hello, yellow], [low, low] ........])

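I am looking for the most optimized way to write this code. Here is my attempt; I am not sure whether it is the best approach.

Full disclosure - this was an interview question.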

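master = []

# Stand-in word list for illustration only; the original left this lookup
# unimplemented. Replace it with a real dictionary.
WORDS = {'hell', 'hello', 'yell', 'yellow', 'low', 'well'}

def is_word(inputstr):
    # Returns True/False
    return inputstr in WORDS


def processstring(fstr, secstr, li):
    # Grow fstr one character at a time, recording each prefix that is a word
    if is_word(fstr):
        li.append(fstr)
    if len(secstr) == 0:
        if len(li) != 0:
            master.append(li)
        return
    processstring(fstr + secstr[0], secstr[1:], li)


def wrapperprocess(inpstr):
    # Repeat the prefix scan from every starting position
    li = []
    if len(inpstr) == 0:
        return
    processstring('', inpstr, li)
    wrapperprocess(inpstr[1:])


wrapperprocess('helloyellowellow')
print(master)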


2013-10-24 user2917012

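In your solution, it looks like you forgot to `return li`. A better approach would be to `yield` the matched words, instead of maintaining a list, appending to it, and returning it. – shx2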
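A minimal sketch of what the generator-based approach from that comment might look like. The `WORDS` set and the helper names `words_from` and `parse` are assumptions made for illustration; they are not from the question, and a real dictionary lookup would replace `is_word`.

```python
# Illustrative stand-in for a real dictionary lookup (an assumption).
WORDS = {"hell", "hello", "yell", "yellow", "low", "well"}


def is_word(s):
    return s in WORDS


def words_from(text, start):
    """Yield every dictionary word that begins at index `start`."""
    for end in range(start + 1, len(text) + 1):
        candidate = text[start:end]
        if is_word(candidate):
            yield candidate


def parse(text):
    """Yield one non-empty list of words per starting position."""
    for start in range(len(text)):
        found = list(words_from(text, start))
        if found:
            yield found


result = list(parse("helloyellowellow"))
# result == [['hell', 'hello'], ['yell', 'yellow'], ['low'], ['well'], ['low']]
```

This avoids the global `master` list and the recursion, and the caller decides whether to collect the groups into a list or consume them lazily.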

You can do it like this:

tgt = 'helloyellowellow'

# Scan the system word list and print every entry that occurs in tgt
with open('/usr/share/dict/words') as f:
    for word in f:
        word = word.strip()
        if word in tgt and len(word) > 1:
            print(word)

Prints:

el 
ell 
he 
hell 
hello 
lo 
low 
loy 
ow 
owe 
we 
well 
ye 
yell 
yellow

If you are just looking for the `is_word` function that you left undefined, you can play with something like this:

def is_word(word, dic='/usr/share/dict/words'):
    # Load the word list once and cache it as an attribute on the function
    if not hasattr(is_word, 'words'):
        with open(dic) as f:
            is_word.words = {line.strip() for line in f}

    return word in is_word.words and len(word) > 1

As a built-in data structure, Python sets have an average look-up time of O(1). It is unlikely that you will write something faster yourself.
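The same load-once idea can also be sketched with `functools.lru_cache` instead of a function attribute. This is only a variation under stated assumptions: `SAMPLE_DICT` and `load_words` are hypothetical names, and the in-memory string stands in for `/usr/share/dict/words` so the sketch runs anywhere.

```python
import functools
import io

# Hypothetical stand-in for /usr/share/dict/words (an assumption).
SAMPLE_DICT = "hell\nhello\nyell\nyellow\nlow\na\n"


@functools.lru_cache(maxsize=None)
def load_words():
    """Read the word list once; later calls reuse the cached frozenset."""
    with io.StringIO(SAMPLE_DICT) as f:
        return frozenset(line.strip() for line in f)


def is_word(word):
    # Same rule as above: ignore one-letter entries.
    return len(word) > 1 and word in load_words()
```

Each call to `is_word` after the first is then a single hash lookup in the cached set.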


2013-10-24 18:08:58 dawg

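Thanks for the code. But how efficient is it if you check every word in the dictionary against your string? If only a small fraction of them match, won't you be performing millions of comparisons? – user2917012

What does 'efficient' mean in this case? On my (old, slow) computer, this executes in 88 ms. Just printing 'hello' in Python takes 22 ms, so at roughly 60 ms on top of that, it is quite fast. Only one word is in memory at a time, so it is very memory efficient. Since it took me about 30 seconds to write the code, that is very efficient too. In what way do you want it to be more efficient? ;-) – dawg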

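There is a good way to solve this problem:

use the Wordnet package.

While parsing your given string, start at some index and keep extending the end index one step at a time. At each increment, use wordnet to check whether the current substring exists as a word; it will tell you whether a particular substring is meaningful or not!

To install wordnet: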

https://pypi.python.org/pypi/Wordnet-bn/1.0


2013-10-24 18:12:41

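Since you mention that you are looking for an efficient algorithm, and assuming you are given the dictionary in advance (rather than just as a callable predicate), you can use the Aho–Corasick algorithm.

Of course, if the input text is short, a more naive algorithm will be faster, because it avoids the "expensive" preprocessing of the dictionary.

Also, as an alternative Python answer, here is a simple way to just check every substring: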
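For reference, a minimal pure-Python sketch of the Aho–Corasick idea. The function names `build_automaton` and `find_all` and the sample word list are assumptions for illustration, not part of any library; a production version would more likely use an existing implementation.

```python
from collections import deque


def build_automaton(words):
    """Build trie transitions, failure links and per-state outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> words that end at this state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit words that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, words):
    """Yield (start_index, word) for every occurrence of a word in text."""
    goto, fail, out = build_automaton(words)
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            yield i - len(word) + 1, word


matches = sorted(find_all("helloyellowellow",
                          ["hell", "hello", "yellow", "low", "well"]))
# matches == [(0, 'hell'), (0, 'hello'), (5, 'yellow'),
#             (8, 'low'), (10, 'well'), (13, 'low')]
```

The text is scanned once, left to right, after the automaton has been built from the dictionary; that build step is the preprocessing cost mentioned above.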

def gen_words(txt):
    n = len(txt)
    # Try every substring txt[i:j] and yield those that are words
    for i in range(n):
        for j in range(i + 1, n + 1):
            subtxt = txt[i:j]
            if is_word(subtxt):
                yield subtxt

For uniqueness, do this:

all_words = set(gen_words(txt))


2013-10-24 18:12:59 shx2
