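Binary search for a prefix string in a dict.txt file

I have a test harness, and for some reason this code fails to find prefixes and also ignores short words. Any suggestions/hints/ideas?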

def search(str):
    """Search for a prefix string in the dictionary.
    Args:
        str: A string to look for in the dictionary
    Returns:
        code WORD if str exactly matches a word in the dictionary,
        PREFIX if str does not match a word exactly but is a prefix
            of a word in the dictionary, or
        NO_MATCH if str is not a prefix of any word in the dictionary
    """

    left = 0
    right = len(dict) - 1
    mid = (left + right) // 2
    elem = dict[mid]
    while right >= left:
        if elem == str:
            return WORD
        elif elem < str:
            left = mid + 1
            mid = (left + right) // 2
        elif elem > str:
            right = mid - 1
            mid = (left + right) // 2
        elif elem == str[0:len(elem)]:
            return PREFIX
        elem = dict[mid]
        #print(left, right, mid)

    return NO_MATCH

Source

2012-11-02 Colter Therrell

I think a prefix will always compare less than the full string, so the last elif will never be hit – noisecapella
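To illustrate the comment above: Python compares strings lexicographically, so a proper prefix sorts before the longer string, and for any two strings exactly one of ==, < and > holds. A small check, using made-up sample values rather than anything from the asker's dict.txt:

```python
# Illustrative values only; not taken from the asker's dict.txt.
elem = "aa"      # a dictionary word
query = "aac"    # the search string; elem is a proper prefix of it

# A proper prefix compares less than the longer string.
assert elem < query

# Exactly one of ==, <, > is true for any pair of strings, so an
# `elif` placed after all three can never run.
outcomes = [elem == query, elem < query, elem > query]
assert outcomes.count(True) == 1

# The prefix test itself would have succeeded had it been reached.
assert elem == query[0:len(elem)]
```

So the prefix test needs to come before the ordering comparisons, or be handled outside the if/elif chain, to have any effect.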

dict() is a built-in, so you should avoid giving variables names that shadow built-in functions. The same goes for str()... – monkut
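A quick sketch of why shadowing bites. The function names `shadowed` and `renamed` are hypothetical, chosen only for this demonstration:

```python
def shadowed(str):
    # Inside this function `str` is the argument, not the built-in type,
    # so calling it fails when the argument is an ordinary string.
    try:
        return str(42)
    except TypeError:
        return "TypeError"


def renamed(query):
    # With a different parameter name the built-in str is still reachable.
    return str(42)


result_shadowed = shadowed("abc")
result_renamed = renamed("abc")
```

Names such as `words` and `query` would avoid the problem in the question's code.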

I'm not quite sure what is needed, but a sliding window is useful for this kind of search.

from itertools import islice


def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable.

    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result


WORD = "Found word!"
PREFIX = "Found prefix!"
NO_MATCH = "No match found!"


def search(search_for, search_in):
    assert len(search_for) < len(search_in)

    window_size = len(search_for)
    total_length = len(search_in)
    search_window = window(search_in, len(search_for))

    for idx, search_group in enumerate(search_window, window_size):
        joined_str = "".join(search_group)
        if joined_str == search_for:
            # found match, determine if there is any left
            if idx < total_length:
                return PREFIX
            elif idx == total_length:
                return WORD
    return NO_MATCH

Source

2012-11-02 03:45:28 monkut

def search(str):
    left = 0
    right = len(dict) - 1
    while right >= left:
        mid = (left + right) // 2  # compute once per iteration
        elem = dict[mid]
        if elem == str:
            return WORD
        elif elem < str:
            left = mid + 1
        elif elem > str:
            right = mid - 1
        elif elem == str[0:len(elem)]:
            return PREFIX
        #print(left, right, mid)

    return NO_MATCH

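You don't have to compute mid inside each of the if branches. But how is the dictionary structured, and how many characters must a prefix have? Give us some more information so we can help further.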
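Until those details are known, here is a minimal sketch under two assumptions: the dictionary is a sorted list of strings, and PREFIX means "the query is a proper prefix of some word", as the question's docstring says. The function name `classify` and its `words` parameter are mine; the three result constants are borrowed from the sliding-window answer. It relies on the standard library's `bisect_left`, which returns the index of the first element that is not less than the query.

```python
from bisect import bisect_left

WORD = "Found word!"
PREFIX = "Found prefix!"
NO_MATCH = "No match found!"


def classify(query, words):
    """Classify query against a sorted list of words (assumed sorted)."""
    # Index of the first word that is >= query.
    i = bisect_left(words, query)
    if i == len(words):
        return NO_MATCH
    if words[i] == query:
        return WORD
    # Every word that starts with query sorts at or after query, and no
    # other word can sit between query and those words, so checking the
    # single word at index i is enough.
    if words[i].startswith(query):
        return PREFIX
    return NO_MATCH
```

For example, with `words = ["apple", "banana"]`, `classify("app", words)` gives PREFIX, `classify("apple", words)` gives WORD, and `classify("apq", words)` gives NO_MATCH.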

Source

2012-11-02 03:47:38 Gofoboso

Consider a dict.txt with the following contents:

a 
aa 
aaa 
aaaa 
aaaaa

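You search for the word "aac", and mid happens to land on aaa.

In a standard binary search, the search space then becomes:

aaaa
aaaaa

So neither aa nor a, both of which could be prefixes, will ever be found.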
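The trace above can be reproduced with a small sketch. The helper `probes` is hypothetical: it runs a plain binary search and records each element it inspects.

```python
def probes(words, query):
    """Return the elements a plain binary search inspects, in order."""
    seen = []
    left, right = 0, len(words) - 1
    while left <= right:
        mid = (left + right) // 2
        elem = words[mid]
        seen.append(elem)
        if elem == query:
            break
        if elem < query:
            left = mid + 1
        else:
            right = mid - 1
    return seen


words = ["a", "aa", "aaa", "aaaa", "aaaaa"]
visited = probes(words, "aac")
# visited is ["aaa", "aaaa", "aaaaa"]; "a" and "aa" are never inspected.
```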

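I think what you want will need a more complex algorithm. If you intend to base it on binary search, I would probably start with the shortest prefix of str (a single character, in the worst case) and lengthen it step by step as long as it keeps finding matches.

I think it would be most efficient to fold all of this into the same left/right/pivot loop, but you could even do it simply like this: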

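def search_prefix(str):
    longest_prefix = NO_MATCH
    # n runs from 1 to len(str): str[:0] would be the empty string,
    # and the full string is tried last.
    for n in range(1, len(str) + 1):
        prefix = search(str[:n])
        if prefix == NO_MATCH:
            break
        longest_prefix = prefix
    return longest_prefix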

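Note: I said a single character in the worst case, but in practice you could pre-cache the word lengths found in dict.txt (and therefore the possible prefix lengths), like this: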

prefix_lengths = sorted(set(map(len, dict)))
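As a rough sketch of how that cache might be used, assuming the dictionary is a sorted list of strings: only slices of the query whose length actually occurs in the dictionary are looked up. The helper name `dictionary_prefixes` and its parameters are hypothetical, and the lookup uses the standard library's `bisect_left` rather than the hand-written loop. Note that the query itself is included in the result if it is a dictionary word.

```python
from bisect import bisect_left


def dictionary_prefixes(query, words):
    """Return the words in sorted `words` that are prefixes of `query`."""
    # Only lengths that actually occur in the dictionary need checking.
    prefix_lengths = sorted(set(map(len, words)))
    found = []
    for n in prefix_lengths:
        if n > len(query):
            break  # longer words cannot be prefixes of query
        candidate = query[:n]
        i = bisect_left(words, candidate)
        if i < len(words) and words[i] == candidate:
            found.append(candidate)
    return found


words = ["a", "aa", "aaa", "aaaa", "aaaaa"]
matches = dictionary_prefixes("aac", words)
# matches is ["a", "aa"]
```

In a real program the `prefix_lengths` list would be computed once when dict.txt is loaded instead of on every call.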

Source

2012-11-02 03:55:48 antak
