爲什麼我的二進制搜索實現非常低效？

我正在做一個Python練習來搜索word從給定的排序wordlist，包含超過100,000個單詞。爲什麼我的二進制搜索實現非常低效？

使用Python bisect module中的bisect_left時，它非常高效，但是使用我自己創建的二進制方法效率非常低。任何人都可以請說明爲什麼

這是使用Python bisect模塊的搜索方法：

def in_bisect(word_list, word): 
    """Checks whether a word is in a list using bisection search. 

    Precondition: the words in the list are sorted 

    word_list: list of strings 
    word: string 
    """ 
    i = bisect_left(word_list, word) 
    if i != len(word_list) and word_list[i] == word: 
     return True 
    else: 
     return False

我的實現真的是很低效的（不知道爲什麼）：

def my_bisect(wordlist,word): 
    """search the given word in a wordlist using 
    bisection search, also known as binary search 
    """ 
    if len(wordlist) == 0: 
     return False 
    if len(wordlist) == 1: 
     if wordlist[0] == word: 
      return True 
     else: 
      return False 

    if word in wordlist[len(wordlist)/2:]: 
     return True 

    return my_bisect(wordlist[len(wordlist)/2:],word)

來源

2014-06-22 bean

因爲你實際上並沒有使用二進制搜索？ – jonrsharpe

@jonrsharpe，我試圖執行二進制搜索，其中我搜索的開始一半，如果不是在開始一半，我搜索另一半 – bean

這裏的問題是，你在每個級別上覆制列表，這將使您從二分查找中獲得的收益變得侏儒。嘗試僅使用索引來分隔要搜索的部分。 –

if word in wordlist[len(wordlist)/2:]

將使Python搜索你的wordlist的一半，這有點擊敗編寫二進制搜索的目的。此外，你沒有正確地將列表分成一半。二進制搜索策略是將搜索空間的每一步縮小一半，然後僅將相同的策略應用於您可以使用的一半的搜索空間。爲了知道哪一半是搜索的正確搜索空間，它是關鍵是wordlist已排序。以下是一個示例實現，用於跟蹤驗證word是否在wordlist中所需的調用次數。

import random 

numcalls = 0 
def bs(wordlist, word): 
    # increment numcalls 
    print('wordlist',wordlist) 
    global numcalls 
    numcalls += 1 

    # base cases 
    if not wordlist: 
     return False 
    length = len(wordlist) 
    if length == 1: 
     return wordlist[0] == word 

    # split the list in half 
    mid = int(length/2) # mid index 
    leftlist = wordlist[:mid] 
    rightlist = wordlist[mid:] 
    print('leftlist',leftlist) 
    print('rightlist',rightlist) 
    print() 

    # recursion 
    if word < rightlist[0]: 
     return bs(leftlist, word) # word can only be in left list 
    return bs(rightlist, word) # word can only be in right list 

alphabet = 'abcdefghijklmnopqrstuvwxyz' 
wl = sorted(random.sample(alphabet, 10)) 
print(bs(wl, 'm')) 
print(numcalls)

我包含了一些print聲明，以便您可以看到發生了什麼。這裏有兩個示例輸出。第一：word是在wordlist：

wordlist ['b', 'c', 'g', 'i', 'l', 'm', 'n', 'r', 's', 'v'] 
leftlist ['b', 'c', 'g', 'i', 'l'] 
rightlist ['m', 'n', 'r', 's', 'v'] 

wordlist ['m', 'n', 'r', 's', 'v'] 
leftlist ['m', 'n'] 
rightlist ['r', 's', 'v'] 

wordlist ['m', 'n'] 
leftlist ['m'] 
rightlist ['n'] 

wordlist ['m'] 
True 
4

二：word不在wordlist：

wordlist ['a', 'c', 'd', 'e', 'g', 'l', 'o', 'q', 't', 'x'] 
leftlist ['a', 'c', 'd', 'e', 'g'] 
rightlist ['l', 'o', 'q', 't', 'x'] 

wordlist ['l', 'o', 'q', 't', 'x'] 
leftlist ['l', 'o'] 
rightlist ['q', 't', 'x'] 

wordlist ['l', 'o'] 
leftlist ['l'] 
rightlist ['o'] 

wordlist ['l'] 
False 
4

請注意，如果您雙擊單詞表的大小，即使用

wl = sorted(random.sample(alphabet, 20))

numcalls平均只會比長度爲10的wordlist高1，因爲wordlist必須再分成一半。

來源

2014-06-22 15:00:14 timgeb

我將你的bs（）函數複製到我的代碼中並用它來進行二分搜索，速度仍然比使用庫函數慢得多，不知道爲什麼？ – bean

@bean我的代碼爲每個函數調用創建新列表，您可以通過調整函數來僅僅查看'wordlist'的某些索引，而不是創建'leftlist'和'rightlist'來避免這種情況。 – timgeb

是的，我確實使用slice [len（wordlist）/ 2：]來查看單詞表的某些索引，但仍然不起作用。 – bean

搜索，如果一個詞是在單詞表簡單（蟒蛇2.7）：

def bisect_fun(listfromfile, wordtosearch): 
    bi = bisect.bisect_left(listfromfile, wordtosearch) 
    if listfromfile[bi] == wordtosearch: 
     return listfromfile[bi], bi

來源

2016-10-24 00:33:03 halfluke

爲什麼我的二進制搜索實現非常低效？

回答

相關問題