2016-03-31 41 views
3

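Find the closest key in a dictionary using a string?

I'm trying to find a way to look up the key in a dictionary that is closest to a given string. For example: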

data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'} 
find_nearest(data, '1b') 
#This would return key '1a' 

I found other examples, but most of them deal with numbers. For example:

data[num] if num in data else data[min(data.keys(), key=lambda k: abs(k-num))] 

I did find one piece of code that looked promising:

from sortedcontainers import SortedDict 
sd = SortedDict((key, value) for key, value in data) 

# Bisect for the index of the desired key. 
index = sd.bisect(200) 

# With that index, lookup the key. 
key = sd.iloc[index] 

# You can also look ahead or behind to find the nearest key. 
behind = sd.iloc[index - 1] 
ahead = sd.iloc[index + 1] 

So I tried that, and here is my code:

from sortedcontainers import SortedDict 
data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'} 
sd = SortedDict((key,value) for key,value in data.items()) 

index = sd.bisect('1b') 

key = sd.iloc[index] 
print(key) 

But when I run this code it returns:

1d #Instead of '1a' 

I have tried every way I can think of to get this code to work, but I can't seem to manage it. Does anyone know a fast and efficient way to do this?

+0

The bisect function just does bisect_right, which gives you the next value to the right, not the closest value. – Schore

+0

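You need to define what *closest* means for your requirement... For example, if there are '1a' and '1c', which do you consider closer? And which one would you choose? –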

Answers

4

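When you bisect, if the algorithm can't find an exact match, it has two choices: it can return the index of the item to the left or the index of the item to the right. It looks like bisect is an alias for bisect_right. You could use bisect_left instead...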
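To make the bisect behaviour concrete, here is a minimal sketch using the standard-library `bisect` module on a plain sorted list of the question's keys (an assumption for illustration; the thread itself uses sortedcontainers). It shows that for a key that is absent, the left and right variants return the same insertion point, so the lower neighbour sits at `index - 1`.

```python
import bisect

# The question's keys, sorted: ['1a', '1d', '1e', '1f']
keys = sorted(['1a', '1d', '1f', '1e'])

# '1b' is not present, so both variants return the same insertion point.
i_left = bisect.bisect_left(keys, '1b')
i_right = bisect.bisect_right(keys, '1b')

# keys[i] is the first key sorting after '1b'; keys[i - 1] is the last key before it.
neighbours = (keys[i_right - 1], keys[i_right])

# For a key that IS present, the two variants differ by one.
present = (bisect.bisect_left(keys, '1d'), bisect.bisect_right(keys, '1d'))

print(i_left, i_right, neighbours, present)
```

Either way, the two candidates for "closest" are the keys on each side of the insertion point; choosing between them still needs a distance measure.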

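Of course, that isn't necessarily closer (you haven't really defined what you mean by closer). In fact, even something like difflib.SequenceMatcher.ratio() probably won't help with this example, because it only looks at the ratio of matching elements to non-matching elements.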
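As a quick check of the difflib point, this small sketch compares '1b' against two of the question's keys. The helper name `similarity` is only for illustration. Both candidates share just the leading '1' with '1b', so the ratio cannot tell them apart.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # ratio() is 2 * (number of matching characters) / (total length of both strings).
    return SequenceMatcher(None, a, b).ratio()

to_1a = similarity('1b', '1a')
to_1d = similarity('1b', '1d')

# One matching character out of four in both cases, so the scores are identical.
print(to_1a, to_1d)
```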

You could try something like this:

def find_closest(sd, expected):
    index = sd.bisect(expected)
    if index == 0:
        # Nothing sorts before `expected`; avoid wrapping around to iloc[-1].
        return sd.iloc[0]
    # The key just before the insertion point.
    closest_lower = sd.iloc[index - 1]
    try:
        # The key at the insertion point, if there is one.
        closest_upper = sd.iloc[index]
    except IndexError:
        return closest_lower

    # assumption -- Your keys are hex values.
    # this assumption could be completely wrong, but demonstrates
    # how to think of defining a measure of "closeness"
    expected_as_int = int(expected, 16)
    def distance(val):
        return abs(int(val, 16) - expected_as_int)

    return min([closest_lower, closest_upper], key=distance)
2

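The way I would implement this is to iterate over the keys in sorted order and look for the key with the smallest "difference". Because the keys are sorted, as soon as the difference stops decreasing you know you have found the key.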

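def closestKey(data, val):
    lastKey = None
    lastDif = None
    for key in sorted(data.keys()):
        dif = difference(key, val)  # need to figure out difference()
        if lastDif is not None and dif > lastDif:
            # The difference started growing, so the previous key was the closest.
            return lastKey
        lastDif = dif
        lastKey = key
    # The difference never grew, so the last key is the closest (None if data is empty).
    return lastKey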

This doesn't handle the case where two keys are equidistant, if that matters.
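One possible way to fill in `difference()` and make the tie case explicit is sketched below. The measure is purely an assumption for illustration: it reads each key's ASCII bytes as one big-endian integer, which only makes sense for ASCII keys of equal length. The names `key_as_int` and `closest_key` are hypothetical, and the sketch scans every key with `min` instead of exiting early.

```python
def key_as_int(key):
    # Assumed measure: read the key's ASCII bytes as one big-endian integer.
    # For equal-length ASCII keys this preserves the usual string ordering.
    return int.from_bytes(key.encode('ascii'), 'big')

def difference(key, val):
    return abs(key_as_int(key) - key_as_int(val))

def closest_key(data, val):
    # Sort candidates by (difference, key): on a tie the smaller key wins.
    # Raises ValueError if data is empty.
    return min(data, key=lambda key: (difference(key, val), key))

data = {'1a': 'This is 1a', '1d': 'This is 1d', '1f': 'This is 1f', '1e': 'This is 1e'}
nearest = closest_key(data, '1b')

# '1a' and '1c' are equidistant from '1b'; the tie-break picks '1a'.
tie = closest_key({'1a': 'This is 1a', '1c': 'This is 1c'}, '1b')
print(nearest, tie)
```

Putting the key itself second in the tuple is just one tie-breaking rule; preferring the larger key would work equally well, as long as the choice is deliberate.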

0

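Thanks to @mgilson, whose answer gave me the idea I needed, I was able to do what I was trying to achieve. Here is my code, for anyone interested: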

from sortedcontainers import SortedDict

data = {'1a': 'This is 1a', '1d': 'This is 1d', '1g': 'This is 1g', '1h': 'This is 1h'}

def ord_sum(key):
    # Score a key as the sum of its characters' code points.
    return sum(ord(char) for char in key)

def find_closest(sd, expected):
    if expected in sd:
        return expected
    index = sd.bisect(expected)
    if index == 0:
        # Nothing sorts before `expected`, so the first key is the only candidate.
        return sd.iloc[0]
    keyBehind = sd.iloc[index - 1]
    try:
        keyAhead = sd.iloc[index]
    except IndexError:
        # Nothing sorts after `expected`, so the last key is the only candidate.
        return keyBehind
    expectedSum = ord_sum(expected)
    differenceAhead = abs(ord_sum(keyAhead) - expectedSum)
    differenceBehind = abs(ord_sum(keyBehind) - expectedSum)
    # On a tie the key ahead wins, as in the original comparison.
    return keyAhead if differenceAhead <= differenceBehind else keyBehind

sd = SortedDict((key, value) for key, value in data.items())

print(find_closest(sd, '1b'))  # This will return '1a'!

I don't know whether this is the fastest or most efficient approach, but I'll keep looking for other ways.
