2010-04-27 158 views
8

最近在面試時我得到了以下的問題:單字母遊戲問題?

  1. 編寫能夠在命令行中運行的蟒蛇腳本

  2. 它應該在命令行上兩個單詞(或可選的話,如果你希望它可以查詢用戶通過控制檯提供這兩個單詞)。

  3. 鑑於這兩個詞: a。確保它們的長度相同 b。確保它們都是您下載的英語 有效字詞典中的字詞。

  4. 如果是這樣,通過以下一系列步驟計算是否可以從第一個字符到第二個字 a。您可以一次更換一個字母 b。每次更改字母時,結果詞也必須存在於詞典 c中。你不能添加或刪除字母

  5. 如果兩個單詞都可以訪問,腳本應該打印出從一個單詞到另一個單詞最短路徑的路徑。

  6. 你可以爲你的單詞詞典/ usr/share/dict/words。

我的解決方案包括使用廣度優先搜索找到兩個單詞之間的最短路徑。但顯然這還不夠好,得到這份工作:(

請問你們知道我可能做錯了?謝謝你這麼多。

import collections 
import functools 
import re 

def time_func(func): 
    import time 

    def wrapper(*args, **kwargs): 
     start = time.time() 
     res = func(*args, **kwargs) 
     timed = time.time() - start 

     setattr(wrapper, 'time_taken', timed) 
     return res 

    functools.update_wrapper(wrapper, func) 
    return wrapper 

class OneLetterGame: 
    def __init__(self, dict_path): 
     self.dict_path = dict_path 
     self.words = set() 

    def run(self, start_word, end_word): 
     '''Runs the one letter game with the given start and end words. 
     ''' 
     assert len(start_word) == len(end_word), \ 
      'Start word and end word must of the same length.' 

     self.read_dict(len(start_word)) 

     path = self.shortest_path(start_word, end_word) 
     if not path: 
      print 'There is no path between %s and %s (took %.2f sec.)' % (
       start_word, end_word, find_shortest_path.time_taken) 
     else: 
      print 'The shortest path (found in %.2f sec.) is:\n=> %s' % (
       self.shortest_path.time_taken, ' -- '.join(path)) 

    def _bfs(self, start): 
     '''Implementation of breadth first search as a generator. 

     The portion of the graph to explore is given on demand using get_neighboors. 
     Care was taken so that a vertex/node is explored only once. 
     ''' 
     queue = collections.deque([(None, start)]) 
     inqueue = set([start]) 

     while queue: 
      parent, node = queue.popleft() 
      yield parent, node 

      new = set(self.get_neighbours(node)) - inqueue 
      inqueue = inqueue | new 
      queue.extend([(node, child) for child in new]) 

    @time_func 
    def shortest_path(self, start, end): 
     '''Returns the shortest path from start to end using bfs. 
     ''' 
     assert start in self.words, 'Start word not in dictionnary.' 
     assert end in self.words, 'End word not in dictionnary.' 

     paths = {None: []} 
     for parent, child in self._bfs(start): 
      paths[child] = paths[parent] + [child] 
      if child == end: 
       return paths[child] 
     return None 

    def get_neighbours(self, word): 
     '''Gets every word one letter away from the a given word. 

     We do not keep these words in memory because bfs accesses 
     a given vertex only once. 
     ''' 
     neighbours = [] 

     p_word = ['^' + word[0:i] + '\w' + word[i+1:] + '$' 
      for i, w in enumerate(word)] 
     p_word = '|'.join(p_word) 

     for w in self.words: 
      if w != word and re.match(p_word, w, re.I|re.U): 
       neighbours += [w] 
     return neighbours 

    def read_dict(self, size): 
     '''Loads every word of a specific size from the dictionnary into memory. 
     ''' 
     for l in open(self.dict_path): 
      l = l.decode('latin-1').strip().lower() 
      if len(l) == size: 
       self.words.add(l) 

if __name__ == '__main__': 
    import sys 
    if len(sys.argv) not in [3, 4]: 
     print 'Usage: python one_letter_game.py start_word end_word' 
    else: 
     g = OneLetterGame(dict_path = '/usr/share/dict/words') 
     try: 
      g.run(*sys.argv[1:]) 
     except AssertionError, e: 
      print e 

謝謝所有偉大的我認爲真正讓我感到這樣一個事實是,我每次遍歷字典中的所有單詞以考慮可能的單詞鄰居,相反,我可以使用Duncan和Matt Anderson指出的倒排索引。也很有幫助,非常感謝,現在我知道我做錯了什麼,

這裏是相同的代碼與反向索引:

import collections 
import functools 
import re 

def time_func(func): 
    import time 

    def wrapper(*args, **kwargs): 
     start = time.time() 
     res = func(*args, **kwargs) 
     timed = time.time() - start 

     setattr(wrapper, 'time_taken', timed) 
     return res 

    functools.update_wrapper(wrapper, func) 
    return wrapper 

class OneLetterGame: 
    def __init__(self, dict_path): 
     self.dict_path = dict_path 
     self.words = {} 

    def run(self, start_word, end_word): 
     '''Runs the one letter game with the given start and end words. 
     ''' 
     assert len(start_word) == len(end_word), \ 
      'Start word and end word must of the same length.' 

     self.read_dict(len(start_word)) 

     path = self.shortest_path(start_word, end_word) 
     if not path: 
      print 'There is no path between %s and %s (took %.2f sec.)' % (
       start_word, end_word, self.shortest_path.time_taken) 
     else: 
      print 'The shortest path (found in %.2f sec.) is:\n=> %s' % (
       self.shortest_path.time_taken, ' -- '.join(path)) 

    def _bfs(self, start): 
     '''Implementation of breadth first search as a generator. 

     The portion of the graph to explore is given on demand using get_neighboors. 
     Care was taken so that a vertex/node is explored only once. 
     ''' 
     queue = collections.deque([(None, start)]) 
     inqueue = set([start]) 

     while queue: 
      parent, node = queue.popleft() 
      yield parent, node 

      new = set(self.get_neighbours(node)) - inqueue 
      inqueue = inqueue | new 
      queue.extend([(node, child) for child in new]) 

    @time_func 
    def shortest_path(self, start, end): 
     '''Returns the shortest path from start to end using bfs. 
     ''' 
     assert self.in_dictionnary(start), 'Start word not in dictionnary.' 
     assert self.in_dictionnary(end), 'End word not in dictionnary.' 

     paths = {None: []} 
     for parent, child in self._bfs(start): 
      paths[child] = paths[parent] + [child] 
      if child == end: 
       return paths[child] 
     return None 

    def in_dictionnary(self, word): 
     for s in self.get_steps(word): 
      if s in self.words: 
       return True 
     return False 

    def get_neighbours(self, word): 
     '''Gets every word one letter away from the a given word. 
     ''' 
     for step in self.get_steps(word): 
      for neighbour in self.words[step]: 
       yield neighbour 

    def get_steps(self, word): 
     return (word[0:i] + '*' + word[i+1:] 
      for i, w in enumerate(word)) 

    def read_dict(self, size): 
     '''Loads every word of a specific size from the dictionnary into an inverted index. 
     ''' 
     for w in open(self.dict_path): 
      w = w.decode('latin-1').strip().lower() 
      if len(w) != size: 
       continue 
      for step in self.get_steps(w): 
       if step not in self.words: 
        self.words[step] = [] 
       self.words[step].append(w) 

if __name__ == '__main__': 
    import sys 
    if len(sys.argv) not in [3, 4]: 
     print 'Usage: python one_letter_game.py start_word end_word' 
    else: 
     g = OneLetterGame(dict_path = '/usr/share/dict/words') 
     try: 
      g.run(*sys.argv[1:]) 
     except AssertionError, e: 
      print e 

和定時比較:(在 91.57秒找到)

%蟒one_letter_game_old.py快樂 你好最短路徑是:
=>快樂 - 哈爾皮 - 豎琴 - 哈特 - 暫停 - 大廳 - 地獄 - 你好

%python one_letter_game.py happy hello最短路徑(在。1.71秒):
=>快樂 - 哈比 - 豎琴 - 哈茨 - 停止 - 廳 - 地獄 - 你好

+7

我沒經過你的代碼,而只是因爲你沒有得到這份工作並不意味着這是你的錯誤。他們有告訴你嗎? – MJB 2010-04-27 13:21:32

+0

以及我試圖問,但他們的政策是,「他們不被允許提供進一步的反饋」... – 2010-04-27 13:25:41

+0

類似問題:http://stackoverflow.com/questions/2534087/successive-adding-of-char-to-得到最長的詞在詞典關閉 – FogleBird 2010-04-27 13:27:23

回答

10

我不會說你的解決方案是,但它是一個有點慢。有兩個原因。

  1. 廣度優先搜索將訪問長度比一個需要更短的所有路徑,再加上一些對所有的需要,纔可以給你一個答案長度的路徑。最佳優先搜索(A *)理想情況下會跳過最不相關的路徑。

  2. 您每次查找鄰居時,都會檢查字典中的每個詞作爲候選人作爲鄰居。正如鄧肯所說,你可以建立一個數據結構來主要查找鄰居而不是搜索它們。

這裏是另一個解決問題的方法:

import collections 
import heapq 
import time 

def distance(start, end): 
    steps = 0 
    for pos in range(len(start)): 
     if start[pos] != end[pos]: 
      steps += 1 
    return steps 


class SearchHeap(object): 
    def __init__(self): 
     self.on_heap = set() 
     self.heap = [] 

    def push(self, distance, word, path): 
     if word in self.on_heap: 
      return 
     self.on_heap.add(word) 
     heapq.heappush(self.heap, ((distance, len(path)), word, path)) 

    def __len__(self): 
     return len(self.heap) 

    def pop(self): 
     return heapq.heappop(self.heap) 


class OneLetterGame(object): 
    _word_data = None 

    def __init__(self, dict_path): 
     self.dict_path = dict_path 

    def run(self, start_word, end_word): 
     start_time = time.time() 
     self._word_data = collections.defaultdict(list) 
     if len(start_word) != len(end_word): 
      print 'words of different length; no path' 
      return 

     found_start, found_end = self._load_words(start_word, end_word) 
     if not found_start: 
      print 'start word %r not found in dictionary' % start_word 
      return 
     if not found_end: 
      print 'end word %r not found in dictionary' % end_word 
      return 

     search_start_time = time.time() 
     path = self._shortest_path(start_word, end_word) 
     search_time = time.time() - search_start_time 
     print 'search time was %.4f seconds' % search_time 

     if path: 
      print path 
     else: 
      print 'no path found from %r to %r' % (start_word, end_word) 

     run_time = time.time() - start_time 
     print 'total run time was %.4f seconds' % run_time 

    def _load_words(self, start_word, end_word): 
     found_start, found_end = False, False 
     length = len(start_word) 
     with open(self.dict_path) as words: 
      for word in words: 
       word = word.strip() 
       if len(word) == length: 
        if start_word == word: found_start = True 
        if end_word == word: found_end = True 
        for bucket in self._buckets_for(word): 
         self._word_data[bucket].append(word) 
     return found_start, found_end 

    def _shortest_path(self, start_word, end_word): 
     heap = SearchHeap() 
     heap.push(distance(start_word, end_word), start_word, (start_word,)) 
     while len(heap): 
      dist, word, path = heap.pop() 
      if word == end_word: 
       return path 
      for neighbor in self._neighbors_of(word): 
       heap.push(
        distance(neighbor, end_word), 
        neighbor, 
        path + (neighbor,)) 
     return() 

    def _buckets_for(self, word): 
     buckets = [] 
     for pos in range(len(word)): 
      front, back = word[:pos], word[pos+1:] 
      buckets.append(front+'*'+back) 
     return buckets 

    def _neighbors_of(self, word): 
     for bucket in self._buckets_for(word): 
      for word in self._word_data[bucket]: 
       yield word 

if __name__ == '__main__': 
    import sys 
    if len(sys.argv) not in [3, 4]: 
     print 'Usage: python one_letter_game.py start_word end_word' 
    else: 
     matt = OneLetterGame(dict_path = '/usr/share/dict/words') 
     matt.run(*sys.argv[1:]) 

而且,定時比較:

% python /tmp/one_letter_alex.py canoe happy 
The shortest path (found in 51.98 sec.) is: 
=> canoe -- canon -- caxon -- taxon -- taxor -- taxer -- taper -- paper -- papey -- pappy -- happy 

% python /tmp/one_letter_matt.py canoe happy 
search time was 0.0020 seconds 
('canoe', 'canon', 'caxon', 'taxon', 'taxor', 'taxer', 'taper', 'paper', 'papey', 'pappy', 'happy') 
total run time was 0.2416 seconds 
1

也許他們預計A *搜索與編輯距離作爲估計?

0

也許你忘了添加shebang? > - |

或者他們只是不喜歡你的編碼風格......例如,我不會爲這樣一個簡單的問題創建一個類,它是過度設計解決方案(雖然我不是那麼挑剔當然只是基於這個招聘決策)。

1

也許你不想在這樣的混蛋公司工作。我個人不相信代碼評論。我認爲,如果您在檢查投資組合和過去的參考資料時做了足夠好的工作,在現場代碼測試中不需要這樣做。像這些嚴格的政策的公司是那些從來沒有做到這一點的公司,因爲他們所僱用的只是一個正在思考代碼24/7的追蹤代碼書呆子。只是我2美分。

+5

這是一篇諷刺性的文章嗎?我真的不知道。 – 2010-04-27 15:29:47

+0

你爲什麼收費很諷刺? – jini 2010-04-27 18:27:57

+1

因爲「單軌代碼書呆子」是讓項目成功的人......而且很大程度上是那些像這樣訪問網站的人 – 2010-04-30 15:46:33

3

我同意,期望您對此編程測試的回答是他們選擇其他人的唯一原因會很奇怪,但實際上您的代碼存在問題。您可以通過詞典對路徑的每一步或每條潛在路徑進行線性搜索。這可能需要很長時間才能製作大字典和很多潛在的路徑。另外它很明顯,你沒有徹底測試它,因爲它沒有路徑時會失敗。

如果我編碼這個,我會創建一個字典,當加載的話將刪除線性搜索,讓你直接挑出下一個單詞很好。此代碼是不完整的解決方案,但應說明我的意思:

words = {} 

def get_keys(word): 
    """Generate keys from a word by replacing each letter in turn by an asterisk""" 
    for i in range(len(word)): 
     yield word[:i]+'*'+word[i+1:] 

def get_steps(word): 
    """Find the set of all words that can link to the given word.""" 
    steps = [] 
    for key in get_keys(word): 
     steps.extend(words[key]) 
    steps = set(steps) - set([word]) 
    return steps 

# Load the dictionary 
for word in ('start', 'stare', 'spare', 'spore'): 
    for key in get_keys(word): 
     if key not in words: 
      words[key] = [] 
     words[key].append(word) 

print(words) 
print(get_steps('stare'))