2017-03-16 94 views
1

This is a program that suggests a player's name to the user when the user makes a typo. It is very slow. How can I speed up my code?

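First it has to make a GET request, then check whether the player's name is in the JSON data; if it is, pass. Otherwise, it takes every player's first and last name and appends them to names. Then it uses get_close_matches to check whether first_name and last_name closely resemble a name in the list. I knew from the start that this would be very slow, but there has to be a faster way to do this; I just can't come up with one. Any suggestions?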

from difflib import get_close_matches

def suggestion(first_name, last_name):
    names = []
    my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")

    for n in my_request['activeplayers']['playerentry']:
        if last_name == n['player']['LastName'] and first_name == n['player']['FirstName']:
            pass

        else:
            names.append(n['player']['FirstName'] + " " + n['player']['LastName'])
            suggest = get_close_matches(first_name + " " + last_name, names)

    return "did you mean " + "".join(suggest) + "?"

print suggestion("mattthews ", "stafffford") #should return Matthew Stafford
+0

You may want to change 'pass' to 'continue', which tells the loop to move on to the next value and run again.

+4

How about persisting the contents of [url](https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json) locally to reduce IO time?

+1

Like @OldPanda said, implement a caching strategy. Do you have access to something like memcached? – dana

Answer

1

Well, since my suggestion in the comments turned out to work, I might as well post it as an answer with a few other ideas included.

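First, move your I/O operation out of the function so that you don't waste time making a request every time the function runs. Instead, fetch the JSON once when the script starts and keep it in local memory. If possible, downloading the JSON data beforehand and opening it as a local text file may be an even faster option.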
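For the "download it beforehand" idea, here is a minimal sketch of a file-based cache. The name load_players and the fetch argument are hypothetical; fetch stands for any zero-argument callable that returns the parsed JSON, for example a small wrapper around your own get_request helper.

```python
import json
import os


def load_players(cache_path, fetch):
    """Return the player feed, reading cache_path when it already exists.

    fetch is a hypothetical zero-argument callable that downloads and
    returns the parsed JSON; it is only called on a cache miss.
    """
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            return json.load(fh)

    data = fetch()  # the slow network round trip happens only once
    with open(cache_path, "w") as fh:
        json.dump(data, fh)
    return data
```

You would call this once at script start and pass the result around. Note that the cache never expires in this sketch, so you would have to delete the file by hand to pick up roster changes.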

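Second, you should compare against a set of unique candidates on each pass through the loop, because there is no need to compare the same names multiple times. Once a name has been discarded by get_close_matches(), we know that the same name does not need to be checked again. (It would be a different story if the criteria by which a name is discarded depended on the subsequent names, but I doubt that is the case.)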
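One way to get unique candidates is to build the list of full names once and skip anything already seen. This is only a sketch: unique_names is a hypothetical helper, and it assumes each entry nests the name fields under 'player', as the code in the question does.

```python
def unique_names(entries):
    """Build a de-duplicated list of "First Last" strings, keeping feed order."""
    seen = set()
    names = []
    for entry in entries:
        player = entry["player"]  # assumed layout, taken from the question's code
        full_name = player["FirstName"] + " " + player["LastName"]
        if full_name not in seen:
            seen.add(full_name)
            names.append(full_name)
    return names
```

Each name then reaches get_close_matches() at most once, instead of being compared again on every iteration.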

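Third, try to use batches. Given that get_close_matches() is reasonably efficient, comparing against, say, 10 candidates at a time shouldn't be any slower than comparing against 1. But reducing the for loop from going over 1 million elements to going over 100K elements is quite a significant boost.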
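If you go the batching route, slicing is probably the simplest way to split a flat list. A small sketch (make_batches is a hypothetical name); unlike counting up to 10 by hand, it keeps the last, shorter batch.

```python
def make_batches(items, size=10):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For example, 25 names would come back as batches of 10, 10 and 5. Whether 10 is the best batch size is something you would have to profile.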

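Fourth, I assume that you are checking for last_name == ['LastName'] and first_name == ['FirstName'] because in that case there would be no typo. So why not simply break out of the function at that point?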
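The early exit could look roughly like the sketch below. The function name suggest_name and the name_set parameter are made up for illustration; the idea is to keep a set next to the list so the exact-match check does not have to scan every name. Stripping whitespace from the input is my own assumption, not something the question does.

```python
from difflib import get_close_matches


def suggest_name(first_name, last_name, names, name_set):
    """Return the name unchanged on an exact hit, otherwise a suggestion string."""
    desired = first_name.strip() + " " + last_name.strip()

    # No typo: leave the function immediately, no fuzzy matching needed.
    if desired in name_set:
        return desired

    matches = get_close_matches(desired, names)
    return "did you mean " + ", ".join(matches) + "?"
```

Here names is the flat list of full names and name_set is simply set(names), built once at startup.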

Putting it all together, I would write code that looks something like this:

from difflib import get_close_matches

# I/O operation ONCE when the script is run
my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")

# Creating batches of 10 names; this also happens only once
# As a result, the script might take longer to load but run faster.
# I'm sure there is a better way to create batches, but I don't know any.
batch = [] # This will contain 10 names.
names = [] # This will contain the batches.

for entry in my_request['activeplayers']['playerentry']:
    # The name fields are nested under 'player', as in the question's code.
    player = entry['player']
    name = player['FirstName'] + " " + player['LastName']
    batch.append(name)

    if len(batch) == 10:
        names.append(batch)
        batch = []

# Keep the last, shorter batch when the number of names is not a multiple of 10.
if batch:
    names.append(batch)

def suggest(first_name, last_name, names):

    desired_name = first_name + " " + last_name
    suggestions = []

    for batch in names:

        # Just return the name if there is no typo
        # Alternatively, you can create a flat list of names outside of the function
        # and see if the desired_name is in the list of names to immediately
        # terminate the function. But I'm not sure which method is faster. It's
        # a quick profiling task for you, though.
        if desired_name in batch:
            return desired_name

        # This way, we only match with new candidates, 10 at a time.
        best_matches = get_close_matches(desired_name, batch)
        suggestions.append(best_matches)

    # We need to flatten the list of suggestions to print.
    # Alternatively, you could use a for loop to append in the first place.
    suggestions = [name for batch in suggestions for name in batch]

    return "did you mean " + ", ".join(suggestions) + "?"

print(suggest("mattthews ", "stafffford", names)) # expected to suggest Matthew Stafford