OK, since the suggestion I made in the comments worked out, I might as well post it as an answer, with a few other ideas included.
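First, move your I/O operation out of the function so that you don't waste time making a request every time the function runs. Instead, fetch the JSON once when the script starts and load it into local memory. If possible, downloading the JSON data beforehand and opening a local text file instead might be an even faster option.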
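As a rough sketch of the "download beforehand, load once" idea: the snippet below reads a saved copy of the feed from disk a single time at start-up. The file name `active_players.json`, the helper `load_players`, and the two sample entries are my own placeholders; the key names simply follow the code later in this answer.

```python
import json
import os
import tempfile


def load_players(path):
    """Read a previously downloaded feed from disk and return its player entries."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Key names follow the code in this answer; adjust them if your feed differs.
    return data["activeplayers"]["playerentry"]


# Stand-in for the downloaded feed so the sketch runs without network access.
sample = {"activeplayers": {"playerentry": [
    {"FirstName": "Matthew", "LastName": "Stafford"},
    {"FirstName": "Aaron", "LastName": "Rodgers"},
]}}
tmp_dir = tempfile.mkdtemp()
feed_path = os.path.join(tmp_dir, "active_players.json")  # hypothetical file name
with open(feed_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

# Loaded ONCE at start-up; every later lookup reuses this list.
PLAYERS = load_players(feed_path)
print(len(PLAYERS))
```

Any function that needs the players can then take `PLAYERS` as an argument instead of hitting the network itself.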
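Second, you should work with a unique set of candidates in each loop, since there is no need to compare the same name more than once. When a name is discarded by `get_close_matches()`, we know that the same name doesn't need to be compared again. (It would be a different story if the criterion for discarding a name depended on the names that follow, but I doubt that's the case.)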
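One way to get unique candidates, assuming the names are plain strings, is to de-duplicate the list once before any matching happens. `unique_names` is a hypothetical helper name, and the sample list is made up for illustration.

```python
def unique_names(names):
    """Drop repeated names while keeping the order in which they first appear."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(names))


candidates = ["Matthew Stafford", "Aaron Rodgers", "Matthew Stafford"]
unique = unique_names(candidates)
print(unique)  # ['Matthew Stafford', 'Aaron Rodgers']
```

If the order doesn't matter to you, `set(names)` does the same job.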
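Third, try working in batches. Given that `get_close_matches()` is reasonably efficient, comparing against, say, 10 candidates at a time shouldn't be noticeably slower than comparing against 1. But cutting the `for` loop from over 1 million elements down to over 100K elements is a fairly significant improvement.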
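Here is a minimal sketch of the batching idea using list slicing. `make_batches` and `close_matches_in_batches` are names I made up, and the three-player list is only sample data.

```python
from difflib import get_close_matches


def make_batches(items, size=10):
    """Split a list into consecutive batches; the last one may be shorter."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def close_matches_in_batches(word, batches):
    """Run get_close_matches once per batch and collect every hit."""
    matches = []
    for batch in batches:
        matches.extend(get_close_matches(word, batch))
    return matches


all_names = ["Matthew Stafford", "Aaron Rodgers", "Tom Brady"]
batches = make_batches(all_names, size=2)
print(close_matches_in_batches("Mathew Staford", batches))
```

One thing to keep in mind: with its defaults, `get_close_matches()` returns at most 3 matches per call (`n=3`, `cutoff=0.6`), so calling it per batch can return more suggestions overall than a single call over the full list would.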
Fourth, I assume you're checking `last_name == ['LastName'] and first_name == ['FirstName']` because in that case there is no typo. So why not simply return from the function right away?
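To illustrate the early exit, here is a small sketch that checks for an exact hit in a set before doing any fuzzy matching. `suggest_name` and `known_names` are hypothetical names, and I'm assuming the full name is just first and last name joined by a space.

```python
from difflib import get_close_matches


def suggest_name(first_name, last_name, all_names, known_names):
    """Return [exact name] when there is no typo, else a list of close matches."""
    desired = first_name + " " + last_name
    # Set membership is a cheap check, so do it before any fuzzy matching.
    if desired in known_names:
        return [desired]
    return get_close_matches(desired, all_names)


all_names = ["Matthew Stafford", "Aaron Rodgers", "Tom Brady"]
known_names = set(all_names)  # build this once, outside the function

print(suggest_name("Matthew", "Stafford", all_names, known_names))
print(suggest_name("Mathew", "Staford", all_names, known_names))
```

Building the set once up front means the exact-match test doesn't have to scan every batch.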
Putting it all together, I'd write code that looks something like this:
from difflib import get_close_matches

# I/O operation ONCE when the script is run.
# get_request() is assumed to be the helper from the question that returns the parsed JSON.
my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")

# Creating batches of 10 names; this also happens only once.
# As a result, the script might take longer to load but run faster.
# I'm sure there is a better way to create batches, but I don't know any.
batch = []  # This will contain up to 10 names.
names = []  # This will contain the batches.
for player in my_request['activeplayers']['playerentry']:
    name = player['FirstName'] + " " + player['LastName']
    batch.append(name)
    if len(batch) == 10:
        names.append(batch)
        batch = []
# Keep the final, shorter batch when the number of names is not a multiple of 10.
if batch:
    names.append(batch)


def suggest(first_name, last_name, names):
    desired_name = first_name + " " + last_name
    suggestions = []
    for batch in names:
        # Just return the name if there is no typo.
        # Alternatively, you can create a flat list of names outside of the function
        # and see if the desired_name is in the list of names to immediately
        # terminate the function. But I'm not sure which method is faster. It's
        # a quick profiling task for you, though.
        if desired_name in batch:
            return desired_name
        # This way, we only match with new candidates, 10 at a time.
        best_matches = get_close_matches(desired_name, batch)
        suggestions.append(best_matches)
    # We need to flatten the list of suggestions to print.
    # Alternatively, you could use extend() instead of append() in the first place.
    suggestions = [name for batch in suggestions for name in batch]
    return "did you mean " + ", ".join(suggestions) + "?"


print(suggest("mattthews", "stafffford", names))  # should suggest Matthew Stafford
You might want to change `pass` to `continue`, which tells the loop to move on to the next value and run again.
How can I persist the contents of [url](https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json) locally to reduce I/O time?
Like @OldPanda said, implement a caching strategy. Do you have access to something like memcached? – dana
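If memcached isn't available, a plain file on disk can serve as a very simple cache. The sketch below is one possible approach, not the only one: `cached_json` is a hypothetical helper, `fetch` stands in for whatever function actually performs the request, and the one-day expiry is an arbitrary assumption.

```python
import json
import os
import tempfile
import time


def cached_json(path, fetch, max_age=24 * 60 * 60):
    """Return the JSON saved at `path` if it is recent enough; otherwise
    call `fetch()`, save its result to `path`, and return it."""
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    data = fetch()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data


# Demo with a fake fetch function so the sketch runs without network access.
calls = []


def fake_fetch():
    calls.append(1)  # record each "request"
    return {"activeplayers": {"playerentry": []}}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "active_players.json")  # hypothetical file name
    first = cached_json(cache_file, fake_fetch)   # performs the "request"
    second = cached_json(cache_file, fake_fetch)  # served from the file

print(len(calls))  # 1
```

In the real script, `fetch` would wrap the actual request to the feed, and the cache file would live somewhere permanent rather than in a temporary directory.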