I want to build a redis cache in Python, and as any self-respecting scientist I made a benchmark to test the performance of Redis versus disk in a caching application.
Interestingly, redis did not fare that well. Either Python is doing something magical (keeping the file around), or my version of redis is stupendously slow.
I don't know if this is because of the way my code is structured, but I was expecting redis to do better than it did.
To make the redis cache, I set my binary data (in this case, an HTML page) to a key derived from the filename with an expiration of 5 minutes.
In all cases, file handling is done with f.read() (this is roughly 3x faster than f.readlines(), and I need the binary blob anyway).
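As an aside, the read() vs readlines() comparison mentioned above can be sketched like this (the throwaway file and timing loop here are illustrative, not part of the original benchmark):

```python
import os
import tempfile
import timeit

# Build a throwaway file of many short lines to time against.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"some html-ish line of text\n" * 5000)

def read_whole():
    # Slurps the file into a single bytes blob.
    with open(path, "rb") as f:
        return f.read()

def read_lines():
    # Reads the same data but splits it into a list of lines.
    with open(path, "rb") as f:
        return f.readlines()

n = 200
t_read = timeit.timeit(read_whole, number=n)
t_lines = timeit.timeit(read_lines, number=n)
print("f.read():      %.4fs for %d reads" % (t_read, n))
print("f.readlines(): %.4fs for %d reads" % (t_lines, n))

data = read_whole()
os.remove(path)
```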
Is there something I'm missing in my comparison, or is Redis really no match for a disk? Is Python caching the file somewhere and re-accessing it every time? Why is this so much faster than accessing redis?
I'm using redis 2.8, python 2.7, and redis-py, all on a 64-bit Ubuntu system.
I don't think Python is doing anything particularly magical, as I made a function that stores the file data in a python object and yields it forever.
I have four function calls that I grouped:

Reading the file X times

A function that is called to see if the redis object is still in memory, load it, or cache a new file (single and multiple redis instances).

A function that creates a generator that yields the result from the redis database (single and multiple redis instances).

And finally, storing the file in memory and yielding it forever.
import redis
import time

def load_file(fp, fpKey, r, expiry):
    with open(fp, "rb") as f:
        data = f.read()
    p = r.pipeline()
    p.set(fpKey, data)
    p.expire(fpKey, expiry)
    p.execute()
    return data

def cache_or_get_gen(fp, expiry=300, r=redis.Redis(db=5)):
    fpKey = "cached:"+fp

    while True:
        yield load_file(fp, fpKey, r, expiry)
        t = time.time()
        while time.time() - t - expiry < 0:
            yield r.get(fpKey)

def cache_or_get(fp, expiry=300, r=redis.Redis(db=5)):
    fpKey = "cached:"+fp

    if r.exists(fpKey):
        return r.get(fpKey)
    else:
        with open(fp, "rb") as f:
            data = f.read()
        p = r.pipeline()
        p.set(fpKey, data)
        p.expire(fpKey, expiry)
        p.execute()
        return data

def mem_cache(fp):
    with open(fp, "rb") as f:
        data = f.readlines()
    while True:
        yield data
def stressTest(fp, trials=10000):

    # Read the file x number of times
    a = time.time()
    for x in range(trials):
        with open(fp, "rb") as f:
            data = f.read()
    b = time.time()
    readAvg = trials/(b-a)

    # Generator version
    # Read the file, cache it, read it with a new instance each time
    a = time.time()
    gen = cache_or_get_gen(fp)
    for x in range(trials):
        data = next(gen)
    b = time.time()
    cachedAvgGen = trials/(b-a)

    # Read file, cache it, pass in redis instance each time
    a = time.time()
    r = redis.Redis(db=6)
    gen = cache_or_get_gen(fp, r=r)
    for x in range(trials):
        data = next(gen)
    b = time.time()
    inCachedAvgGen = trials/(b-a)

    # Non generator version
    # Read the file, cache it, read it with a new instance each time
    a = time.time()
    for x in range(trials):
        data = cache_or_get(fp)
    b = time.time()
    cachedAvg = trials/(b-a)

    # Read file, cache it, pass in redis instance each time
    a = time.time()
    r = redis.Redis(db=6)
    for x in range(trials):
        data = cache_or_get(fp, r=r)
    b = time.time()
    inCachedAvg = trials/(b-a)

    # Read file, cache it in python object
    a = time.time()
    for x in range(trials):
        data = mem_cache(fp)
    b = time.time()
    memCachedAvg = trials/(b-a)

    print "\n%s file reads: %.2f reads/second\n" % (trials, readAvg)
    print "Yielding from generators for data:"
    print "multi redis instance: %.2f reads/second (%.2f percent)" % (cachedAvgGen, (100*(cachedAvgGen-readAvg)/(readAvg)))
    print "single redis instance: %.2f reads/second (%.2f percent)" % (inCachedAvgGen, (100*(inCachedAvgGen-readAvg)/(readAvg)))
    print "Function calls to get data:"
    print "multi redis instance: %.2f reads/second (%.2f percent)" % (cachedAvg, (100*(cachedAvg-readAvg)/(readAvg)))
    print "single redis instance: %.2f reads/second (%.2f percent)" % (inCachedAvg, (100*(inCachedAvg-readAvg)/(readAvg)))
    print "python cached object: %.2f reads/second (%.2f percent)" % (memCachedAvg, (100*(memCachedAvg-readAvg)/(readAvg)))

if __name__ == "__main__":
    fileToRead = "templates/index.html"
    stressTest(fileToRead)
And now the results:
10000 file reads: 30971.94 reads/second
Yielding from generators for data:
multi redis instance: 8489.28 reads/second (-72.59 percent)
single redis instance: 8801.73 reads/second (-71.58 percent)
Function calls to get data:
multi redis instance: 5396.81 reads/second (-82.58 percent)
single redis instance: 5419.19 reads/second (-82.50 percent)
python cached object: 1522765.03 reads/second (4816.60 percent)
The interesting results are that a) generators are faster than calling a function each time, b) redis is slower than reading from disk, and c) reading from a python object is ridiculously fast.
Why would reading from disk be so much faster than reading the in-memory file from redis?
Edit: Some more information and tests.
I replaced the check

    if r.exists(fpKey):
        data = r.get(fpKey)

with

    data = r.get(fpKey)
    if data:
        return data

and the results do not differ much:
Function calls to get data using r.exists as test
multi redis instance: 5320.51 reads/second (-82.34 percent)
single redis instance: 5308.33 reads/second (-82.38 percent)
python cached object: 1494123.68 reads/second (5348.17 percent)
Function calls to get data using if data as test
multi redis instance: 8540.91 reads/second (-71.25 percent)
single redis instance: 7888.24 reads/second (-73.45 percent)
python cached object: 1520226.17 reads/second (5132.01 percent)
Creating a new redis instance on each function call did not have a noticeable effect on read speed; the variability from test to test was larger than the gain.
Sripathi Krishnan suggested implementing random file reads. As we can see from these results, this is where caching starts to really help.
Total number of files: 700
10000 file reads: 274.28 reads/second
Yielding from generators for data:
multi redis instance: 15393.30 reads/second (5512.32 percent)
single redis instance: 13228.62 reads/second (4723.09 percent)
Function calls to get data:
multi redis instance: 11213.54 reads/second (3988.40 percent)
single redis instance: 14420.15 reads/second (5157.52 percent)
python cached object: 607649.98 reads/second (221446.26 percent)
There is a huge amount of variability in file reads, so the percentage difference is not a good indicator of speedup.
Total number of files: 700
40000 file reads: 1168.23 reads/second
Yielding from generators for data:
multi redis instance: 14900.80 reads/second (1175.50 percent)
single redis instance: 14318.28 reads/second (1125.64 percent)
Function calls to get data:
multi redis instance: 13563.36 reads/second (1061.02 percent)
single redis instance: 13486.05 reads/second (1054.40 percent)
python cached object: 587785.35 reads/second (50214.25 percent)
I used random.choice(fileList) to randomly pick a new file on each pass through the function.
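The random-selection change can be sketched like this (fileList and the loop body are illustrative stand-ins, not the exact gist code):

```python
import random

# Illustrative stand-in for the 700 template files from the test.
fileList = ["templates/file%03d.html" % i for i in range(700)]

picked = []
for x in range(5):
    # Each pass through the stress loop pulls a random path, so repeated
    # reads rarely hit the same file twice in a row.
    fp = random.choice(fileList)
    picked.append(fp)

print(picked)
```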
The full gist is here if anyone would like to try it out - https://gist.github.com/3885957
Edit edit: I hadn't realized that I was calling a single file for the generators (although the performance of the function calls and generators was very similar). Here are the results for different files from the generator as well.
Total number of files: 700
10000 file reads: 284.48 reads/second
Yielding from generators for data:
single redis instance: 11627.56 reads/second (3987.36 percent)
Function calls to get data:
single redis instance: 14615.83 reads/second (5037.81 percent)
python cached object: 580285.56 reads/second (203884.21 percent)
I don't see where you are creating a new redis instance on each function call. Is it just the default argument? – jdi
Yes, if you don't pass a redis instance the function call will grab a new one: def cache_or_get(fp, expiry=300, r=redis.Redis(db=5)): – MercuryRising
That's actually not correct. Those default arguments are evaluated only once, when the script is loaded, and saved with the function definition. They are not evaluated every time you call it. That would explain why you didn't see any difference between passing one in and letting it use the default. What you are actually doing is creating one per function def, plus one every time you pass one in. 2 unused connections – jdi
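The evaluate-once behaviour jdi describes can be demonstrated without a redis server (FakeRedis here is a hypothetical stand-in that just counts instantiations; it is not part of redis-py):

```python
class FakeRedis(object):
    # Hypothetical stand-in for redis.Redis that counts how many
    # "connections" get created.
    instances = 0
    def __init__(self):
        FakeRedis.instances += 1

def cache_or_get(fp, expiry=300, r=FakeRedis()):
    # The default r was built ONCE, when this def statement ran,
    # not once per call.
    return r

a = cache_or_get("file_a.html")
b = cache_or_get("file_b.html")
print(a is b)               # True: every call shares the same default instance
print(FakeRedis.instances)  # 1: still only the one built at def time

# Passing an instance in creates a second, separate one.
c = cache_or_get("file_c.html", r=FakeRedis())
print(FakeRedis.instances)  # 2
```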