How do I load all cPickle dumps from a log file?

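The code I'm going to run will write a large number (~1000) of relatively small dictionaries (50 key-value pairs of strings) to a log file. I'll be doing this through a program that automates the process. The code I'm thinking of running looks like this: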

import random 
import string 
import cPickle as pickle 
import zlib 

fieldNames = ['AICc','Npix','Nparameters','DoF','chi-square','chi-square_nu'] 

tempDict = {} 
overview = {} 
iterList = [] 

# Create example dictionary to add to the log. 
for item in fieldNames: 
    tempDict[item] = random.choice([random.uniform(2,5), '', ''.join([random.choice(string.lowercase) for x in range(5)])]) 

# Compress and pickle and add the example dictionary to the log. 
# tried with 'ab' and 'wb' 
# is .p.gz the right extension for this kind of file?? 
# with open('google.p.gz', 'wb') as fp: 
with open('google.p.gz', 'ab') as fp: 
    fp.write(zlib.compress(pickle.dumps(tempDict, pickle.HIGHEST_PROTOCOL),9)) 

# Attempt to read in entire log 
i = 0 
with open('google.p.gz', 'rb') as fp: 
    # Call pickle.loads until all dictionaries loaded. 
    while 1:
        try:
            i += 1
            iterList.append(i)
            overview[i] = {}
            overview[i] = pickle.loads(zlib.decompress(fp.read()))
        except:
            break

print tempDict 
print overview

I'd like to be able to load the last dictionary written to the log file (google.p.gz), but at the moment it only loads the first pickle.dump.

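Also, is there a better way to do what I'm doing? I searched around and it feels like I'm the only one doing something like this, which in my experience has been a bad sign.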

Source

2012-09-12 JBWhitmore

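Your input and output don't match. When you output your records, you take each record individually, pickle it, compress it, and write the result separately to the file: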

fp.write(zlib.compress(pickle.dumps(tempDict, pickle.HIGHEST_PROTOCOL),9))

But when you input your records, you read the whole file, decompress it, and unpickle a single object from it:

pickle.loads(zlib.decompress(fp.read()))

So the next time you call fp.read() there's nothing left: you read the whole file the first time.
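
For illustration, the mismatch can be resolved on the read side alone, keeping the one-compressed-stream-per-record format. The sketch below is written for Python 3, so it uses `pickle` where the question uses `cPickle`, and it assumes the file contains nothing but zlib streams written back to back. `iter_compressed_pickles` is a hypothetical helper name, not part of the original code. It relies on `zlib.decompressobj()`, whose `unused_data` attribute holds the bytes that follow the end of the current stream.

```python
import pickle
import zlib


def iter_compressed_pickles(data):
    """Yield one object per zlib stream found back to back in `data` (bytes)."""
    while data:
        decompressor = zlib.decompressobj()
        # decompress() stops at the end of the first complete stream.
        payload = decompressor.decompress(data) + decompressor.flush()
        yield pickle.loads(payload)
        # Bytes after that stream belong to the next record.
        data = decompressor.unused_data


# Build three records the same way the question does, then read them back.
records = [{'AICc': 2.5, 'Npix': 'abcde'}, {'DoF': ''}, {'chi-square': 3.0}]
blob = b''.join(
    zlib.compress(pickle.dumps(r, pickle.HIGHEST_PROTOCOL), 9) for r in records)
loaded = list(iter_compressed_pickles(blob))
```

With the question's file, the bytes returned by `fp.read()` would be passed to the helper, and the last item it yields is the most recently appended dictionary.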

So you have to match your input to your output. How you do that depends on your exact requirements. Let's suppose that your requirements are:

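1. There will be so many records that the file needs to be compressed on disk.
2. All the records are written to the file in one go (you don't need to append individual records).
3. You don't need random access to records in the file (you'll always be happy to read the whole file in order to get to the last record).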

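With these requirements, it's a bad idea to compress each record individually with zlib. The DEFLATE algorithm used by zlib works by finding repeated sequences, so it works best on large amounts of data. It won't do much for a single record. So let's use the gzip module to compress and decompress the whole file.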
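
A rough way to see the difference is to compress the same records both ways and compare the sizes. This is a minimal sketch for Python 3 using made-up data: `example_record` is a hypothetical helper that fills the question's field names with random five-letter strings, and the exact sizes depend on the data and the pickle protocol.

```python
import pickle
import random
import string
import zlib

random.seed(0)  # fixed seed so the comparison is repeatable

field_names = 'AICc Npix Nparameters DoF chi-square chi-square_nu'.split()


def example_record():
    """Build a record shaped like the question's example (made-up values)."""
    return {name: ''.join(random.choice(string.ascii_lowercase) for _ in range(5))
            for name in field_names}


pickled = [pickle.dumps(example_record(), pickle.HIGHEST_PROTOCOL)
           for _ in range(1000)]

# Compress each record on its own, as the question does.
separate_size = sum(len(zlib.compress(p, 9)) for p in pickled)
# Compress all the records together as a single stream.
combined_size = len(zlib.compress(b''.join(pickled), 9))

print(separate_size, combined_size)
```

The single stream comes out smaller because the field names repeat in every record, and DEFLATE can only exploit that repetition when the records share one stream.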

I've made a few other improvements to the code while I was at it.

import cPickle as pickle 
import gzip 
import random 
import string 

field_names = 'AICc Npix Nparameters DoF chi-square chi-square_nu'.split() 

random_value_constructors = [ 
    lambda: random.uniform(2,5), 
    lambda: ''.join(random.choice(string.lowercase) 
        for x in xrange(random.randint(0, 5)))] 

def random_value(): 
    """ 
    Return a random value, either a small floating-point number or a 
    short string. 
    """ 
    return random.choice(random_value_constructors)() 

def random_record(): 
    """ 
    Create and return a random example record. 
    """ 
    return {name: random_value() for name in field_names} 

def write_records(filename, records): 
    """ 
    Pickle each record in `records` and compress them to `filename`. 
    """ 
    with gzip.open(filename, 'wb') as f:
        for r in records:
            pickle.dump(r, f, pickle.HIGHEST_PROTOCOL)

def read_records(filename): 
    """ 
    Decompress `filename`, unpickle records from it, and yield them. 
    """ 
    with gzip.open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return
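
To get back to the original goal of loading the last dictionary, the two functions can be combined as in the sketch below. It is a Python 3 adaptation (plain `pickle` instead of `cPickle`). `last_record`, the temporary path, and the sample records are assumptions added for illustration and are not part of the answer's code.

```python
import gzip
import os
import pickle
import tempfile


def write_records(filename, records):
    """Pickle each record in `records` and compress them to `filename`."""
    with gzip.open(filename, 'wb') as f:
        for r in records:
            pickle.dump(r, f, pickle.HIGHEST_PROTOCOL)


def read_records(filename):
    """Decompress `filename`, unpickle records from it, and yield them."""
    with gzip.open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def last_record(filename):
    """Return the final record in `filename`, or None if it holds no records."""
    last = None
    for last in read_records(filename):
        pass
    return last


# Round trip through a temporary file (the path is an arbitrary choice).
path = os.path.join(tempfile.mkdtemp(), 'google.p.gz')
records = [{'Npix': n, 'AICc': n * 0.5} for n in range(1000)]
write_records(path, records)
final = last_record(path)
```

Because `read_records` is a generator, `last_record` only ever holds one record in memory at a time, although it still has to decompress the whole file to reach the end.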

Source

2012-09-12 11:28:37
