Porting pickle from py2 to py3: strings become bytes

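I have a pickle file that was created with Python 2.7 and that I'm trying to port to Python 3.6. The file is saved in py 2.7 via pickle.dumps(self.saved_objects, -1) and loaded in Python 3.6 via loads(data, encoding="bytes") (from a file opened in rb mode). If I try opening it in r mode and pass encoding=latin1 to loads, I get UnicodeDecode errors. When I open it as a byte stream it loads, but literally every string is now a byte string. Every object's __dict__ keys are b"a_variable_name", which generates attribute errors when calling an_object.a_variable_name, because __getattr__ passes a string and __dict__ only contains bytes. I feel like I've tried every combination of arguments and pickle protocols already. Apart from forcibly converting all objects' __dict__ keys to strings, I'm at a loss. Any ideas?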

** Skip to the 4/28/17 update for a better example

-------------------------- -------------------------------------------------- ---------------------------------

** Update 4/27/17

This minimal example illustrates my problem:

From py 2.7.13

import pickle 

class test(object):
    def __init__(self):
        self.x = u"test ¢" # including a unicode str breaks things

t = test() 
dumpstr = pickle.dumps(t) 

>>> dumpstr 
"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."

From py 3.6.1

import pickle 

class test(object):
    def __init__(self):
        self.x = "xyz"

dumpstr = b"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb." 

t = pickle.loads(dumpstr, encoding="bytes") 

>>> t 
<__main__.test object at 0x040E3DF0> 
>>> t.x 
Traceback (most recent call last): 
    File "<pyshell#15>", line 1, in <module> 
    t.x 
AttributeError: 'test' object has no attribute 'x' 
>>> t.__dict__ 
{b'x': 'test ¢'} 
>>>

-------------------- -------------------------------------------------- ---------------------------------------

Update 4/28/17

To re-create my problem I'm posting my actual raw pickle data here. The pickle file was created in Python 2.7.13 on Windows 10 using

with open("raw_data.pkl", "wb") as fileobj: 
    pickle.dump(library, fileobj, protocol=0)

(protocol 0 so that it's human readable)

To run it, you'll need classes.py

# classes.py 

class Library(object): pass 


class Book(object): pass 


class Student(object): pass 


class RentalDetails(object): pass

And the test script is here:

# load_pickle.py 
import pickle, sys, itertools, os 

raw_pkl = "raw_data.pkl" 
is_py3 = sys.version_info.major == 3 

read_modes = ["rb"] 
encodings = ["bytes", "utf-8", "latin-1"] 
fix_imports_choices = [True, False] 
files = ["raw_data_%s.pkl" % x for x in range(3)] 


def py2_test():
    with open(raw_pkl, "rb") as fileobj:
        loaded_object = pickle.load(fileobj)
        print("library dict: %s" % (loaded_object.__dict__.keys()))
        return loaded_object


def py2_dumps():
    library = py2_test()
    for protocol, path in enumerate(files):
        print("dumping library to %s, protocol=%s" % (path, protocol))
        with open(path, "wb") as writeobj:
            pickle.dump(library, writeobj, protocol=protocol)


def py3_test():
    # this test iterates over the different options trying to load
    # the data pickled with py2 into a py3 environment
    print("starting py3 test")
    for (read_mode, encoding, fix_import, path) in itertools.product(read_modes, encodings, fix_imports_choices, files):
        py3_load(path, read_mode=read_mode, fix_imports=fix_import, encoding=encoding)


def py3_load(path, read_mode, fix_imports, encoding):
    from traceback import print_exc
    print("-" * 50)
    print("path=%s, read_mode = %s fix_imports = %s, encoding = %s" % (path, read_mode, fix_imports, encoding))
    if not os.path.exists(path):
        print("start this file with py2 first")
        return
    try:
        with open(path, read_mode) as fileobj:
            loaded_object = pickle.load(fileobj, fix_imports=fix_imports, encoding=encoding)
            # print the object's __dict__
            print("library dict: %s" % (loaded_object.__dict__.keys()))
            # consider the test a failure if any member attributes are saved as bytes
            test_passed = not any((isinstance(k, bytes) for k in loaded_object.__dict__.keys()))
            print("Test %s" % ("Passed!" if test_passed else "Failed"))
    except Exception:
        print_exc()
        print("Test Failed")
    input("Press Enter to continue...")
    print("-" * 50)


if is_py3: 
    py3_test() 
else: 
    # py2_test() 
    py2_dumps()

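Put all 3 files in the same directory and run c:\python27\python load_pickle.py first, which will create 1 pickle file for each of the 3 protocols. Then run the same command with Python 3 and notice that it converts the __dict__ keys to bytes. I had it working for about 6 hours, but for the life of me I can't figure out how I broke it again.

Source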

2017-04-27 user2682863

Have you tried utf-8? – zondo

yeah, utf8, utf16, latin1, cp – user2682863

If your pickle is in a file, why are you using `loads` instead of `load`? – BrenBarn

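Question: Porting pickle py2 to py3, strings become bytes

The encoding='latin-1' given below is OK.
Your problem with b'' is the result of using encoding='bytes'. This results in the dict keys being unpickled as bytes instead of as str.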
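To make the difference between the two encodings concrete, here is a minimal sketch for Python 3. It assumes a hand-written protocol-0 byte string standing in for what Python 2 would emit for {'x': u'test ¢'}; a plain dict is used in place of an instance __dict__, and the name PY2_PICKLE is made up for this illustration.

```python
import pickle

# Hand-written stand-in for a Python 2 protocol-0 pickle of {'x': u'test \xa2'}.
# Assumption: a plain dict is used here in place of an instance __dict__.
PY2_PICKLE = b"(dp0\nS'x'\np1\nVtest \xa2\np2\ns."

# encoding="bytes" leaves Python 2 str objects (the S opcode) as bytes.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

# encoding="latin-1" decodes Python 2 str objects to Python 3 str.
as_latin1 = pickle.loads(PY2_PICKLE, encoding="latin-1")

print(as_bytes)   # keys are bytes
print(as_latin1)  # keys are str
```

In both cases the unicode value is loaded as str; only the Python 2 str key changes type.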

The problem data are the datetime.date values such as '\x07á\x02\x10' in raw-data.pkl.

This is a known issue, as has already been pointed out.
Unpickling python2 datetime under python3
http://bugs.python.org/issue22005
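The datetime problem can be probed in isolation. The sketch below assumes a hand-written protocol-0 byte string modelled on what Python 2 emits for datetime.date(2017, 2, 16); the names PY2_DATE_PICKLE and try_load are hypothetical. It deliberately prints whatever each encoding produces rather than promising a result, since the outcome for latin-1 may depend on the Python 3 version in use.

```python
import datetime
import pickle

# Hand-written stand-in for a Python 2 protocol-0 pickle of
# datetime.date(2017, 2, 16); the 4-byte state is '\x07\xe1\x02\x10'.
PY2_DATE_PICKLE = b"cdatetime\ndate\np0\n(S'\\x07\\xe1\\x02\\x10'\np1\ntp2\nRp3\n."


def try_load(encoding):
    """Return the unpickled object, or the exception raised while loading."""
    try:
        return pickle.loads(PY2_DATE_PICKLE, encoding=encoding)
    except Exception as exc:
        return exc


for encoding in ("ASCII", "latin-1", "bytes"):
    print(encoding, "->", repr(try_load(encoding)))
```

With encoding="bytes" the state reaches datetime.date as bytes, which is why that setting loads the dates but also turns every other Python 2 str into bytes.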

As a workaround, I patched pickle.py and got the unpickled object, e.g.

book.library.books[0].rentals[0].rental_date=2017-02-16

This works for me:

t = pickle.loads(dumpstr, encoding="latin-1")

Output:
<__main__.test object at 0xf7095fec>
t.__dict__={'x': 'test ¢'}
test ¢

Tested with Python: 3.4.2

Source

2017-04-27 15:20:36 stovfl

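Thanks for your answer, I updated the question to more accurately reflect my problem – user2682863

@user268: I will try it and come back with my results. Why don't you use protocol version 2? – stovfl

In case anyone is worried about executing some random pickled data they downloaded from the internet – user2682863

In short, you're hitting bug 22005 with the datetime.date objects in the RentalDetails objects.

That can be worked around with the encoding='bytes' parameter, but that leaves your classes with a __dict__ containing bytes: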

>>> library = pickle.loads(pickle_data, encoding='bytes') 
>>> dir(library) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
TypeError: '<' not supported between instances of 'str' and 'bytes'

It's possible to fix that manually, based on your specific data:

def fix_object(obj):
    """Decode obj.__dict__ containing bytes keys"""
    obj.__dict__ = dict((k.decode("ascii"), v) for k, v in obj.__dict__.items())


def fix_library(library):
    """Walk all library objects and decode __dict__ keys"""
    fix_object(library)
    for student in library.students:
        fix_object(student)
    for book in library.books:
        fix_object(book)
        for rental in book.rentals:
            fix_object(rental)
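If the object layout is not known up front, the same idea can be written as a generic walk. This is only a sketch under assumptions: decode_dict_keys is a hypothetical helper, it handles plain instances plus dict, list, tuple and set containers, it assumes attribute names are ASCII, and it recurses, so very deep object graphs could hit the recursion limit. The two stand-in classes and the "Dune" title are made up for the demonstration.

```python
def decode_dict_keys(obj, _seen=None):
    """Recursively decode bytes keys in instance __dict__s (hypothetical helper)."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:  # guard against cycles such as book.library
        return obj
    _seen.add(id(obj))

    if isinstance(obj, dict):
        children = list(obj.values())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = list(obj)
    elif isinstance(getattr(obj, "__dict__", None), dict) and not isinstance(obj, type):
        # Replace the instance dict with one whose bytes keys are decoded.
        obj.__dict__ = {
            (k.decode("ascii") if isinstance(k, bytes) else k): v
            for k, v in obj.__dict__.items()
        }
        children = list(obj.__dict__.values())
    else:
        children = []

    for child in children:
        decode_dict_keys(child, _seen)
    return obj


# Stand-in classes and data, made up for this demonstration.
class Library(object): pass
class Book(object): pass

library = Library()
book = Book()
book.__dict__ = {b"title": "Dune", b"library": library}
library.__dict__ = {b"books": [book]}

decode_dict_keys(library)
print(library.books[0].title)
```

This only renames attribute keys; bytes values that should be str are left untouched, so it shares the fragility of the manual version.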

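But that's fragile, and enough of a pain that you should be looking for a better option.

1) Implement __getstate__/__setstate__ methods that map datetime objects to a non-broken representation, for instance: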

import datetime


class Event(object):
    """Example class working around datetime pickling bug"""

    def __init__(self):
        self.date = datetime.date.today()

    def __getstate__(self):
        # store the date as a plain int, which pickles portably
        state = self.__dict__.copy()
        state["date"] = state["date"].toordinal()
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.date = datetime.date.fromordinal(self.date)

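2) Don't use pickle at all. Along the lines of __getstate__/__setstate__, you can implement to_dict/from_dict methods or similar in your classes to save their content as JSON or some other plain format.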
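A minimal sketch of the to_dict/from_dict idea, assuming a RentalDetails class with a single rental_date attribute (the constructor signature and the ISO date format are choices made for this illustration, not taken from the original data):

```python
import datetime
import json


class RentalDetails(object):
    """Illustrative class; only rental_date is modelled here."""

    def __init__(self, rental_date):
        self.rental_date = rental_date

    def to_dict(self):
        # Store the date as an ISO string so the result is plain JSON.
        return {"rental_date": self.rental_date.isoformat()}

    @classmethod
    def from_dict(cls, data):
        # strptime is used because it exists on both Python 2.7 and 3.x.
        parsed = datetime.datetime.strptime(data["rental_date"], "%Y-%m-%d")
        return cls(parsed.date())


original = RentalDetails(datetime.date(2017, 2, 16))
text = json.dumps(original.to_dict())
restored = RentalDetails.from_dict(json.loads(text))
print(text)
```

Because the file then holds only text and numbers, the Python 2 versus Python 3 str/bytes distinction no longer affects loading.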

As a final note, having a back-reference to the library in each object shouldn't be required.

Source

2017-04-29 17:58:57

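I narrowed it down to the datetime bug and I'm reading through the different bug reports. Is there any way to get a better traceback from pickle when some item fails to unpickle? The reference to the library is just there for convenient access. I think the __getstate__/__setstate__ solution might be ideal. – user2682863

Pickle problems are indeed hard to debug, which, together with the security issues, is a good reason to serialize through a different mechanism whenever possible. –

One last question: which encoding would you use after implementing __getstate__/__setstate__? – user2682863

You should treat pickle data as specific to the (major) version of Python that created it.

(See Gregory Smith's message w.r.t. issue 22005.)

The best way to get around this is to write a Python 2.7 program that reads the pickled data and writes it out in a neutral format.

Taking a quick look at your actual data, it seems to me that an SQLite database is appropriate as an interchange format, since the Books contain references to a Library and RentalDetails. You could create separate tables for each.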
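As a rough sketch of the separate-tables idea, the following uses the standard sqlite3 module with an in-memory database. The table and column names are assumptions chosen for illustration (the real attributes of Book and RentalDetails are not shown in the question), and the inserted row is placeholder data.

```python
import sqlite3

# In-memory database for the demonstration; a real export would use a file path.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per class, linked by a foreign key
# instead of object back-references.
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE rentals (
    id INTEGER PRIMARY KEY,
    book_id INTEGER REFERENCES books(id),
    rental_date TEXT
);
""")

# Placeholder data; dates are stored as ISO strings.
cur = conn.execute("INSERT INTO books (title) VALUES (?)", ("Example title",))
book_id = cur.lastrowid
conn.execute(
    "INSERT INTO rentals (book_id, rental_date) VALUES (?, ?)",
    (book_id, "2017-02-16"),
)
conn.commit()

rows = conn.execute(
    "SELECT b.title, r.rental_date FROM rentals r "
    "JOIN books b ON b.id = r.book_id"
).fetchall()
print(rows)
```

The writing half would run under Python 2.7 after unpickling there, and the reading half under Python 3, so no pickle ever crosses the version boundary.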

Source

2017-04-29 18:46:12
