2016-11-27 55 views

I am running the following code in a Jupyter notebook under Anaconda to load a dataset:

# Run some setup code for this notebook. 

import random 
import numpy as np 
from cs231n.data_utils import load_CIFAR10 
import matplotlib.pyplot as plt 

# This is a bit of magic to make matplotlib figures appear inline in the notebook 
# rather than in a new window. 
%matplotlib inline 
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots 
plt.rcParams['image.interpolation'] = 'nearest' 
plt.rcParams['image.cmap'] = 'gray' 

# Some more magic so that the notebook will reload external python modules; 
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython 
%load_ext autoreload 
%autoreload 2 

followed by these statements:

# Load the raw CIFAR-10 data. 
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py' 
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir) 

# As a sanity check, we print out the size of the training and test data. 
print ('Training data shape: ', X_train.shape) 
print ('Training labels shape: ', y_train.shape) 
print ('Test data shape: ', X_test.shape) 
print ('Test labels shape: ', y_test.shape) 

When I run the second part, I get the following error:

--------------------------------------------------------------------------- 
UnicodeDecodeError      Traceback (most recent call last) 
<ipython-input-5-9506c06e646a> in <module>() 
     1 # Load the raw CIFAR-10 data. 
     2 cifar10_dir = 'cs231n/datasets/cifar-10-batches-py' 
----> 3 X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir) 
     4 
     5 # As a sanity check, we print out the size of the training and test data. 

C:\Users\lenovo\assignment1\cs231n\data_utils.py in load_CIFAR10(ROOT) 
    20 for b in range(1,6): 
    21  f = os.path.join(ROOT, 'data_batch_%d' % (b,)) 
---> 22  X, Y = load_CIFAR_batch(f) 
    23  xs.append(X) 
    24  ys.append(Y) 

C:\Users\lenovo\assignment1\cs231n\data_utils.py in load_CIFAR_batch(filename) 
     7 """ load single batch of cifar """ 
     8 with open(filename, 'rb') as f: 
----> 9  datadict = pickle.load(f) 
    10  X = datadict['data'] 
    11  Y = datadict['labels'] 

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128) 

How can I resolve this error? I am using Anaconda3 to run this code, and it looks like the code above was written for Anaconda2. Any suggestions for fixing this?

Just for more detail:

I am trying to solve the assignment from this link: http://cs231n.github.io/assignments2016/assignment1/

Edit:

Adding the definition of data_utils.py, which contains load_CIFAR10:

import _pickle as pickle
import numpy as np
import os
from scipy.misc import imread

def load_CIFAR_batch(filename):
    """ load single batch of cifar """
    with open(filename, 'rb') as f:
        datadict = pickle.load(f)
        X = datadict['data']
        Y = datadict['labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
        Y = np.array(Y)
        return X, Y

def load_CIFAR10(ROOT):
    """ load all of cifar """
    xs = []
    ys = []
    for b in range(1,6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b,))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte

Answer


The pickle file you are loading was most likely generated with Python 2.

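Because pickle works in fundamentally different ways in Python 2 and Python 3, you can try loading the file with the latin-1 encoding, on the assumption that byte values 0-255 map directly to characters.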
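As a minimal sketch of that suggestion: the `encoding` argument is passed to `pickle.load`, while the file itself stays in binary mode. The helper name `load_py2_pickle` and the byte string `PY2_PICKLE` are made up for illustration; the payload is a tiny hand-written protocol-2 pickle standing in for a real CIFAR batch file.

```python
import io
import pickle

# Hypothetical stand-in for a file written by Python 2: a protocol-2 pickle
# of a single byte string that contains the non-ASCII bytes 0x8b and 0xff.
PY2_PICKLE = b'\x80\x02U\x02\x8b\xffq\x00.'

def load_py2_pickle(fileobj):
    """Load a Python 2 pickle from a binary file object (illustrative helper)."""
    # encoding is an argument of pickle.load, not of open();
    # the file object must still be opened in binary mode.
    return pickle.load(fileobj, encoding='latin1')

# The default encoding is ASCII, so the plain call fails on byte 0x8b.
try:
    pickle.load(io.BytesIO(PY2_PICKLE))
except UnicodeDecodeError as exc:
    print('default load failed:', exc.reason)

# With latin-1 every byte value maps to exactly one character.
text = load_py2_pickle(io.BytesIO(PY2_PICKLE))
print([ord(c) for c in text])
```

In `load_CIFAR_batch` the corresponding change would presumably be `datadict = pickle.load(f, encoding='latin1')`, leaving `open(filename, 'rb')` as it is.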

This approach needs some sanity checking, since it is not guaranteed to produce consistent data.
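One possible sanity check, sketched here under assumptions, is to load the same bytes twice, once with `encoding='latin1'` and once with `encoding='bytes'`, and confirm that re-encoding the text gives back the raw bytes. The helper `latin1_roundtrip_ok` and the sample payload are hypothetical, and the comparison as written only covers a pickle whose top-level object is a single string; a real batch dictionary would need its values checked one by one (for example, array shapes and label ranges).

```python
import io
import pickle

# Hypothetical stand-in for a Python 2 pickle: protocol 2, one byte string.
PY2_PICKLE = b'\x80\x02U\x02\x8b\xffq\x00.'

def latin1_roundtrip_ok(raw):
    """Return True if the latin-1 text re-encodes to the raw byte string."""
    as_text = pickle.load(io.BytesIO(raw), encoding='latin1')
    as_bytes = pickle.load(io.BytesIO(raw), encoding='bytes')
    # latin-1 maps bytes 0-255 one-to-one, so encoding must undo decoding.
    return as_text.encode('latin1') == as_bytes

print(latin1_roundtrip_ok(PY2_PICKLE))
```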

Comment: Binary mode doesn't accept an encoding argument.

Comment: My bad, I meant to add it to the pickle load call; see my edit.

Comment: Now the error has changed to "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 6: invalid start byte" instead of the ascii codec error.
