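How do I load the MNIST digit and label data in MATLAB?
I am trying to run the code given at this link:
https://github.com/bd622/DiscretHashing
Discrete Hashing is a dimensionality-reduction method used in approximate nearest-neighbor search. I want to run the implementation on the MNIST database available at http://yann.lecun.com/exdb/mnist/. I have already extracted the files from the compressed gz format.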
Problem 1:
I used the solution from "Reading MNIST Image Database binary file in MATLAB" to read the MNIST database,
but I get the following error:
Error using fread
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in Reading (line 7)
A = fread(fid, 1, 'uint32');
Here is the code:
clear all;
close all;
%//Open file
fid = fopen('t10k-images-idx3-ubyte', 'r');
A = fread(fid, 1, 'uint32');
magicNumber = swapbytes(uint32(A));
%//For each image, store into an individual cell
imageCellArray = cell(1, totalImages);
for k = 1 : totalImages
%//Read in numRows*numCols pixels at a time
A = fread(fid, numRows*numCols, 'uint8');
%//Reshape so that it becomes a matrix
%//We are actually reading this in column major format
%//so we need to transpose this at the end
imageCellArray{k} = reshape(uint8(A), numCols, numRows)';
end
%//Close the file
fclose(fid);
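The `Invalid file identifier` message means that `fopen` returned -1, so `fread` never had a usable file to read from. A minimal sketch of checking for that before reading, using the second output of `fopen` to get the system's error text. The file name below is a deliberately nonexistent placeholder, not one of the real MNIST files:

```matlab
% Hypothetical file name that does not exist, used to trigger the failure path
filename = fullfile(tempdir, 'no-such-mnist-file-idx3-ubyte');

% fopen returns -1 (plus an error message) when the file cannot be opened
[fid, msg] = fopen(filename, 'r');
if fid == -1
    % Report the problem here instead of letting fread fail later
    fprintf('Could not open "%s": %s\n', filename, msg);
else
    A = fread(fid, 1, 'uint32');
    fclose(fid);
end
```

With a check like this, a wrong file name is reported at the `fopen` call rather than a few lines later at `fread`.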
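UPDATE: Problem 1 is solved. The file name was wrong (the extracted file is named t10k-images.idx3-ubyte). The revised code is: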
clear all;
close all;
%//Open file
fid = fopen('t10k-images.idx3-ubyte', 'r');
A = fread(fid, 1, 'uint32');
magicNumber = swapbytes(uint32(A));
%//Read in total number of images
%//A = fread(fid, 4, 'uint8');
%//totalImages = sum(bitshift(A', [24 16 8 0]));
%//OR
A = fread(fid, 1, 'uint32');
totalImages = swapbytes(uint32(A));
%//Read in number of rows
%//A = fread(fid, 4, 'uint8');
%//numRows = sum(bitshift(A', [24 16 8 0]));
%//OR
A = fread(fid, 1, 'uint32');
numRows = swapbytes(uint32(A));
%//Read in number of columns
%//A = fread(fid, 4, 'uint8');
%//numCols = sum(bitshift(A', [24 16 8 0]));
%// OR
A = fread(fid, 1, 'uint32');
numCols = swapbytes(uint32(A));
for k = 1 : totalImages
%//Read in numRows*numCols pixels at a time
A = fread(fid, numRows*numCols, 'uint8');
%//Reshape so that it becomes a matrix
%//We are actually reading this in column major format
%//so we need to transpose this at the end
imageCellArray{k} = reshape(uint8(A), numCols, numRows)';
end
%//Close the file
fclose(fid);
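As a possible simplification of the code above, `fopen` accepts a machine format argument, so the file can be opened as big-endian (`'ieee-be'`) and the header read without `swapbytes`. The sketch below is self-contained: it first writes a tiny synthetic IDX3 file (2 images of 2 x 3 pixels, an assumption for illustration only) and then reads it back in one pass. For real data, replace `tmpFile` with the path to an MNIST image file.

```matlab
% Write a tiny synthetic IDX3 file (2 images of 2x3 pixels) so the sketch is self-contained
tmpFile = [tempname '.idx3-ubyte'];
fid = fopen(tmpFile, 'w', 'ieee-be');
fwrite(fid, [2051 2 2 3], 'uint32');   % magic number, image count, rows, cols
fwrite(fid, uint8(1:12), 'uint8');     % pixels, stored row by row for each image
fclose(fid);

% Read it back; 'ieee-be' makes fread interpret multi-byte values as big-endian
fid = fopen(tmpFile, 'r', 'ieee-be');
assert(fid ~= -1, 'Could not open %s', tmpFile);
header = fread(fid, 4, 'uint32');      % returned as double, no swapbytes needed
magicNumber = header(1);
totalImages = header(2);
numRows = header(3);
numCols = header(4);
pixels = fread(fid, inf, 'uint8=>uint8');  % all remaining bytes at once
fclose(fid);
delete(tmpFile);

% Pixels are row-major in the file; reshape column-major, then swap the first two dimensions
images = permute(reshape(pixels, numCols, numRows, totalImages), [2 1 3]);
```

Reading all pixels with a single `fread` and using `permute` replaces the per-image loop and transpose; the result is a numRows x numCols x totalImages `uint8` array.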
Problem 2:
I cannot understand how to use the four MNIST files with the code. The code contains the variables
traindata = double(traindata);
testdata = double(testdata);
How should I prepare the MNIST database so that I can use it with the implementation?
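One plausible way to prepare the data, sketched under an assumption: that the hashing code expects `traindata` and `testdata` as numeric matrices with one sample per row (this is not confirmed by the repository, so check its demo script). Under that assumption, each 28 x 28 image is flattened into a 784-element row. The arrays `Trainimages` and `Trainlabels` below are random stand-ins for the output of an MNIST reader, and `trainlabels` is a hypothetical variable name:

```matlab
% Stand-in for the rows-by-cols-by-N uint8 array returned by an MNIST reader
% (random data here; real MNIST training data would be 28 x 28 x 60000)
Trainimages = uint8(randi([0 255], 28, 28, 5));
Trainlabels = [5; 0; 4; 1; 9];          % stand-in labels, one per image

% Flatten each image into one row: the result is N x (rows*cols)
numTrain = size(Trainimages, 3);
traindata = double(reshape(Trainimages, [], numTrain)).';

% Labels stay aligned with the rows of traindata
trainlabels = double(Trainlabels(:));
```

The same steps applied to the two t10k files would give `testdata` and its labels. In that reading, the two image files supply the feature matrices and the two label files supply the ground truth for each row.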
UPDATE: I implemented the solution, but I keep getting this error:
Error using fread
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in mnist_parse (line 11)
A = fread(fid1, 1, 'uint32');
These are the files:
demo.m
% This is the main file that calls the function to read in the MNIST data
clear all
clc
[Trainimages, Trainlabels] = mnist_parse('C:\Users\Desktop\MNIST\train-images-idx3-ubyte', 'C:\Users\Desktop\MNIST\train-labels-idx1-ubyte');
[Testimages, Testlabels] = mnist_parse('t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte');
k=5;
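% Index into the arrays returned above (images and label are not defined in this script)
digit = Trainimages(:,:,k);
lbl = Trainlabels(k);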
mnist_parse.m
function [images, labels] = mnist_parse(path_to_digits, path_to_labels)
% Open files
fid1 = fopen(path_to_digits, 'r');
% The labels file
fid2 = fopen(path_to_labels, 'r');
% Read in magic numbers for both files
A = fread(fid1, 1, 'uint32');
magicNumber1 = swapbytes(uint32(A)); % Should be 2051
fprintf('Magic Number - Images: %d\n', magicNumber1);
A = fread(fid2, 1, 'uint32');
magicNumber2 = swapbytes(uint32(A)); % Should be 2049
fprintf('Magic Number - Labels: %d\n', magicNumber2);
% Read in total number of images
% Ensure that this number matches with the labels file
A = fread(fid1, 1, 'uint32');
totalImages = swapbytes(uint32(A));
A = fread(fid2, 1, 'uint32');
if totalImages ~= swapbytes(uint32(A))
error('Total number of images read from images and labels files are not the same');
end
fprintf('Total number of images: %d\n', totalImages);
% Read in number of rows
A = fread(fid1, 1, 'uint32');
numRows = swapbytes(uint32(A));
% Read in number of columns
A = fread(fid1, 1, 'uint32');
numCols = swapbytes(uint32(A));
fprintf('Dimensions of each digit: %d x %d\n', numRows, numCols);
% For each image, store into an individual slice
images = zeros(numRows, numCols, totalImages, 'uint8');
for k = 1 : totalImages
% Read in numRows*numCols pixels at a time
A = fread(fid1, numRows*numCols, 'uint8');
% Reshape so that it becomes a matrix
% We are actually reading this in column major format
% so we need to transpose this at the end
images(:,:,k) = reshape(uint8(A), numCols, numRows).';
end
% Read in the labels
labels = fread(fid2, totalImages, 'uint8');
% Close the files
fclose(fid1);
fclose(fid2);
end
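Since the error again comes from `fopen`, it may help to list what the extracted files are actually called before passing names to `mnist_parse`. A small sketch, assuming the files sit in the current folder (`dataDir` is a placeholder to adjust):

```matlab
% Folder assumed to hold the extracted MNIST files; adjust as needed
dataDir = pwd;

% List every file whose name contains 'ubyte', whatever the separator before 'idx'
listing = dir(fullfile(dataDir, '*ubyte*'));
if isempty(listing)
    fprintf('No MNIST files found in %s\n', dataDir);
end
for i = 1:numel(listing)
    % Print the full path exactly as it should be passed to fopen
    fprintf('%s\n', fullfile(dataDir, listing(i).name));
end
```

Depending on the extraction tool, the files can end up named with a dot (`t10k-images.idx3-ubyte`) rather than a hyphen (`t10k-images-idx3-ubyte`), which is the kind of mismatch that produced the first error.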
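The error is clear: you called `fopen` with an invalid file name. Make sure 't10k-images-idx3-ubyte' is the *full* name of the file and that the file is on your current MATLAB path. Otherwise, make sure you pass the *full* absolute path of the file you want to open. – excaza
@excaza: That solved the first problem and the error caused by the file read operation. There was indeed a problem with the file name. But now I don't know how to use the database; I cannot understand how to use these 4 files. I believe the traindata variable will contain the file train-images.idx3-ubyte. Which one is testdata, then, and how should I use the 2 label files? Please help – SKM
@rayryeng: Can you tell me why I get an error from the file read operation when I implement your answer? I have added a new update to the question. Thank you for your time and effort. – SKM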