How to read a file using textscan?

I have a large tab-delimited file (10000 rows, 15000 columns) and want to import it into MATLAB. How can I read the file with textscan?

I have tried to import it with the textscan function as follows:

function [C_text, C_data] = ReadDataFile(filename, header, attributesCount, delimiter, attributeFormats, attributeFormatCount)
% Build the per-column format string (e.g. '%f%f...%s') with my own helper function
AttributeTypes = SetAttributeTypeMatrix(attributeFormats, attributeFormatCount);
fid = fopen(filename);
if(header == 1)
    % read column headers: attributesCount string fields from the first line
    C_text = textscan(fid, '%s', attributesCount, 'delimiter', delimiter);
    % read the data rows with the per-column format
    C_data = textscan(fid, AttributeTypes{1, 1}, 'headerlines', 1);
else
    C_text = '';
    C_data = textscan(fid, AttributeTypes{1, 1});
end

fclose(fid);

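AttributeTypes{1,1} is a string that describes the variable type of each column. In this case there are 14740 float and 260 string variables, so the value of AttributeTypes{1,1} is '%f%f...%f%s%s...%s', where %f is repeated 14740 times and %s 260 times.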
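A format string like this does not have to be assembled by hand. A minimal sketch, assuming the column counts from the question and that all numeric columns come before all text columns (as described above); nNumeric and nText are hypothetical names, and this is an alternative to whatever SetAttributeTypeMatrix actually does:

```matlab
% Hypothetical column counts taken from the question
nNumeric = 14740;   % columns read with %f
nText = 260;        % columns read with %s

% repmat repeats the two-character conversion; concatenation gives one format string
fmt = [repmat('%f', 1, nNumeric), repmat('%s', 1, nText)];

% Every conversion is exactly two characters, so fmt(2:2:end) is the
% sequence of conversion letters, one per column ('f...fs...s')
codes = fmt(2:2:end);
nF = sum(codes == 'f');   % number of numeric columns (14740)
```

Note that textscan has no repeat-count syntax in the format itself (a number after % is a field width), so repeating the conversion explicitly is the usual approach.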

When I try to execute

>> [header, data] = ReadDataFile('data/orange_large_train.data.chunk1', 1, 15000, '\t', types, size);

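the header array seems to be correct (the column names are read correctly).

However, data is a 1 x 15000 array (only the first row seems to be imported instead of all 10000), and I don't know what is causing this behavior.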

I suspect the problem is caused by this line:

C_data = textscan(fid, AttributeTypes{1, 1});

but I don't know what could be wrong, since the help documentation shows a similar example.

I would be very grateful if anyone could suggest how to solve this, i.e. how to read all 10000 rows.


2010-08-05 Niko Gamulin

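I believe all your data is there. textscan returns one cell per column, not per row, so if you look at data, each cell should contain an entire column (10000x1). You can extract the i-th column as an array with data{i}.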
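To see this shape on something small, the sketch below runs textscan on a made-up three-row, tab-separated string instead of a file (textscan also accepts a character vector as input). The sample data and variable names are hypothetical:

```matlab
% Made-up sample: two numeric columns and one text column, tab-separated
str = sprintf('1\t2\tA\n3\t4\tB\n5\t6\tC\n');

% Same kind of call as in the question, but reading from a string
C = textscan(str, '%f%f%s', 'Delimiter', '\t');

sz = size(C);        % [1 3]: one cell per column, not per row
col1 = C{1};         % numeric column as a 3x1 double: [1; 3; 5]
col2 = C{2};         % [2; 4; 6]
names = C{3};        % text column as a 3x1 cell array: {'A'; 'B'; 'C'}
```

With the real file, the same pattern gives a 1 x 15000 cell array whose cells are each 10000 x 1.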

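You probably want to separate the double data from the string data. I don't know what attributeFormats is; you could probably use that array, but you can also use AttributeTypes{1, 1}: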

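% Positions of 'f' among the conversion letters = indices of the numeric columns
% (assumes the format string has no spaces between conversions)
isdouble = strfind(AttributeTypes{1, 1}(2:2:end), 'f');
% Concatenate all numeric columns into one 10000 x 14740 double matrix
data_double = cell2mat(data(isdouble));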
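strfind works here because it returns the positions of 'f' in the string of conversion letters, and those positions are exactly the column indices. A logical mask is an equivalent, slightly more direct alternative. The sketch below uses a hypothetical three-column format and sample string, and, like the code above, assumes the format string is a plain run of two-character conversions with no spaces or literal text:

```matlab
% Hypothetical format and data: numeric, text, numeric
fmt = '%f%s%f';
str = sprintf('1\tx\t10\n2\ty\t20\n');
data = textscan(str, fmt, 'Delimiter', '\t');

% One conversion letter per column: 'fsf'
codes = fmt(2:2:end);
isdouble = (codes == 'f');                % logical mask [true false true]
data_double = cell2mat(data(isdouble));   % 2x2 double matrix [1 10; 2 20]
```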

To merge the string data into a single cell array of strings, you can do this:

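% Indices of the text columns
isstring = strfind(AttributeTypes{1, 1}(2:2:end), 's');
% Each text column is a 10000x1 cell; join them into one 10000 x 260 cell array
data_string = horzcat(data{isstring});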
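Because each string column comes back as an N x 1 cell array, data{...} expands to a list of those columns and horzcat places them side by side, giving an N x k cell array with one column per text attribute. A small sketch with made-up data; the name isText is hypothetical and chosen to avoid shadowing MATLAB's built-in isstring function in newer releases:

```matlab
% Hypothetical format and data: text, numeric, text
fmt = '%s%f%s';
str = sprintf('a\t1\tb\nc\t2\td\n');
data = textscan(str, fmt, 'Delimiter', '\t');

isText = (fmt(2:2:end) == 's');          % logical mask [true false true]
data_string = horzcat(data{isText});     % 2x2 cell array {'a','b'; 'c','d'}
firstTextColumn = data_string(:, 1);     % {'a'; 'c'}
```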


2010-08-05 15:13:44 yuk
