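MATLAB is very good at working with large files and large sets of files. The 2014 release brought many improvements for CSV files through its datastore feature, and it now also works well with Excel files.
Take a look at this tutorial: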
http://blogs.mathworks.com/loren/2014/12/03/reading-big-data-into-matlab/
I have three CSV files (file[1-3].csv) containing data like this:
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
a3,b3,c3,d3,e3
a4,b4,c4,d4,e4
a5,b5,c5,d5,e5
a6,b6,c6,d6,e6
a7,b7,c7,d7,e7
a8,b8,c8,d8,e8
a9,b9,c9,d9,e9
a10,b10,c10,d10,e10
and a file, varnames.txt, with the column names:
A
B
C
D
E
Let's read the file:
>> datafile = 'csv-files/file1.csv';
>> headerfile = 'csv-files/varnames.txt';
>> fileID = fopen(headerfile);
>> varnames = textscan(fileID,'%s');
>> fclose(fileID);
>> varnames = varnames{:};
>> ds = datastore(datafile,'ReadVariableNames',false);
>> ds.VariableNames = varnames
ds =
TabularTextDatastore with properties:
Files: {
'/home/anquegi/learn/matlab/stackoverflow/csv-files/file1.csv'
}
FileEncoding: 'UTF-8'
ReadVariableNames: false
VariableNames: {'A', 'B', 'C' ... and 2 more}
Text Format Properties:
NumHeaderLines: 0
Delimiter: ','
RowDelimiter: '\r\n'
TreatAsMissing: ''
MissingValue: NaN
Advanced Text Format Properties:
TextscanFormats: {'%q', '%q', '%q' ... and 2 more}
ExponentCharacters: 'eEdD'
CommentStyle: ''
Whitespace: ' \b\t'
MultipleDelimitersAsOne: false
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'A', 'B', 'C' ... and 2 more}
SelectedFormats: {'%q', '%q', '%q' ... and 2 more}
ReadSize: 20000 rows
>> preview(ds)
ans =
A B C D E
____ ____ ____ ____ ____
'a1' 'b1' 'c1' 'd1' 'e1'
'a2' 'b2' 'c2' 'd2' 'e2'
'a3' 'b3' 'c3' 'd3' 'e3'
'a4' 'b4' 'c4' 'd4' 'e4'
'a5' 'b5' 'c5' 'd5' 'e5'
'a6' 'b6' 'c6' 'd6' 'e6'
'a7' 'b7' 'c7' 'd7' 'e7'
'a8' 'b8' 'c8' 'd8' 'e8'
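For comparison, here is the same setup sketched in Python: a headerless CSV whose column names come from a separate whitespace-separated file, the way `textscan(fileID,'%s')` splits them. This is a minimal sketch under stated assumptions, not a port of `datastore`: `read_varnames` and `preview` are hypothetical helper names, the two files are replaced by in-memory strings, and the default of 8 rows only mirrors what `preview(ds)` printed above.

```python
import csv
import io
from itertools import islice


def read_varnames(text):
    """Split the header file on whitespace, like textscan(fileID, '%s')."""
    return text.split()


def preview(csv_file, varnames, n=8):
    """Return the first n rows of a headerless CSV as dicts keyed by varnames.

    Roughly ReadVariableNames=false followed by assigning VariableNames;
    only n rows are ever read from the file.
    """
    reader = csv.reader(csv_file)
    return [dict(zip(varnames, row)) for row in islice(reader, n)]


if __name__ == "__main__":
    # In-memory stand-ins for csv-files/varnames.txt and csv-files/file1.csv.
    varnames_txt = "A\nB\nC\nD\nE\n"
    file1 = "".join(f"a{i},b{i},c{i},d{i},e{i}\n" for i in range(1, 11))
    names = read_varnames(varnames_txt)
    for row in preview(io.StringIO(file1), names):
        print(row)
```

With real files, pass `open('csv-files/file1.csv', newline='')` instead of the `StringIO` object.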
If we look at the properties, ReadSize is 20000 rows, so MATLAB reads 20000 rows at a time, which you can then process. Since this data has only 10 rows, I'll change it to 3:
>> ds.ReadSize=3
ds =
TabularTextDatastore with properties:
Files: {
'/home/anquegi/learn/matlab/stackoverflow/csv-files/file1.csv'
}
FileEncoding: 'UTF-8'
ReadVariableNames: false
VariableNames: {'A', 'B', 'C' ... and 2 more}
Text Format Properties:
NumHeaderLines: 0
Delimiter: ','
RowDelimiter: '\r\n'
TreatAsMissing: ''
MissingValue: NaN
Advanced Text Format Properties:
TextscanFormats: {'%q', '%q', '%q' ... and 2 more}
ExponentCharacters: 'eEdD'
CommentStyle: ''
Whitespace: ' \b\t'
MultipleDelimitersAsOne: false
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'A', 'B', 'C' ... and 2 more}
SelectedFormats: {'%q', '%q', '%q' ... and 2 more}
ReadSize: 3 rows
>> reset(ds)
while hasdata(ds)
T = read(ds);
T.A
end
ans =
'a1'
'a2'
'a3'
ans =
'a4'
'a5'
'a6'
ans =
'a7'
'a8'
'a9'
ans =
'a10'
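The ReadSize loop above can be approximated in Python with a small chunked reader. The class `ChunkedCsvReader` and its methods are hypothetical names chosen to echo `hasdata`, `read` and `reset`. Each source is assumed to be a zero-argument callable returning a text stream (for example `lambda: open('csv-files/file1.csv', newline='')`), and `read_size` accepts either a row count or `"file"`. This sketch makes no attempt to reproduce how MATLAB's own `read` treats file boundaries; here a chunk simply never mixes rows from two files, which is a design choice of the sketch.

```python
import csv
import io


class ChunkedCsvReader:
    """Hypothetical, minimal analogue of a tabular text datastore.

    read() returns up to read_size rows as dicts keyed by varnames;
    read_size="file" returns the rest of the current file instead.
    """

    def __init__(self, sources, varnames, read_size=20000):
        self.sources = list(sources)  # zero-argument callables returning text streams
        self.varnames = list(varnames)
        self.read_size = read_size
        self.reset()

    def _rows(self):
        # Stream (file index, row) pairs, one file after another.
        for index, open_source in enumerate(self.sources):
            with open_source() as handle:
                for row in csv.reader(handle):
                    yield index, row

    def reset(self):
        # Start over from the first row of the first file.
        self._rows_iter = self._rows()
        self._next = next(self._rows_iter, None)  # one-row lookahead for hasdata()

    def hasdata(self):
        return self._next is not None

    def read(self):
        if not self.hasdata():
            raise RuntimeError("no more data; call reset() first")
        chunk = []
        file_index = self._next[0]
        while self._next is not None:
            index, row = self._next
            if index != file_index:
                break  # sketch's choice: a chunk never spans two files
            if self.read_size != "file" and len(chunk) >= self.read_size:
                break  # chunk is full
            chunk.append(dict(zip(self.varnames, row)))
            self._next = next(self._rows_iter, None)
        return chunk


if __name__ == "__main__":
    text = "".join(f"a{i},b{i},c{i},d{i},e{i}\n" for i in range(1, 11))
    ds = ChunkedCsvReader([lambda: io.StringIO(text)], list("ABCDE"), read_size=3)
    while ds.hasdata():
        T = ds.read()
        print([row["A"] for row in T])
```

Run as a script, the demo prints `['a1', 'a2', 'a3']`, `['a4', 'a5', 'a6']`, `['a7', 'a8', 'a9']` and finally `['a10']`, the same four chunks as the MATLAB loop. Rows are streamed from the files, so memory use depends on `read_size` rather than on file size.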
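The variable T is a table, so you can write it wherever you want. Note that each call to read(ds) advances through the data by the amount set in ReadSize; this parameter can be a number of rows, or 'file' to read a whole file at a time.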
>> reset(ds)
>> T = read(ds);
>> T
T =
A B C D E
____ ____ ____ ____ ____
'a1' 'b1' 'c1' 'd1' 'e1'
'a2' 'b2' 'c2' 'd2' 'e2'
'a3' 'b3' 'c3' 'd3' 'e3'
>> writetable(T,'mySpreadsheet','FileType','spreadsheet')
>> reset(ds)
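Writing one chunk back out can be sketched the same way. Python's standard library has no Excel writer, so this sketch writes CSV with `csv.DictWriter` instead of a spreadsheet. `read_first` and `write_table` are hypothetical names, and the in-memory buffer stands in for a real file opened with `open('mySpreadsheet.csv', 'w', newline='')`.

```python
import csv
import io
from itertools import islice


def read_first(csv_file, varnames, read_size=3):
    """Return the first read_size rows as dicts, like reset(ds); T = read(ds)."""
    reader = csv.reader(csv_file)
    return [dict(zip(varnames, row)) for row in islice(reader, read_size)]


def write_table(rows, varnames, out_file):
    """Write a header line followed by the rows, as CSV."""
    writer = csv.DictWriter(out_file, fieldnames=varnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    names = list("ABCDE")
    data = "".join(f"a{i},b{i},c{i},d{i},e{i}\n" for i in range(1, 11))
    T = read_first(io.StringIO(data), names)
    out = io.StringIO()  # stand-in for open("mySpreadsheet.csv", "w", newline="")
    write_table(T, names, out)
    print(out.getvalue(), end="")
```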
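You can use ADO with Excel to reference text files. – Fionnuala
Whether in MATLAB or Python, this is trivial: open the file, read it line by line until you have what you need, and save the data. –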
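The approach in that last comment (open, read line by line, stop, save) looks roughly like this in Python. `copy_lines_until` is a hypothetical helper and the stop condition in the demo is an arbitrary example; because it iterates over the file object, only the current line is held in memory.

```python
import io


def copy_lines_until(src, dst, stop):
    """Copy lines from src to dst until stop(line) returns True.

    The stopping line itself is not copied. Returns the number of lines copied.
    """
    copied = 0
    for line in src:  # iterating a file object reads one line at a time
        if stop(line):
            break
        dst.write(line)
        copied += 1
    return copied


if __name__ == "__main__":
    src = io.StringIO("".join(f"a{i},b{i},c{i},d{i},e{i}\n" for i in range(1, 11)))
    dst = io.StringIO()
    n = copy_lines_until(src, dst, lambda line: line.startswith("a4,"))
    print(n)
    print(dst.getvalue(), end="")
```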