2017-01-11

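R data.table fread() does not bring in the whole text file completely

I am currently bringing large datasets into R, and I found that fread() from data.table can do it in a reasonable time (read.csv is really slow for me).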






library(data.table) #1.10.0 

DT <- fread("1100110011001100 
     1100110011001100") 
#Warning message: 
#In fread("1100110011001100\n  1100110011001100") : 
# Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again. 

print(DT) 
#    V1 
#1: 5.435266e-309 
#2: 5.435266e-309 
#Warning message: 
#In print.data.table(DT) : 
# Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again. 

library(bit64) # provides the integer64 print method 
print(DT) 
#     V1 
#1: 1100110011001100 
#2: 1100110011001100 



DT <- fread("1100110011001100 
     1100110011001100", colClasses = "character") 
#     V1 
#1: 1100110011001100 
#2: 1100110011001100 


DT <- fread("1100110011001100 
     1100110011001100", colClasses = "character", verbose = TRUE) 
#Input contains a \n (or is ""). Taking this to be text input (not a filename) 
#Detected eol as \n only (no \r afterwards), the UNIX and Mac standard. 
#Positioned on line 1 after skip or autostart 
#This line is the autostart and not blank so searching up for the last non-blank ... line 1 
#Detecting sep ... Deducing this is a single column input. 
#Starting data input on line 1 (either column names or first row of data). First 10 characters: 1100110011 
#Some fields on line 1 are not type character (or are empty). Treating as a data row and using default column names. 
#Count of eol: 2 (including 0 at the end) 
#ncol==1 so sep count ignored 
#Type codes (point 0): 2 
#Column 1 ('V1') was detected as type 'integer64' but bumped to 'character' as requested by colClasses 
#Type codes: 4 (after applying colClasses and integer64) 
#Type codes: 4 (after applying drop or select (if supplied) 
#Allocating 1 column slots (1 - 0 dropped) 
#Read 2 rows. Exactly what was estimated and allocated up front 
#    0.000s ( 0%) Memory map (rerun may be quicker) 
#    0.000s ( 0%) sep and header detection 
#    0.000s ( 0%) Count rows (wc -l) 
#    0.000s ( 0%) Column type detection (100 rows at 10 points) 
#    0.000s ( 0%) Allocation of 2x1 result (xMB) in RAM 
#    0.000s ( 0%) Reading data 
#    0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered 
#    0.000s ( 0%) Coercing data already read in type bumps (if any) 
#    0.000s ( 0%) Changing na.strings to NA 
#    0.001s  Total 
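
The examples above either load bit64 or force the column to character with colClasses. Another option that may be worth trying is fread's `integer64` argument, which (as I understand it from ?fread) controls how columns detected as integer64 are returned. The sketch below assumes a data.table version that supports this argument; the variable names `txt`, `DT_chr` and `DT_dbl` are made up for illustration.

```r
library(data.table)

# Same two-line input as above; a string containing "\n" is read as text, not a filename
txt <- "1100110011001100\n1100110011001100"

# Return columns detected as integer64 as character strings instead
DT_chr <- fread(txt, integer64 = "character")

# Or return them as doubles; these particular values are below 2^53,
# so a double can hold them without losing precision
DT_dbl <- fread(txt, integer64 = "double")

class(DT_chr$V1)
class(DT_dbl$V1)
```

Reading as character is probably the safer choice when the values are identifiers rather than quantities, because larger values would lose precision as doubles.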
