2016-10-14

Answer


As described in the README, movies.dat is just a text file delimited by "::". So you can do the following:

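# Read the raw lines; movies.dat is Latin-1 encoded
txt <- readLines("./ml-1m/movies.dat", encoding = "latin1")
# Split each line on the two-character "::" delimiter and turn it into a one-row data frame
txt_split <- lapply(strsplit(txt, "::", fixed = TRUE), function(x) as.data.frame(t(x), stringsAsFactors = FALSE))
# Stack the one-row data frames and name the columns as described in the README
dat <- do.call(rbind, txt_split)
names(dat) <- c("MovieID", "Title", "Genres")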


This gives you:

dim(dat) # 3883 3 
head(dat) 

  MovieID                              Title                       Genres
1       1                   Toy Story (1995)  Animation|Children's|Comedy
2       2                     Jumanji (1995) Adventure|Children's|Fantasy
3       3            Grumpier Old Men (1995)               Comedy|Romance
4       4           Waiting to Exhale (1995)                 Comedy|Drama
5       5 Father of the Bride Part II (1995)                       Comedy
6       6                        Heat (1995)        Action|Crime|Thriller

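PS: I couldn't figure out how to use the "standard" R reading functions such as read.table, readr::read_delim or data.table::fread, because they require a single-byte separator. You could try replacing "::" with something else before reading the data, e.g. with sed on the command line.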
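The same "swap the delimiter first" idea can also be done entirely inside R, without sed. This is a minimal sketch, not a tested recipe for the full file: the three inline sample lines are an assumption standing in for movies.dat so the snippet runs on its own. gsub() with fixed = TRUE replaces the two-character "::" with a tab, after which read.table() accepts a single-byte separator. quote = "" and comment.char = "" are set so that apostrophes (as in "Children's") or a "#" in a title are not treated as quotes or comments.

```r
# Sample lines in the movies.dat format (MovieID::Title::Genres).
# Assumption: for the real file, use instead
# txt <- readLines("./ml-1m/movies.dat", encoding = "latin1")
txt <- c(
  "1::Toy Story (1995)::Animation|Children's|Comedy",
  "2::Jumanji (1995)::Adventure|Children's|Fantasy",
  "3::Grumpier Old Men (1995)::Comedy|Romance"
)

# Replace the two-character "::" delimiter with a single tab character
txt_tab <- gsub("::", "\t", txt, fixed = TRUE)

# read.table now sees a one-byte separator; quote = "" and comment.char = ""
# stop apostrophes and '#' characters inside titles from being misparsed
dat <- read.table(text = txt_tab, sep = "\t", quote = "", comment.char = "",
                  col.names = c("MovieID", "Title", "Genres"),
                  stringsAsFactors = FALSE)

head(dat)
```

Unlike the strsplit() version above, read.table() also converts MovieID to an integer column automatically.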