2015-05-08 29 views
0

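Split columns at the delimiter [::] in MovieLens-1M in R

I am new to R programming and, unfortunately, I have to work with the MovieLens-1M data. How can I split the columns of movies.dat at the delimiter [::]? I also tried this code: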

> moviesDF<-read.delim("movies.dat", sep="|", header=F, stringsAsFactors=FALSE) 
> str(moviesDF) 
'data.frame': 3998 obs. of 3 variables: 
$ V1: chr "1::Toy Story (1995)::Animation" "2::Jumanji (1995)::Adventure" "3::Grumpier Old Men (1995)::Comedy" "4::Waiting to Exhale (1995)::Comedy" ... 
$ V2: chr "Children's" "Children's" "Romance" "Drama" ... 
$ V3: chr "Comedy" "Fantasy" "" "" ... 

The desired output is as follows:

V1: Movie ID 
V2: Title 
V3: Genre 

Also, my goal is to build a recommender system.

+0

Start here: `unlist(strsplit("1::Toy Story (1995)::Animation", "::"))`; see also the [stringr package](http://cran.r-project.org/web/packages/stringr/vignettes/stringr.html) – zx8754
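
To make the comment's hint concrete, here is a minimal base-R sketch applied to one complete raw line. The line is reconstructed from the `str()` output in the question (an assumption about what movies.dat actually contains), and `fixed = TRUE` is my addition so that `::` is matched literally rather than as a regular expression.

```r
# One raw line, rebuilt from the str() output in the question (assumed layout).
line <- "1::Toy Story (1995)::Animation|Children's|Comedy"

# fixed = TRUE treats "::" as a literal string, not a regular expression.
fields <- unlist(strsplit(line, "::", fixed = TRUE))

# fields holds three strings: the movie ID, the title, and the genre string.
print(fields)
```

The genres stay together in the third element because they are separated by `|`, not by `::`.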

Answers

1

You can try `cSplit` from my "splitstackshape" package. Usage is:

library(splitstackshape) 
cSplit(moviesDF, "V1", "::") 
#   V2  V3 V1_1      V1_2  V1_3 
# 1: Children's Comedy 1   Toy Story (1995) Animation 
# 2: Children's Fantasy 2   Jumanji (1995) Adventure 
# 3: Romance   3 Grumpier Old Men (1995) Comedy 
# 4:  Drama   4 Waiting to Exhale (1995) Comedy 
1

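The problem is in the import function. `read.delim(sep="|")` does not read the dataset correctly, because `|` only separates the different genre values that you want in V3, while the three fields themselves are separated by `::`. You should import your dataset with `readLines` instead: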

moviesDF <- readLines("movies.dat") 
moviesDF <- as.data.frame(do.call("rbind",strsplit(moviesDF,"::")),stringsAsFactors = FALSE) 
names(moviesDF) <- c("V1","V2","V3")
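
A self-contained variant of the same idea, as a sketch rather than a definitive implementation: the `raw` vector stands in for `readLines("movies.dat")` and is rebuilt from the `str()` output in the question, and the column names `MovieID`, `Title`, `Genres` and the `genre_list` object are illustrative choices of mine. It also splits the genre string on `|`, which may be handy for a recommender system.

```r
# Stand-in for readLines("movies.dat"): first four rows, rebuilt from the
# str() output in the question (assumed file layout).
raw <- c(
  "1::Toy Story (1995)::Animation|Children's|Comedy",
  "2::Jumanji (1995)::Adventure|Children's|Fantasy",
  "3::Grumpier Old Men (1995)::Comedy|Romance",
  "4::Waiting to Exhale (1995)::Comedy|Drama"
)

# Split every line on the literal "::" delimiter.
parts <- strsplit(raw, "::", fixed = TRUE)

# Guard: rbind would recycle short rows with only a warning.
stopifnot(all(lengths(parts) == 3))

movies <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(movies) <- c("MovieID", "Title", "Genres")  # illustrative names
movies$MovieID <- as.integer(movies$MovieID)

# One character vector of genres per movie; "|" must be matched literally.
genre_list <- strsplit(movies$Genres, "|", fixed = TRUE)
names(genre_list) <- movies$Title

str(movies)
```

Keeping the genres as a list avoids guessing a maximum number of genre columns up front.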