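Running a MapReduce job on the Cloudera demo CDH3u4 (airline data example)
I'm working through Jeffrey Breen's R-Hadoop tutorial (October 2012). Right now I'm trying to populate HDFS and then run the commands Jeffrey posted in the tutorial in RStudio. Unfortunately, I'm running into some trouble with it.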
Update: I have now moved the data folder to /home/cloudera/data/hadoop/wordcount
(and did the same for the airline data). Now when I run populate.hdfs.sh, I get the following output:
$ /home/cloudera/TutorialBreen/bin/populate.hdfs.sh
mkdir: cannot create directory /user/cloudera: File exists
mkdir: cannot create directory /user/cloudera/wordcount: File exists
mkdir: cannot create directory /user/cloudera/wordcount/data: File exists
mkdir: cannot create directory /user/cloudera/airline: File exists
mkdir: cannot create directory /user/cloudera/airline/data: File exists
put: Target /user/cloudera/airline/data/20040325.csv already exists
Then I ran the commands in RStudio as shown in the tutorial, but I get errors at the end. Can someone tell me what I'm doing wrong?
> if (LOCAL)
+ {
+ rmr.options.set(backend = 'local')
+ hdfs.data.root = 'data/local/airline'
+ hdfs.data = file.path(hdfs.data.root, '20040325-jfk-lax.csv')
+ hdfs.out.root = 'out/airline'
+ hdfs.out = file.path(hdfs.out.root, 'out')
+ if (!file.exists(hdfs.out))
+ dir.create(hdfs.out.root, recursive=T)
+ } else {
+ rmr.options.set(backend = 'hadoop')
+ hdfs.data.root = 'airline'
+ hdfs.data = file.path(hdfs.data.root, 'data')
+ hdfs.out.root = hdfs.data.root
+ hdfs.out = file.path(hdfs.out.root, 'out')
+ }
> asa.csvtextinputformat = make.input.format(format = function(con, nrecs) {
+ line = readLines(con, nrecs)
+ values = unlist(strsplit(line, "\\,"))
+ if (!is.null(values)) {
+ names(values) = c('Year','Month','DayofMonth','DayOfWeek','DepTime','CRSDepTime',
+ 'ArrTime','CRSArrTime','UniqueCarrier','FlightNum','TailNum',
+ 'ActualElapsedTime','CRSElapsedTime','AirTime','ArrDelay',
+ 'DepDelay','Origin','Dest','Distance','TaxiIn','TaxiOut',
+ 'Cancelled','CancellationCode','Diverted','CarrierDelay',
+ 'WeatherDelay','NASDelay','SecurityDelay','LateAircraftDelay')
+ return(keyval(NULL, values))
+ }
+ }, mode='text')
> mapper.year.market.enroute_time = function(key, val) {
+ if (!identical(as.character(val['Year']), 'Year')
+ & identical(as.numeric(val['Cancelled']), 0)
+ & identical(as.numeric(val['Diverted']), 0)) {
+ if (val['Origin'] < val['Dest'])
+ market = paste(val['Origin'], val['Dest'], sep='-')
+ else
+ market = paste(val['Dest'], val['Origin'], sep='-')
+ output.key = c(val['Year'], market)
+ output.val = c(val['CRSElapsedTime'], val['ActualElapsedTime'], val['AirTime'])
+ return(keyval(output.key, output.val))
+ }
+ }
> reducer.year.market.enroute_time = function(key, val.list) {
+ if (require(plyr))
+ val.df = ldply(val.list, as.numeric)
+ else { # this is as close as my deficient *apply skills can come w/o plyr
+ val.list = lapply(val.list, as.numeric)
+ val.df = data.frame(do.call(rbind, val.list))
+ }
+ colnames(val.df) = c('crs', 'actual','air')
+ output.key = key
+ output.val = c(nrow(val.df), mean(val.df$crs, na.rm=T),
+ mean(val.df$actual, na.rm=T),
+ mean(val.df$air, na.rm=T))
+ return(keyval(output.key, output.val))
+ }
> mr.year.market.enroute_time = function (input, output) {
+ mapreduce(input = input,
+ output = output,
+ input.format = asa.csvtextinputformat,
+ output.format='csv', # note to self: 'csv' for data, 'text' for bug
+ map = mapper.year.market.enroute_time,
+ reduce = reducer.year.market.enroute_time,
+ backend.parameters = list(
+ hadoop = list(D = "mapred.reduce.tasks=2")
+ ),
+ verbose=T)
+ }
> out = mr.year.market.enroute_time(hdfs.data, hdfs.out)
Error in file(f, if (format$mode == "text") "r" else "rb") :
cannot open the connection
In addition: Warning message:
In file(f, if (format$mode == "text") "r" else "rb") :
cannot open file 'data/local/airline/20040325-jfk-lax.csv': No such file or directory
> if (LOCAL)
+ {
+ results.df = as.data.frame(from.dfs(out, structured=T))
+ colnames(results.df) = c('year', 'market', 'flights', 'scheduled', 'actual', 'in.air')
+ print(head(results.df))
+ }
Error in to.dfs.path(input) : object 'out' not found
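The warning shows that LOCAL was TRUE: the path it cannot open, 'data/local/airline/20040325-jfk-lax.csv', is exactly the hdfs.data value set in the local branch. With the local backend the input is opened with file(), so that relative path is resolved against R's working directory (getwd()), not against HDFS. Because the mapreduce() call failed, out was never assigned, which is why the second command reports "object 'out' not found". A minimal pre-flight check could look like the sketch below; check.local.input is a hypothetical helper, not part of rmr or the tutorial.

```r
# Hypothetical helper (not part of rmr or the tutorial): stop early with a
# readable message when a local-mode input file cannot be found.
check.local.input <- function(path) {
  if (!file.exists(path)) {
    stop(sprintf("input file '%s' not found; relative paths are resolved against getwd() = '%s'",
                 path, getwd()),
         call. = FALSE)
  }
  # Return the absolute path invisibly so the caller can log it if wanted
  invisible(normalizePath(path))
}

# Usage sketch: run it before mapreduce() when using the local backend
# if (LOCAL) check.local.input(hdfs.data)
```

Failing before mapreduce() keeps the error message focused on the missing file instead of surfacing later as an unassigned `out`.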
Thanks a lot!
Please post the complete command output. – octo
Also, please check which data currently exists on the cluster, either through the web interface or with hadoop dfs -ls. – rretzbach
@rretzbach This is what is currently present:
drwxr-xr-x   - cloudera supergroup          0 2012-10-28 05:22 /user/cloudera/airline
drwxr-xr-x   - cloudera supergroup          0 2012-10-27 12:33 /user/cloudera/asa-airline
drwxr-xr-x   - cloudera supergroup          0 2012-11-03 04:23 /user/cloudera/wordcount
– SWR
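The listing agrees with what populate.hdfs.sh reported: /user/cloudera/airline already exists, and the script's put step targets /user/cloudera/airline/data. In the non-local branch, hdfs.data is the relative path 'airline/data', which HDFS resolves against the user's HDFS home directory (/user/cloudera for the cloudera user). In the local branch the path is resolved against R's working directory instead. The sketch below spells out that resolution; resolve.input and its hdfs.home and wd defaults are illustrative assumptions, not tutorial code.

```r
# Illustrative sketch (hypothetical helper): where each branch of the
# tutorial's LOCAL switch expects its input to live.
resolve.input <- function(local, hdfs.home = "/user/cloudera", wd = getwd()) {
  if (local) {
    # Local backend: relative filesystem path, resolved against R's working directory
    file.path(wd, "data/local/airline", "20040325-jfk-lax.csv")
  } else {
    # Hadoop backend: relative HDFS path, resolved against the HDFS home directory
    file.path(hdfs.home, "airline", "data")
  }
}

resolve.input(FALSE)                      # "/user/cloudera/airline/data"
resolve.input(TRUE, wd = "/home/cloudera") # "/home/cloudera/data/local/airline/20040325-jfk-lax.csv"
```

Comparing these two paths with the hadoop dfs -ls output and with getwd() shows which branch can actually find its input.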