2015-05-11 53 views

I am trying to take a CSV file on a Cloudera cluster and convert it to XDF using Revolution R's rxImport function, but I am getting an error.

I have tried the following to convert the csv file to XDF:

InputFile <- file.path("/user/...") 

#create column classes for the data set 
columnClasses <- c("character", "character", "character", 
        "character", "character", "character", 
        "character", "character", "character", 
        "character", "character", "numeric", "character" 
        ) 

names(columnClasses) <- paste("V", seq(1:13), sep = "") 

##convert input csv file to rxTextData object 
textData <- RxTextData(file = InputFile, 
         fileSystem = hdfsFS, 
         colClasses = columnClasses 
         ) 

##set chunk size 
chunk.size <- 250000 

##create output file location 
newXdf <- RxXdfData("/user/...", fileSystem = hdfsFS) 

rxImport(inData = InputFile, 
    outData = newXdf, 
    rowsPerRead = chunk.size, 
    overwrite = TRUE, 
    numRows = -1) 

When I run this I get the following error:

Error in rxuHandleClusterJobTryFailure(retObject, hpcServerJob, autoCleanup, : 
    Error completing job on cluster: 
Error in rxExecInDataHadoop(callInfo, matchCall) : 
    Data source does not have an hdfs file system type. 

Note: I have inspected textData using functions such as rxGetInfo and it looks fine.

Any insight into why I am getting this error?
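One plausible cause, based on the error text: in the rxImport call above, inData is given the raw path string (InputFile) rather than the textData object, so the fileSystem = hdfsFS setting attached to the RxTextData source never reaches rxImport, and the import falls back to a non-HDFS file system type. A minimal sketch of the intended wiring (assuming a Hadoop compute context is already set; the creation of hdfsFS via RxHdfsFileSystem() is an assumption, since it is not shown in the question):

```r
## create the HDFS file system object once and attach it to
## BOTH the input and the output data sources
hdfsFS <- RxHdfsFileSystem()

textData <- RxTextData(file = InputFile,
                       fileSystem = hdfsFS,
                       colClasses = columnClasses)

newXdf <- RxXdfData("/user/...", fileSystem = hdfsFS)

## pass the RxTextData object, not the bare path string, so that
## rxImport knows the source lives on HDFS
rxImport(inData = textData,
         outData = newXdf,
         rowsPerRead = chunk.size,
         overwrite = TRUE)
```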

Answer


Update: there was a typo in the rxImport call in the question.

I actually ran:

rxImport(inData = textData, 
    outData = newXdf, 
    rowsPerRead = chunk.size, 
    overwrite = TRUE, 
    numRows = -1) 

and got the following error:

Error in rxuHandleClusterJobTryFailure(retObject, hpcServerJob, autoCleanup, : 
    Error completing job on cluster: 
Error in rxCall("Rx_ImportDataSource", params) :