2015-12-09

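I am currently facing the error below, a "no method or default" error for coercing NULL to a data frame. The dataset does contain null values, but I have tried both the is.na() and is.null() functions to replace the null values with something else. The data is stored in HDFS in pig.hive format. I have attached the code below. If I remove v[,25] from the key, the code works fine. The error is: Error in as(x, class(k)): no method or default for coercing "NULL" to "data.frame"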

Code:

library(rmr2)

AM <- c("AN")  # type codes summed into AT
UK <- c("PP")  # type codes summed into UnknownT

sample.map <- function(k, v) {
  # key: account id plus year and month taken from the date column
  key <- data.frame(acc   = v[!which(is.na(v[,1])),1],
                    year  = substr(v[!which(is.na(v[,1])),2],1,4),
                    month = substr(v[!which(is.na(v[,1])),2],5,6))
  # value: type code with a count of 1 per record
  value <- data.frame(v[,3], count = 1)
  keyval(key, value)
}

sample.reduce <- function(key, v) {
  AT <- sum(v[which(v[,1] %in% AM == "TRUE"), 2])
  UnknownT <- sum(v[which(v[,1] %in% UK == "TRUE"), 2])
  Total <- AT + UnknownT
  d <- data.frame(AT, UnknownT, Total)
  keyval(key, d)
}

out <- mapreduce(input = "/user/hduser/input",
                 output = "/user/hduser/output",
                 input.format = make.input.format("pig.hive", sep = "\u0001"),
                 output.format = make.output.format("csv", sep = ","),
                 map = sample.map,
                 reduce = sample.reduce)

Error:

Warning in asMethod(object) : NAs introduced by coercion
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(x, x, FALSE, keep.rownames = FALSE) : number of items to replace is not a multiple of replacement length
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(v, ind, lossy = lossy, keep.rownames = TRUE) : number of items to replace is not a multiple of replacement length
Error in as(x, class(k)) :
  no method or default for coercing “NULL” to “data.frame”
Calls: <Anonymous> ... apply.reduce -> c.keyval -> reduce.keyval -> lapply -> FUN -> as
No traceback available
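Two details in sample.map can be checked locally in plain R, without rmr2 or Hadoop. First, !which(is.na(...)) negates integer row positions rather than selecting the non-NA rows, so it selects no rows at all. Second, key is built from "filtered" rows while value is built from every row, so the two data frames can end up with different row counts. That mismatch may be related to the "not a multiple" warnings above, but this is an assumption, not a confirmed diagnosis. The data frame below is a hypothetical stand-in modeled on the sample data, assuming NULL account ids arrive as NA. The column names acc, date and type are made up. The month offsets are adjusted for the dash-separated sample dates; the original 5,6 would suit a 20140314 layout.

```r
# Hypothetical stand-in for one map chunk, modeled on the sample data;
# assumes the NULL account ids have already become NA.
v <- data.frame(
  acc  = c(NA, 345689202, 234539390, 123125444, NA, 901828393),
  date = rep("2014-03-14", 6),
  type = c("PP", "AN", "PP", "AN", "AN", "AN"),
  stringsAsFactors = FALSE
)

# Pattern used in sample.map: which() returns positions (here 1 and 5),
# and ! turns any non-zero number into FALSE, so no row is selected.
old_rows <- !which(is.na(v[, 1]))
print(old_rows)                # FALSE FALSE
print(length(v[old_rows, 1]))  # 0: the key columns would be empty

# Logical mask with one entry per row: TRUE where the account id exists.
keep <- !is.na(v[, 1])

# Build key and value from the same rows so their row counts agree.
key <- data.frame(
  acc   = v[keep, 1],
  year  = substr(v[keep, 2], 1, 4),
  month = substr(v[keep, 2], 6, 7),  # "2014-03-14": month is characters 6-7
  stringsAsFactors = FALSE
)
value <- data.frame(type = v[keep, 3], count = 1, stringsAsFactors = FALSE)

print(nrow(key))    # 4
print(nrow(value))  # 4
```

Using the same logical mask for both key and value keeps each key row paired with its value row.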

UPDATE: I have added sample data and edited the code above. Hope this helps!

Sample data:

NULL,"2014-03-14","PP" 
345689202,"2014-03-14","AN" 
234539390,"2014-03-14","PP" 
123125444,"2014-03-14","AN" 
NULL,"2014-03-14","AN" 
901828393,"2014-03-14","AN" 
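A base-R sketch of the counting done in sample.reduce, applied to the type codes of the four sample rows that have an account id. %in% already returns a logical vector and binds more tightly than ==, so comparing its result to the string "TRUE" only converts it back to the same logical values; indexing with the logical vector directly gives the same sums. count_types is a hypothetical helper name, and the input assumes the value layout emitted by sample.map (type code in column 1, count in column 2).

```r
AM <- c("AN")
UK <- c("PP")

# Hypothetical helper: column 1 holds the type code, column 2 the count.
count_types <- function(v) {
  AT <- sum(v[v[, 1] %in% AM, 2])
  UnknownT <- sum(v[v[, 1] %in% UK, 2])
  data.frame(AT = AT, UnknownT = UnknownT, Total = AT + UnknownT)
}

# Type codes of the sample rows whose account id is not NULL.
v <- data.frame(type = c("AN", "PP", "AN", "AN"), count = 1,
                stringsAsFactors = FALSE)

# The comparison with "TRUE" in sample.reduce selects the same rows.
x <- v[, 1] %in% AM
print(identical(x == "TRUE", x))  # TRUE

res <- count_types(v)
print(res)
```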

This is not reproducible. Please make it reproducible. –


Hi Roman, does this help? I should also mention that the data is stored on HDFS and this snapshot is anonymized, but the real data looks like this. –

Answer


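Some issues with as have been identified recently. I'm not sure why as can't handle this case by default, but you can modify coerce so that it handles the conversion with an S4 method that calls as.data.frame: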

setMethod("coerce",c("NULL","data.frame"), function(from, to, strict=TRUE) as.data.frame(from)) 
[1] "coerce" 
as(NULL,"data.frame") 
data frame with 0 columns and 0 rows 

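Where should I run this code? Right now my Hadoop environment has 3 worker nodes with R and the rmr2 package installed. Should I run it on all of these nodes? And do I need to run this method every time I run the script? Sorry for asking so many questions. –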


Yes, it needs to be run by every worker that uses the method. It is best to put it in a .profile file so that it runs at startup. – James
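A sketch of what such a startup file might contain, assuming each worker's R process actually reads it when it starts. Whether that happens depends on how R is launched on the workers; for example, a user profile such as ~/.Rprofile is skipped when R runs with --vanilla. When profile files run, only the base package is loaded, so the calls below use the methods:: prefix. The method body is the same as in the answer above.

```r
# Possible startup-file contents (for example ~/.Rprofile on every worker).
# Only base is loaded while profile files run, hence the methods:: prefix.
methods::setMethod(
  "coerce", c("NULL", "data.frame"),
  function(from, to, strict = TRUE) as.data.frame(from)
)
```

In a session where this has run, methods::as(NULL, "data.frame") should return a data frame with 0 columns and 0 rows, matching the output shown in the answer.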


This worked! I added it to the .profile file and restarted my R session. Thanks, James, for the quick response :) –
