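Cluster R script not reading in RData datasets correctly

I want to run this job on our cluster, and I keep getting an "object of type 'closure' is not subsettable" error. The script basically runs the function do_1() on a bunch of nodes. The object I'm subsetting when the error occurs is called "data", so I think the RData file isn't being read in on each node. (Calling every one of these individual datasets "data" is probably not best practice, so that part is my mistake.)

I stripped the script down to the bare bones and it is shown below. It still produces the same error when I submit the job. I think there is something I don't understand about reading a separate dataset on each node... maybe I'm not specifying some argument when I call load(). Maybe the "data" dataset isn't in the right namespace or something... I'm not sure. Any ideas would be appreciated.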
library(parallel)
library(Rmpi)
np <- mpi.universe.size()
cl <- makeCluster(np, type = "MPI")
allFiles <- list.files("/bigtmp/trb5me/rdata_files/")
allFiles <- sapply(allFiles, function(string) paste("/bigtmp/trb5me/rdata_files/", string, sep = ""))
run_one_day <- function(daynum){
  # do we want to subset days to not the first hour?
  train <- data[[daynum]] * 10000
  train
}
clusterExport(cl = cl, "run_one_day")
do_1 <- function(path_to_file){
  if(!require(xts)){
    install.packages("xts")
    library(xts)
  }
  # load data
  load(file=path_to_file)
  # extract the symbol name so we can save the results later
  symbolName <- strsplit(path_to_file, "/")[[1]][5]
  symbolName <- strsplit(symbolName, ".", fixed = TRUE)[[1]][1]
  # get the results
  # there is also a function called data...so in that case its length will be 1
  mySequence <- 1:(length(data)-1)
  myResults <- lapply(mySequence, run_one_day) # this is where the problem is!
  # save the results
  path_dest <- paste("/bigtmp/trb5me/mod1_results/", symbolName, ".RData", sep = "")
  save(myResults, file = path_dest)
  # remove everything from memory
  rm(list=ls())
}
parLapply(cl, allFiles, do_1)
# turn off all the cluster stuff
stopCluster(cl)
mpi.exit()
Which function is the error coming from? Try including options(error = traceback).
Wait, I see it now; never mind.
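
The thread does not say what the fix was. One likely explanation, offered here as an assumption rather than as the poster's confirmed solution, is scoping: load() inside do_1() creates "data" only in do_1()'s own frame, while run_one_day() looks "data" up from the environment where it was defined (the global environment on each worker) and finds the built-in function utils::data instead. The sketch below reproduces this locally without a cluster. The names run_one_day_global, run_one_day_arg, broken and do_1_sketch, and the temporary file, are hypothetical and exist only for illustration.

```r
# Hypothetical stand-in for one of the .RData files: a list saved under the name `data`.
tmp <- tempfile(fileext = ".RData")
local({
  data <- list(1:3, 4:6, 7:9)
  save(data, file = tmp)
})

# Same shape as run_one_day() in the question: `data` is resolved from the
# environment where the function was defined, not from the caller's frame.
run_one_day_global <- function(daynum) data[[daynum]] * 10000

# Variant that receives the loaded object explicitly as an argument.
run_one_day_arg <- function(daynum, day_data) day_data[[daynum]] * 10000

# Mirrors the question: load() puts `data` only into this function's frame,
# so run_one_day_global() still sees the function utils::data and fails.
broken <- function(path_to_file) {
  load(path_to_file)
  tryCatch(run_one_day_global(1), error = function(e) conditionMessage(e))
}

# Possible fix: load into a dedicated environment and pass the object along.
do_1_sketch <- function(path_to_file) {
  loaded <- new.env()
  load(path_to_file, envir = loaded)
  day_data <- get("data", envir = loaded)
  lapply(seq_len(length(day_data) - 1), run_one_day_arg, day_data = day_data)
}

broken_msg <- broken(tmp)   # an error message instead of a result
print(broken_msg)
res <- do_1_sketch(tmp)     # one element per processed day
```

On the cluster the same idea would apply inside do_1(): pass the loaded object to run_one_day() as an argument (the function still needs to be exported with clusterExport(), or defined inside do_1()). Giving the saved object a name other than "data" would also stop a missing dataset from being silently masked by the built-in function.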