2017-02-10

Using the parallelMap package with a custom filter in mlr

I am using mlr for a text classification task and hit the following error:

Exporting objects to slaves for mode socket: .mlr.slave.options 
Mapping in parallel: mode = socket; cpus = 4; elements = 2. 
Error in stopWithJobErrorMessages(inds, vcapply(result.list[inds], as.character)) : 
    Errors occurred in 2 slave jobs, displaying at most 10 of them: 

00001: Error in parallel:::.slaveRSOCK() : 
    Assertion on 'method' failed: Must be element of set {'anova.test','carscore','cforest.importance','chi.squared','gain.ratio','information.gain','kruskal.test','linear.correlation','mrmr','oneR','permutation.importance','randomForest.importance','randomForestSRC.rfsrc','randomForestSRC.var.select','rank.correlation','relief','rf.importance','rf.min.depth','symmetrical.uncertainty','univariate','univariate.model.score','variance'}. 

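I wrote a custom filter as described here: Create Custom Filters. The filter works as expected, but the error above appears as soon as I try to utilise parallelization. I assume from the error that my custom filter needs to be an element of the listed set to stand any chance of working in parallel, but I have not managed to work out (a) whether this is possible and (b) if it is, how to go about doing it.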

Thanks in advance for your help, Azam

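Addition: test script. I cannot show you the actual script and data I am working with because they are sensitive, but this example reproduces the error I see. Apart from the custom feature selection and the data set, the steps that set up the learner and evaluate it are the same as in my 'real' script. As in my real case, the script runs as expected if I remove the parallelStartSocket() command.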

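I should also add that I have successfully used parallel processing (or at least received no errors) when tuning the hyperparameters of an SVM with an RBF kernel: that script is identical apart from the makeParamSet() definition.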

library(parallelMap) 
library(mlr) 
library(kernlab) 

makeFilter(
    name = "nonsense.filter", 
    desc = "Calculates scores according to alphabetical order of features", 
    pkg = "mlr", 
    supported.tasks = c("classif", "regr", "surv"), 
    supported.features = c("numerics", "factors", "ordered"), 
    fun = function(task, nselect, decreasing = TRUE, ...) {
        feats = getTaskFeatureNames(task)
        imp = order(feats, decreasing = decreasing)
        names(imp) = feats
        imp
    }
) 

# set up SVM with RBF kernel
svm.lrn <- makeLearner("classif.ksvm",predict.type = "response") 

# wrap learner with filter 
svm.lrn <- makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter") 

# define feature selection parameters
ps.svm = makeParamSet(
    makeDiscreteParam("fw.abs", values = seq(2, 3, 1))
)

# define inner search and evaluation strategy 
ctrl.svm = makeTuneControlGrid() 
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE) 

svm.lrn <- makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm, 
          control = ctrl.svm) 

# set up outer resampling 
outer.svm <- makeResampleDesc("CV", iters = 10, stratify = TRUE) 

# run it... 

parallelStartSocket(2) 

run.svm <- resample(svm.lrn, iris.task, 
        resampling = outer.svm, extract = getTuneResult) 

parallelStop() 

Can you provide a complete example that allows the problem to be reproduced?

@LarsKotthoff, example script added to the original post. Thanks, Azam

Answer

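The problem is that makeFilter registers S3 methods, and these are not available in separate R processes. There are two options to make this work: either simply use parallelStartMulticore(2), so that everything runs in the same R process, or tell parallelMap about the pieces that need to be present in the other R processes.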
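To illustrate why something defined in the main session is missing on socket workers, here is a minimal sketch that uses base R's parallel package instead of parallelMap. The function my.filter is a hypothetical stand-in for a custom filter; it is not how mlr stores filters internally, it only shows that each socket worker starts as a fresh R process.

```r
library(parallel)

# Hypothetical stand-in for a custom filter that exists only in the master session.
my.filter <- function(feats) order(feats)

# A PSOCK cluster starts two fresh R processes.
cl <- makeCluster(2)

# The workers have never seen my.filter.
seen.before <- unlist(clusterEvalQ(cl, exists("my.filter")))

# Ship the object to the workers explicitly, then check again.
clusterExport(cl, "my.filter", envir = environment())
seen.after <- unlist(clusterEvalQ(cl, exists("my.filter")))

# The workers can now call the function themselves.
worker.result <- clusterEvalQ(cl, my.filter(c("b", "a", "c")))

stopCluster(cl)

print(seen.before)  # FALSE FALSE
print(seen.after)   # TRUE TRUE
```

With forked workers (the multicore case), the child processes inherit the parent's state, so this explicit step is not needed there.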

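The second option has two parts. First, use parallelLibrary("mlr") to load mlr everywhere. Second, move the definition of the filter into a separate file that can be loaded with parallelSource(). For example: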

filter.R:

makeFilter(
    name = "nonsense.filter", 
    desc = "Calculates scores according to alphabetical order of features", 
    pkg = "mlr", 
    supported.tasks = c("classif", "regr", "surv"), 
    supported.features = c("numerics", "factors", "ordered"), 
    fun = function(task, nselect, decreasing = TRUE, ...) {
        feats = getTaskFeatureNames(task)
        imp = order(feats, decreasing = decreasing)
        names(imp) = feats
        imp
    }
) 

main.R:

library(parallelMap) 
library(mlr) 
library(kernlab) 

parallelStartSocket(2) 

parallelLibrary("mlr") 
parallelSource("filter.R") 

# set up SVM with RBF kernel
svm.lrn = makeLearner("classif.ksvm",predict.type = "response") 

# wrap learner with filter 
svm.lrn = makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter") 

# define feature selection parameters
ps.svm = makeParamSet(
    makeDiscreteParam("fw.abs", values = seq(2, 3, 1))
)

# define inner search and evaluation strategy 
ctrl.svm = makeTuneControlGrid() 
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE) 

svm.lrn = makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm, 
          control = ctrl.svm) 

# set up outer resampling 
outer.svm = makeResampleDesc("CV", iters = 10, stratify = TRUE) 

# run it... 
run.svm = resample(svm.lrn, iris.task, resampling = outer.svm, extract = getTuneResult) 

parallelStop() 
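A smaller, mlr-free sketch of the same parallelSource() pattern can help confirm that the workers really see definitions from a sourced file before running the full resampling. The temporary helper file and the function alphabetical.rank are hypothetical names used only for illustration, and the sketch assumes the parallelMap package is installed.

```r
library(parallelMap)

# Hypothetical helper file standing in for filter.R.
helper.file <- tempfile(fileext = ".R")
writeLines("alphabetical.rank <- function(feats) order(feats)", helper.file)

parallelStartSocket(2)

# Source the helper file so that the definition is available on the workers.
parallelSource(helper.file)

# Each element is processed on a worker, which looks up alphabetical.rank
# in its own session.
res <- parallelMap(
    function(feats) alphabetical.rank(feats),
    list(c("b", "a", "c"), c("z", "y"))
)

parallelStop()

print(res)
```

If the parallelSource() call is left out, the workers would not be expected to find alphabetical.rank, which is the same kind of failure as in the question.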

Many thanks. The second approach you describe works for me. I am running this under Windows, which I believe does not support the parallelStartMulticore() variant. Best wishes.