emrlapply doesn't work on a simple task
I am trying to replicate the simple example of using segue from https://jeffreybreen.wordpress.com/2011/01/10/segue-r-to-amazon-elastic-mapreduce-hadoop/
Cluster creation is successful:
> cl <- createCluster(numInstances=2)
STARTING - 2012-05-27 14:02:08
STARTING - 2012-05-27 14:02:39
STARTING - 2012-05-27 14:03:10
STARTING - 2012-05-27 14:03:42
STARTING - 2012-05-27 14:04:13
STARTING - 2012-05-27 14:04:44
STARTING - 2012-05-27 14:05:15
STARTING - 2012-05-27 14:05:46
STARTING - 2012-05-27 14:06:17
BOOTSTRAPPING - 2012-05-27 14:06:48
BOOTSTRAPPING - 2012-05-27 14:07:19
BOOTSTRAPPING - 2012-05-27 14:07:50
BOOTSTRAPPING - 2012-05-27 14:08:21
BOOTSTRAPPING - 2012-05-27 14:08:52
BOOTSTRAPPING - 2012-05-27 14:09:23
BOOTSTRAPPING - 2012-05-27 14:09:55
WAITING - 2012-05-27 14:10:26
Your Amazon EMR Hadoop Cluster is ready for action.
Remember to terminate your cluster with stopCluster().
Amazon is billing you!
The local simulation works fine, but running it on EMR returns an error every time.
> myList <- NULL
> set.seed(1)
> for (i in 1:10){
+ a <- c(rnorm(999), NA)
+ myList[[i]] <- a
+ }
> outputLocal <- lapply(myList, mean, na.rm=T)
> outputEmr <- emrlapply(cl, myList, mean, na.rm=T)
RUNNING - 2012-05-27 14:11:58
RUNNING - 2012-05-27 14:12:29
RUNNING - 2012-05-27 14:13:00
WAITING - 2012-05-27 14:13:31
Error in lines[[i]] : subscript out of bounds
> stopCluster(cl)
I like the idea of this package and hope it will be useful for my work, but I cannot figure out how to solve this basic issue.
segue version: 0.02
OS: Ubuntu 11.10
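For anyone trying to reproduce this, the local baseline can be checked without a cluster. This is a minimal sketch using only base R; `sameResults` is a hypothetical helper for illustration, not part of segue, and the comparison against the EMR output is left commented out because it needs a running cluster.

```r
# Build the same input as in the transcript, using lapply() instead of
# growing a list from NULL inside a for loop.
set.seed(1)
myList <- lapply(1:10, function(i) c(rnorm(999), NA))

# Local baseline: the result the EMR run is expected to match.
outputLocal <- lapply(myList, mean, na.rm = TRUE)

# Hypothetical helper (not part of segue): TRUE when two result lists agree
# up to numerical tolerance.
sameResults <- function(a, b) isTRUE(all.equal(a, b))

# Sanity checks that need no cluster.
stopifnot(length(outputLocal) == 10)
stopifnot(!anyNA(unlist(outputLocal)))
stopifnot(sameResults(outputLocal, outputLocal))

# With a working cluster, the comparison would be:
# outputEmr <- emrlapply(cl, myList, mean, na.rm = TRUE)
# sameResults(outputEmr, outputLocal)
```

If the local checks pass, the input list itself is well formed, which suggests the failure lies in the EMR round trip rather than in the data.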
UPDATE: I tried running another test case, the pi estimation example, and emrlapply returned the same error message.
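UPDATE2: I updated to version 0.03, and now I cannot even connect to the cluster. After the instances start successfully, the cluster tries to shut down, with no effect. I terminated the instances through the AWS console. So the old issue is gone, but a new one has appeared.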
> cl <- createCluster(numInstances=2)
STARTING - 2012-06-01 22:36:10
STARTING - 2012-06-01 22:36:41
STARTING - 2012-06-01 22:37:12
STARTING - 2012-06-01 22:37:43
STARTING - 2012-06-01 22:38:14
STARTING - 2012-06-01 22:38:46
SHUTTING_DOWN - 2012-06-01 22:39:17
SHUTTING_DOWN - 2012-06-01 22:39:48
...
SHUTTING_DOWN - 2012-06-01 22:48:05
SHUTTING_DOWN - 2012-06-01 22:48:36
FAILED - 2012-06-01 22:49:07
>
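Paging @jdlong to the scene – Chase
Well, the "good" news is that I can reproduce your error. I am debugging now to see if I can figure out what is going on. –
@JDLong, thank you for the response! It is strange that nobody else found this issue before. – DrDom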