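I'm running Gobblin on a 3-node EMR cluster to move data from Kafka to S3. The cluster runs Hadoop 2.6.0, and I built Gobblin against 2.6.0 as well. The Gobblin MapReduce job completes successfully on EMR, but nothing shows up in S3.
The MapReduce job appears to run successfully. In HDFS I can see the metrics and working directories: the metrics directory contains some files, but the working directory is empty. The S3 bucket should contain the final output, but it has no data. At the end, the log says: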
Output task state path /gooblinOutput/working/GobblinKafkaQuickStart_mapR3/output/job_GobblinKafkaQuickStart_mapR3_1460132596498 does not exist
Deleted working directory /gooblinOutput/working/GobblinKafkaQuickStart_mapR3
Here is the end of the log:
2016-04-08 16:23:26 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1366 - Job job_1460065322409_0002 running in uber mode : false
2016-04-08 16:23:26 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 0% reduce 0%
2016-04-08 16:23:32 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 10% reduce 0%
2016-04-08 16:23:33 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 40% reduce 0%
2016-04-08 16:23:34 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 60% reduce 0%
2016-04-08 16:23:36 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 80% reduce 0%
2016-04-08 16:23:37 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1373 - map 100% reduce 0%
2016-04-08 16:23:38 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1384 - Job job_1460065322409_0002 completed successfully
2016-04-08 16:23:38 UTC INFO [main] org.apache.hadoop.mapreduce.Job 1391 - Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=1276095
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=28184
        HDFS: Number of bytes written=41960
        HDFS: Number of read operations=60
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=11
    Job Counters
        Launched map tasks=10
        Other local map tasks=10
        Total time spent by all maps in occupied slots (ms)=1828125
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=40625
        Total vcore-seconds taken by all map tasks=40625
        Total megabyte-seconds taken by all map tasks=58500000
    Map-Reduce Framework
        Map input records=10
        Map output records=0
        Input split bytes=2150
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=296
        CPU time spent (ms)=10900
        Physical memory (bytes) snapshot=2715054080
        Virtual memory (bytes) snapshot=18852671488
        Total committed heap usage (bytes)=4729077760
    File Input Format Counters
        Bytes Read=6444
    File Output Format Counters
        Bytes Written=0
2016-04-08 16:23:38 UTC INFO [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 101 - Stopping the TaskStateCollectorService
2016-04-08 16:23:38 UTC WARN [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 123 - Output task state path /gooblinOutput/working/GobblinKafkaQuickStart_mapR3/output/job_GobblinKafkaQuickStart_mapR3_1460132596498 does not exist
2016-04-08 16:23:38 UTC INFO [main] gobblin.runtime.mapreduce.MRJobLauncher 443 - Deleted working directory /gooblinOutput/working/GobblinKafkaQuickStart_mapR3
2016-04-08 16:23:38 UTC INFO [main] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: …[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-04-08 16:23:38 UTC INFO [main] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: …[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-04-08 16:23:38 UTC INFO [main] gobblin.runtime.app.ServiceBasedAppLauncher 158 - Shutting down the application
2016-04-08 16:23:38 UTC INFO [MetricsReportingService STOPPING] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: j…5584dbb6
2016-04-08 16:23:38 UTC INFO [MetricsReportingService STOPPING] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: j…5584dbb6
2016-04-08 16:23:38 UTC WARN [Thread-7] gobblin.runtime.app.ServiceBasedAppLauncher 153 - ApplicationLauncher has already stopped
2016-04-08 16:23:38 UTC WARN [Thread-4] gobblin.metrics.reporter.ContextAwareReporter 116 - Reporter MetricReportReporter has already been stopped.
2016-04-08 16:23:38 UTC WARN [Thread-4] gobblin.metrics.reporter.ContextAwareReporter 116 - Reporter MetricReportReporter has already been stopped.
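When comparing this run with later ones, it can help to pull the counters out of the log programmatically instead of reading them by eye. The sketch below is a minimal, hypothetical helper (parse_counters is not part of Hadoop or Gobblin) that reads the name=value counter lines from a hand-copied excerpt of the log above into a dictionary, skipping group headers such as "File System Counters". It only extracts numbers; it makes no claim about which counter values are expected for a Gobblin job.

```python
import re

# Hand-copied excerpt of the counter lines from the log above.
COUNTER_TEXT = """
File System Counters
HDFS: Number of bytes written=41960
Job Counters
Launched map tasks=10
Map-Reduce Framework
Map input records=10
Map output records=0
File Output Format Counters
Bytes Written=0
"""

# A counter line is "<name>=<integer>", optionally indented.
COUNTER_LINE = re.compile(r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Return {counter name: int value}; lines without '=<int>' (group headers) are skipped."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


if __name__ == "__main__":
    counters = parse_counters(COUNTER_TEXT)
    for name in ("Map input records", "Map output records", "HDFS: Number of bytes written"):
        print(f"{name}: {counters[name]}")
```

Pasting the full counter block from another run into COUNTER_TEXT gives a dictionary that can be diffed against this one.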
Here are my configuration files:
File 1: gobblin-mapreduce.properties
# Thread pool settings for the task executor
taskexecutor.threadpool.size=2
taskretry.threadpool.coresize=1
taskretry.threadpool.maxsize=2
# File system URIs
fs.uri=hdfs://{host}:8020
writer.fs.uri=${fs.uri}
state.store.fs.uri=s3a://{bucket}/gobblin-mapr/
# Writer related configuration properties
writer.destination.type=HDFS
writer.output.format=AVRO
writer.staging.dir=${env:GOBBLIN_WORK_DIR}/task-staging
writer.output.dir=${env:GOBBLIN_WORK_DIR}/task-output
# Data publisher related configuration properties
data.publisher.type=gobblin.publisher.BaseDataPublisher
data.publisher.final.dir=${env:GOBBLIN_WORK_DIR}/job-output
data.publisher.replace.final.dir=false
# Directory where job/task state files are stored
state.store.dir=${env:GOBBLIN_WORK_DIR}/state-store
# Directory where error files from the quality checkers are stored
qualitychecker.row.err.file=${env:GOBBLIN_WORK_DIR}/err
# Directory where job locks are stored
job.lock.dir=${env:GOBBLIN_WORK_DIR}/locks
# Directory where metrics log files are stored
metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics
# Interval of task state reporting in milliseconds
task.status.reportintervalinms=5000
# MapReduce properties
mr.job.root.dir=${env:GOBBLIN_WORK_DIR}/working
# s3 bucket configuration
data.publisher.fs.uri=s3a://{bucket}/gobblin-mapr/
fs.s3a.access.key={key}
fs.s3a.secret.key={key}
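To see which concrete paths these settings produce, the ${...} placeholders can be expanded by hand. The sketch below assumes simple textual substitution: ${env:VAR} is taken from the environment and ${key} from another property in the same file. Gobblin's own config loader may resolve them differently, and parse_properties, resolve and effective are hypothetical helpers, not Gobblin APIs. The {host} and {bucket} values are the post's own redactions and are left untouched. With GOBBLIN_WORK_DIR=/gooblinOutput, mr.job.root.dir resolves to /gooblinOutput/working, which matches the working directory deleted in the log above.

```python
import re

# Excerpt of gobblin-mapreduce.properties as posted ({host}/{bucket} are redactions).
PROPS_TEXT = """
fs.uri=hdfs://{host}:8020
writer.fs.uri=${fs.uri}
writer.staging.dir=${env:GOBBLIN_WORK_DIR}/task-staging
writer.output.dir=${env:GOBBLIN_WORK_DIR}/task-output
data.publisher.final.dir=${env:GOBBLIN_WORK_DIR}/job-output
data.publisher.fs.uri=s3a://{bucket}/gobblin-mapr/
mr.job.root.dir=${env:GOBBLIN_WORK_DIR}/working
"""

# Matches ${name} but not the bare {host}/{bucket} redactions.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value, props, env, depth=0):
    """Expand ${env:VAR} from env and ${key} from props (assumed semantics).

    Raises KeyError for an unknown variable or key."""
    if depth > 10:
        raise ValueError("placeholder nesting too deep: " + value)

    def substitute(match):
        name = match.group(1)
        if name.startswith("env:"):
            return env[name[len("env:"):]]
        return resolve(props[name], props, env, depth + 1)

    return PLACEHOLDER.sub(substitute, value)


def effective(props, env):
    """Return every property with its placeholders expanded."""
    return {key: resolve(value, props, env) for key, value in props.items()}


if __name__ == "__main__":
    env = {"GOBBLIN_WORK_DIR": "/gooblinOutput"}
    for key, value in effective(parse_properties(PROPS_TEXT), env).items():
        print(f"{key} = {value}")
```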
File 2: kafka-to-s3.pull
job.name=GobblinKafkaQuickStart_mapR3
job.group=GobblinKafka_mapR3
job.description=Gobblin quick start job for Kafka
job.lock.enabled=false
kafka.brokers={kafka-host}:9092
topic.whitelist={topic_name}
source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt
data.publisher.type=gobblin.publisher.BaseDataPublisher
mr.job.max.mappers=10
bootstrap.with.offset=latest
metrics.reporting.file.enabled=true
metrics.enabled=true
metrics.reporting.file.suffix=txt
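Both files set some of the same keys. The sketch below makes the effective values explicit, assuming that job-level values from kafka-to-s3.pull override the system-level defaults in gobblin-mapreduce.properties (an assumption about Gobblin's precedence that this code does not verify). The two dictionaries are hand-copied excerpts of the posted files and overlay is a hypothetical helper. Of the keys the two files share, only writer.output.format actually changes (AVRO to txt), while data.publisher.fs.uri comes only from gobblin-mapreduce.properties.

```python
# Hand-copied excerpts of the two posted files.
SYSTEM_DEFAULTS = {  # gobblin-mapreduce.properties
    "writer.destination.type": "HDFS",
    "writer.output.format": "AVRO",
    "data.publisher.type": "gobblin.publisher.BaseDataPublisher",
    "data.publisher.fs.uri": "s3a://{bucket}/gobblin-mapr/",
}

JOB_FILE = {  # kafka-to-s3.pull
    "writer.destination.type": "HDFS",
    "writer.output.format": "txt",
    "data.publisher.type": "gobblin.publisher.BaseDataPublisher",
    "writer.builder.class": "gobblin.writer.SimpleDataWriterBuilder",
}


def overlay(defaults, job):
    """Merge job values over defaults (assumed precedence) and list keys whose value changes."""
    merged = dict(defaults)
    merged.update(job)
    changed = sorted(k for k in job if k in defaults and defaults[k] != job[k])
    return merged, changed


if __name__ == "__main__":
    merged, changed = overlay(SYSTEM_DEFAULTS, JOB_FILE)
    for key in changed:
        print(f"{key}: {SYSTEM_DEFAULTS[key]!r} -> {JOB_FILE[key]!r}")
    print("data.publisher.fs.uri =", merged["data.publisher.fs.uri"])
```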
Commands used to run the job:
export GOBBLIN_WORK_DIR=/gooblinOutput
bin/gobblin-mapreduce.sh --conf /home/hadoop/gobblin-files/gobblin-dist/kafkaConf/kafka-to-s3.pull --logdir /home/hadoop/gobblin-files/gobblin-dist/logs
I'm not sure what's going wrong. Can anyone help?