I am using Flume 1.5.0 to collect logs from application servers. Suppose I have three app servers, App-A, App-B, and App-C, and one HDFS server on which Hive is running. A Flume agent runs on each of the three app servers and forwards log messages to the HDFS server, where another Flume agent receives them and stores the logs in the Hadoop file system. I then created an external Hive table to map this log data. Everything works, except for one thing: Hive does not parse the log data correctly when loading it into the table. In short, log data shipped with Flume Avro is not stored correctly in Hive.
Here are my Flume and Hive configurations:
Dummy log file format (separated by |): Client ID | Application request | URL
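For example, a single log line looks like this (the values here are made up for illustration):

```
1001|login|/app/home
```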
Flume conf on the app servers:
app-agent.sources = tail
app-agent.channels = memoryChannel
app-agent.sinks = avro-forward-sink
app-agent.sources.tail.type = exec
app-agent.sources.tail.command = tail -F /home/kuntal/practice/testing/application.log
app-agent.sources.tail.channels = memoryChannel
app-agent.channels.memoryChannel.type = memory
app-agent.channels.memoryChannel.capacity = 100000
app-agent.channels.memoryChannel.transactionCapacity = 10000
app-agent.sinks.avro-forward-sink.type = avro
app-agent.sinks.avro-forward-sink.hostname = localhost
app-agent.sinks.avro-forward-sink.port = 10000
app-agent.sinks.avro-forward-sink.channel = memoryChannel
Flume conf on the HDFS server:
hdfs-agent.sources = avro-collect
hdfs-agent.channels = memoryChannel
hdfs-agent.sinks = hdfs-write
hdfs-agent.sources.avro-collect.type = avro
hdfs-agent.sources.avro-collect.bind = localhost
hdfs-agent.sources.avro-collect.port = 10000
hdfs-agent.sources.avro-collect.channels = memoryChannel
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 100000
hdfs-agent.channels.memoryChannel.transactionCapacity = 10000
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
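One thing I am unsure about: I never set hdfs.fileType, and if I read the Flume HDFS sink documentation correctly it defaults to SequenceFile. Would switching the sink to plain-text output, along these lines, make the files readable by a delimited Hive table?

```
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
```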
Hive external table:
CREATE EXTERNAL TABLE IF NOT EXISTS test(clientId int, itemType string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION '/user/flume/tail_table/avro';
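I also notice that my log lines are '|'-separated while the table declares '\t' as the field terminator, and the log has three fields where the table has only two. Would a table along these lines (just a sketch; the third column name is my own guess for the URL field) be closer to what I need?

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS test(clientId INT, itemType STRING, requestUrl STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
LOCATION '/user/flume/tail_table/avro';
```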
Please suggest what I should do. Do I need to include the AvroSerde on the Hive side?