2015-01-21

Flume loses data when collecting online data into HDFS

I use flume-ng 1.5 to collect logs.

The data flow has two agents, each running on its own host.

Data is sent from agent1 to agent2.

The agents are composed as follows:

agent1: spooling directory source -> file channel -> Avro sink

agent2: Avro source -> file channel -> HDFS sink

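However, it seems to lose a very small fraction of the data, on the order of parts per million across millions of records. To track down the problem, I tried the following steps: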

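  1. Checked the agent logs: no errors or exceptions found.
  2. Checked the agent monitoring metrics: the number of events taken from the channel always equals the number of events put into it.
  3. Counted the records both with a Hive query and with the shell on the HDFS files: the two counts are equal to each other, and both are smaller than the number of online records.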
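The put/take comparison in step 2 can be scripted. The following is only a minimal sketch, assuming Flume's JSON metrics reporting is enabled on the agent and that the report has been fetched as a string; the `sample` report and the function name `channel_put_take_gaps` are hypothetical and used for illustration only.

```python
import json


def channel_put_take_gaps(metrics_json):
    """Return {channel_name: put_count - take_count} for every channel
    found in a Flume JSON metrics report."""
    report = json.loads(metrics_json)
    gaps = {}
    for component, counters in report.items():
        # Channel entries are keyed as "CHANNEL.<name>" in the report.
        if not component.startswith("CHANNEL."):
            continue
        # Counter values are reported as strings, so convert them first.
        put = int(counters.get("EventPutSuccessCount", 0))
        take = int(counters.get("EventTakeSuccessCount", 0))
        gaps[component[len("CHANNEL."):]] = put - take
    return gaps


# Hypothetical report for agent1; the numbers are made up for illustration.
sample = json.dumps({
    "CHANNEL.chan_file": {
        "EventPutSuccessCount": "1000000",
        "EventTakeSuccessCount": "1000000",
    },
    "SINK.sink_avro": {"EventDrainSuccessCount": "1000000"},
})

print(channel_put_take_gaps(sample))
```

A gap of zero on both agents only shows that nothing was lost inside the channels. Events that never reach a channel, for example lines dropped or mangled while the source reads the file, would not show up in this comparison.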

Configuration of agent1:

#agent 
agent1.sources = src_spooldir 
agent1.channels = chan_file 
agent1.sinks = sink_avro 

#source 
agent1.sources.src_spooldir.type = spooldir 
agent1.sources.src_spooldir.spoolDir = /data/logs/flume-spooldir 
agent1.sources.src_spooldir.interceptors=i1 

#interceptors 
agent1.sources.src_spooldir.interceptors.i1.type=regex_extractor 
agent1.sources.src_spooldir.interceptors.i1.regex=(\\d{4}-\\d{2}-\\d{2}).* 
agent1.sources.src_spooldir.interceptors.i1.serializers=s1 
agent1.sources.src_spooldir.interceptors.i1.serializers.s1.name=dt 

#sink 
agent1.sinks.sink_avro.type = avro 
agent1.sinks.sink_avro.hostname = 10.235.2.212 
agent1.sinks.sink_avro.port = 9910 

#channel 
agent1.channels.chan_file.type = file 
agent1.channels.chan_file.checkpointDir = /data/flume/agent1/checkpoint 
agent1.channels.chan_file.dataDirs = /data/flume/agent1/data 

agent1.sources.src_spooldir.channels = chan_file 
agent1.sinks.sink_avro.channel = chan_file 
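The `regex_extractor` interceptor above only sets the `dt` header on events whose body matches the pattern, and the HDFS path on agent2 is built from that header. It may therefore be worth checking whether any input lines fail to match. The sketch below is an approximation in Python, not the interceptor's own code; the helper names and sample lines are hypothetical.

```python
import re

# Same pattern as interceptors.i1.regex, without the extra backslash
# escaping that the properties file needs. re.ASCII keeps \d limited
# to 0-9, like Java's default behaviour.
DT_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}).*", re.ASCII)


def extract_dt(line):
    """Return the first yyyy-mm-dd date found in the line, or None."""
    match = DT_PATTERN.search(line)
    return match.group(1) if match else None


def lines_without_dt(lines):
    """Return the lines that would get no dt header."""
    return [line for line in lines if extract_dt(line) is None]


# Hypothetical log lines for illustration.
sample_lines = [
    "2015-01-21 10:00:00 GET /index",
    "malformed line without a date",
]

print([extract_dt(line) for line in sample_lines])
print(lines_without_dt(sample_lines))
```

Running `lines_without_dt` over a spooled file before it is handed to Flume would show whether any records lack a date and so would not land under the expected `/data/<dt>` directory.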

Configuration of agent2:

# agent 
agent2.sources = source1 
agent2.channels = channel1 
agent2.sinks = sink1 

# source 
agent2.sources.source1.type  = avro 
agent2.sources.source1.bind  = 10.235.2.212 
agent2.sources.source1.port  = 9910 

# sink 
agent2.sinks.sink1.type= hdfs 
agent2.sinks.sink1.hdfs.fileType = DataStream 
agent2.sinks.sink1.hdfs.filePrefix = log 
agent2.sinks.sink1.hdfs.path = hdfs://hnd.hadoop.jsh:8020/data/%{dt} 
agent2.sinks.sink1.hdfs.rollInterval = 600 
agent2.sinks.sink1.hdfs.rollSize = 0 
agent2.sinks.sink1.hdfs.rollCount = 0 
agent2.sinks.sink1.hdfs.idleTimeout = 300 
agent2.sinks.sink1.hdfs.round = true 
agent2.sinks.sink1.hdfs.roundValue = 10 
agent2.sinks.sink1.hdfs.roundUnit = minute 

# channel 
agent2.channels.channel1.type = file 
agent2.channels.channel1.checkpointDir = /data/flume/agent2/checkpoint 
agent2.channels.channel1.dataDirs = /data/flume/agent2/data 

agent2.sinks.sink1.channel  = channel1 
agent2.sources.source1.channels = channel1 

Any suggestions are welcome!

Answer

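There is a bug in the file line deserializer that shows up when it encounters certain Unicode characters: those between U+10000 and U+10FFFF, which UTF-16 represents with two 16-bit code units (called a surrogate pair).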
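To see what such characters look like and to find the affected lines in a log file, something like the following may help. It is a minimal sketch, not Flume code; the function names and sample lines are hypothetical.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def has_supplementary_char(line):
    """True if the line contains any character above U+FFFF,
    i.e. one that needs a surrogate pair in UTF-16."""
    return any(ord(c) > 0xFFFF for c in line)


def find_suspect_lines(lines):
    """Return (line_number, line) for every line with such a character."""
    return [(n, line) for n, line in enumerate(lines, start=1)
            if has_supplementary_char(line)]


# U+1F600 lies in the U+10000..U+10FFFF range and needs two code units.
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
print([hex(u) for u in utf16_code_units("A")])           # ['0x41']

# Hypothetical log lines for illustration.
print(find_suspect_lines(["plain ascii", "emoji \U0001F600 here"]))
```

If the records missing from HDFS line up with the lines reported by `find_suspect_lines` on the original files, that would support this explanation.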