2012-09-06

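Regex configuration in FlumeNG

I am trying to load data from a flat file (a log file) into HBase using Flume-ng (1.2). The flat file has multiple columns, each delimited by a colon (:), and each of them needs to be loaded into a separate column in HBase. While checking the forums I found that Apache ships a class for this (org.apache.flume.sink.hbase.RegexHbaseEventSerializer), but I could not find a configuration file or any usage example for it on the internet. It would be helpful if someone could help me with the configuration file.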

Contents of the flat file: 1:NN 2:PP 3:MM

Thanks

Answer


RegexHbaseEventSerializer has three configuration parameters that you can set (as described in the source code); they are:

/** Regular expression used to parse groups from event data. */ 
public static final String REGEX_CONFIG = "regex"; 

/** Whether to ignore case when performing regex matches. */ 
public static final String IGNORE_CASE_CONFIG = "regexIgnoreCase"; 

/** Comma separated list of column names to place matching groups in. */ 
public static final String COL_NAME_CONFIG = "colNames"; 

A configuration that uses RegexHbaseEventSerializer would look like this (partly quoted from Cloudera's Flume and HBase presentation):

host1.sources = src1 
host1.sinks = sink1 
host1.channels = ch1 

host1.sources.src1.type = seq 
host1.sources.src1.port = 25001 
host1.sources.src1.bind = localhost 
host1.sources.src1.channels = ch1 

host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink 
host1.sinks.sink1.channel = ch1 
host1.sinks.sink1.table = test3 
host1.sinks.sink1.columnFamily = testing 

host1.sinks.sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer 
host1.sinks.sink1.serializer.regex = X 
host1.sinks.sink1.serializer.regexIgnoreCase = true 
host1.sinks.sink1.serializer.colNames = column_1,column_2,column_3 

host1.channels.ch1.type = memory
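
The regex value X in the configuration is only a placeholder. To get a feel for how regex, regexIgnoreCase and colNames work together, here is a minimal standalone sketch in plain Java. It is not the Flume source: the class RegexColumnSketch and the method toColumns are hypothetical names, and the input line "NN:PP:MM" is an assumed colon-delimited record, since the exact layout of the real log file is not clear from the question. The sketch assumes that the whole event body must match the pattern and that the n-th capture group goes into the n-th column name, which is my understanding of how the serializer behaves.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexColumnSketch {

    // Hypothetical helper (not a Flume API): pairs the capture groups of
    // `regex` with the comma-separated names in `colNames`, in order.
    public static Map<String, String> toColumns(String line, String regex,
                                                boolean ignoreCase, String colNames) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Pattern pattern = Pattern.compile(regex, flags);
        String[] names = colNames.split(",");

        Map<String, String> row = new LinkedHashMap<>();
        Matcher m = pattern.matcher(line);
        // Assumption: a line that does not match fully, or whose group count
        // differs from the number of column names, yields no columns at all.
        if (!m.matches() || m.groupCount() != names.length) {
            return row;
        }
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], m.group(i + 1)); // group 0 is the whole match
        }
        return row;
    }

    public static void main(String[] args) {
        // Assumed record layout: three fields separated by colons.
        String regex = "([^:]*):([^:]*):([^:]*)";
        System.out.println(
            toColumns("NN:PP:MM", regex, true, "column_1,column_2,column_3"));
        // prints {column_1=NN, column_2=PP, column_3=MM}
    }
}
```

If the real records look like that, the same pattern could replace X in serializer.regex, with one capture group per entry in colNames. The pattern deliberately contains no backslashes, because backslashes would likely need to be doubled inside a Java properties file.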