
I have successfully configured Flume to transfer text files from a local folder into HDFS. My problem is that when a file lands in HDFS, some unwanted text ("hdfs.write.Longwriter" plus binary characters) is prefixed to my text content — in other words, the sink.hdfs writer adds garbage to my text file. Here is my flume.conf:

agent.sources = flumedump 
agent.channels = memoryChannel 
agent.sinks = flumeHDFS 

agent.sources.flumedump.type = spooldir 
agent.sources.flumedump.spoolDir = /opt/test/flume/flumedump/ 
agent.sources.flumedump.channels = memoryChannel 

# Each sink's type must be defined 
agent.sinks.flumeHDFS.type = hdfs 
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin 
agent.sinks.flumeHDFS.fileType = DataStream 

#Format to be written 
agent.sinks.flumeHDFS.hdfs.writeFormat = Text 

agent.sinks.flumeHDFS.hdfs.maxOpenFiles = 10 
# rollover file based on maximum size of 10 MB 
agent.sinks.flumeHDFS.hdfs.rollSize = 10485760 

# never rollover based on the number of events 
agent.sinks.flumeHDFS.hdfs.rollCount = 0 

# rollover file based on max time of 1 minute 
agent.sinks.flumeHDFS.hdfs.rollInterval = 60 


#Specify the channel the sink should use 
agent.sinks.flumeHDFS.channel = memoryChannel 

# Each channel's type is defined. 
agent.channels.memoryChannel.type = memory 

# Other config values specific to each type of channel(sink or source) 
# can be defined as well 
# In this case, it specifies the capacity of the memory channel 
agent.channels.memoryChannel.capacity = 100 

My source text file contains very simple text: Hi my name is Hadoop and this is file one.

The sink file I get in HDFS looks like this: SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text 5 > I < 4H ǥ + Hi my name is Hadoop and this is file one.

Please let me know what I am doing wrong.

Answer


Figured it out. I had to fix this line

agent.sinks.flumeHDFS.fileType = DataStream

and change it to

agent.sinks.flumeHDFS.hdfs.fileType = DataStream

This fixed the problem.
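For reference, a minimal sketch of the corrected sink section, assuming the rest of the configuration above stays unchanged: without the hdfs. prefix the fileType key is simply ignored, so the sink falls back to its default output format (SequenceFile), which is where the LongWritable/Text header and binary bytes come from.

agent.sinks.flumeHDFS.type = hdfs 
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin 

# all HDFS sink properties must carry the hdfs. prefix 
agent.sinks.flumeHDFS.hdfs.fileType = DataStream 
agent.sinks.flumeHDFS.hdfs.writeFormat = Text 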
