2014-01-20
0

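I am using Cloudera Flume's spooling directory source with HDFS as the sink, and I am running into a "serializer has been closed" error. I copy files one at a time, and the error appears right after I copy the first file with scp: the spooling directory source throws an exception [Serializer has been closed].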

My agent configuration is as follows:

    agentaccesscombined.sources=spooldir-accesscombinedsource
    agentaccesscombined.sinks=hdfs-accesscombinedsink 
    agentaccesscombined.channels=chaccesscombined 

    # flume spooldir source 
    agentaccesscombined.sources.spooldir-accesscombinedsource.type=spooldir 
    agentaccesscombined.sources.spooldir-accesscombinedsource.spoolDir=/var/spoolAccessCombinedDir 
    agentaccesscombined.sources.spooldir-accesscombinedsource.ignorePattern=\\w.*.filepart 
    agentaccesscombined.sources.spooldir-accesscombinedsource.deletePolicy=immediate 
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileSuffix=.SPOOL 
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeader=true 
    agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLineLength=70000 
    agentaccesscombined.sources.spooldir-accesscombinedsource.bufferMaxLines=10000 
    agentaccesscombined.sources.spooldir-accesscombinedsource.batchSize=1000 
    agentaccesscombined.sources.spooldir-accesscombinedsource.fileHeaderKey=file 

    #flume hdfs-sink 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.type=hdfs 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.path=hdfs://cldx-1044:1200:8020/flumeOut_spoolDir_access_combined 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollSize=12553700 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollCount=12553665 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.rollInterval=100000 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.fileType=DataStream 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.writeFormat=Text 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.round = true 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.roundValue=50 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.roundUnit=minute 
    agentaccesscombined.sinks.hdfs-accesscombinedsink.hdfs.idleTimeout=5 

    #flume channel 
    agentaccesscombined.channels.chaccesscombined.type=file 
    agentaccesscombined.channels.chaccesscombined.capacity=1000000 
    agentaccesscombined.channels.chaccesscombined.transactionCapacity = 1000 
    agentaccesscombined.channels.chaccesscombined.checkpointInterval=30000 
    agentaccesscombined.channels.chaccesscombined.maxFileSize=2146435071 
    agentaccesscombined.channels.chaccesscombined.minimumRequiredSpace=524288000 
    agentaccesscombined.channels.chaccesscombined.keep-alive=30 
    agentaccesscombined.channels.chaccesscombined.write-timeout=30 
    agentaccesscombined.channels.chaccesscombined.checkpoint-timeout=6000 
    agentaccesscombined.channels.chaccesscombined.checkpointDir=/tmp/flume/java/checkpoint_accesscombined 
    agentaccesscombined.channels.chaccesscombined.dataDirs=/tmp/flume/java/data_accesscombined 


    agentaccesscombined.sources.spooldir-accesscombinedsource.channels=chaccesscombined
    agentaccesscombined.sinks.hdfs-accesscombinedsink.channel=chaccesscombined

If I copy the files with WinSCP it works fine, but not with scp. Please help.

Thanks in advance.

Answers

0

To fix the immediate problem, restart your Flume agent. After that, use a method that places files in the spool directory atomically.

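The spooling directory source requires that a file does not change once Flume has started reading it. If the file does change, the source logs an error message and then starts producing errors like the one shown above.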

cp is not atomic. I don't know about scp and similar tools. Perhaps copy to a temporary directory first and then use mv.
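The copy-then-mv idea could be sketched roughly as below. This is only an illustration under some assumptions: `atomic_drop` is a hypothetical helper name, and the staging directory (the spool directory name plus `.staging`) is an invented convention that is assumed to sit on the same filesystem as the spool directory, since mv is only an atomic rename within one filesystem.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC into a staging directory, then rename it
# into SPOOL_DIR so the watched directory only ever sees a complete file.
atomic_drop() {
    src=$1
    spool_dir=$2
    # Assumption: a sibling of the spool directory is on the same filesystem,
    # so the final mv is a rename rather than a second copy.
    staging_dir="${spool_dir%/}.staging"
    name=$(basename "$src")

    mkdir -p "$staging_dir" || return 1
    # The slow, non-atomic part happens outside the watched directory.
    cp "$src" "$staging_dir/$name" || return 1
    # The file appears in the spool directory in one step.
    mv "$staging_dir/$name" "$spool_dir/$name"
}
```

With the directory from the question, a call might look like `atomic_drop access.log /var/spoolAccessCombinedDir`. For a remote upload the same pattern would presumably apply: scp the file into the staging directory on the Flume host, then run the mv on that host once the transfer has finished.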

0

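You can use WinSCP to upload your file to a temporary directory and then move it into the directory Flume is watching with "mv". The mv operation is atomic, provided the temporary directory is on the same filesystem as the spool directory. You may want to automate this step.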
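One way the move step might be automated is a small function run periodically on the Flume host. This is a sketch, not a tested production setup: `publish_staged` and both directory arguments are hypothetical, and it assumes the uploader writes in-progress files under a `.filepart` name (as the ignorePattern in the question suggests WinSCP does here). A tool that writes directly to the final name, such as plain scp, would not be safe with this approach, because a file could be moved while it is still being written.

```shell
#!/bin/sh
# Hypothetical helper: move every finished upload from STAGING_DIR into
# SPOOL_DIR, leaving in-progress *.filepart files where they are.
publish_staged() {
    staging_dir=$1
    spool_dir=$2
    for f in "$staging_dir"/*; do
        # Skip subdirectories and the literal pattern when nothing matches.
        [ -f "$f" ] || continue
        case "$f" in
            *.filepart) continue ;;   # assumed marker for an unfinished transfer
        esac
        # Rename into the watched directory (atomic on the same filesystem).
        mv "$f" "$spool_dir/" || return 1
    done
    return 0
}
```

It could be called from cron, or from a simple loop such as `while true; do publish_staged /path/to/staging /var/spoolAccessCombinedDir; sleep 5; done`, where `/path/to/staging` is a placeholder.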