You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?
a.) They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.
b.) They would see the current state of the file, up to the last bit written by the command.
c.) They would see the current state of the file through the last completed block.
d.) They would see no content until the whole file is written and closed.
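To make the numbers in the question concrete, here is a minimal sketch of the block arithmetic only. It assumes that "completed block" means a full 64 MB block; the function name block_progress is made up for illustration and is not part of any Hadoop API. It says nothing about which option HDFS actually implements.

```python
MB = 1024 * 1024


def block_progress(file_size, block_size, bytes_written):
    """Return (total_blocks, completed_blocks, bytes_in_open_block).

    Pure arithmetic on sizes in bytes; hypothetical helper, not a Hadoop call.
    """
    # Ceiling division: the last block may be only partially filled.
    total_blocks = (file_size + block_size - 1) // block_size
    # Blocks that are already full at this point in the write.
    completed_blocks = bytes_written // block_size
    # Bytes sitting in the block that is still being written.
    bytes_in_open_block = bytes_written % block_size
    return total_blocks, completed_blocks, bytes_in_open_block


total, done, partial = block_progress(300 * MB, 64 * MB, 200 * MB)
print("total blocks:", total)
print("completed blocks:", done, "=", done * 64, "MB")
print("bytes in the open block:", partial // MB, "MB")
```

So under option c the other user would see at most the first 192 MB (three full blocks), while the remaining 8 MB already written belongs to the fourth block, which is still open.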
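From my understanding of the hadoop fs -put command, the answer is d, but some people say it is c when accessing a file that is still being written.
Could anyone provide a constructive explanation for any of the options?
Thanks XX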
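It is indeed possible to work at the block level, i.e. if you know which block to look for, you can access an individual block. However, if the metadata that holds the file name -> blocks mapping is not available until all blocks have been written, the file itself will not be visible to users, because all file system requests are routed through the NameNode. – Chaos 2014-10-30 14:20:45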
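This is what I have observed: when copying a large file to HDFS, the file is first created under the name '[FILENAME]._COPYING_', and while the write is still in progress, if you try a read operation on that file ('[FILENAME]._COPYING_'), you can still read it up to the last block written. I tested this behavior on a Hadoop 2.4 cluster. From this behavior, I am guessing that the NameNode updates the block map as soon as a block is flushed ('hflush()') and the ACK is sent back. Once the write completes, the file in HDFS is renamed to '[FILENAME]'. – Ashrith 2014-10-30 17:09:57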
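@AshrithM I tried writing a 300 MB file with the -put command and reading it with the -cat command as another user, and got a "File does not exist" message. What command (method) did you use to read the file while it was being written? – Dennis 2014-10-31 08:07:26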