Using Amazon S3 for input, output, and intermediate results in an EMR MapReduce job

I am trying to use Amazon S3 storage in EMR. However, when I run my code, I get multiple errors like this one:

java.lang.IllegalArgumentException: This file system object (hdfs://10.254.37.109:9000) does not support access to the request path 's3n://energydata/input/centers_200_10k_norm.csv' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path. 
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:384) 
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129) 
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) 
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:429) 
at edu.stanford.cs246.hw2.KMeans$CentroidMapper.setup(KMeans.java:112) 
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) 
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771) 
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375) 
at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:396) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132) 
at org.apache.hadoop.mapred.Child.main(Child.java:249)

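In main I set my input and output paths as follows. I also put s3n://energydata/input/centers_200_10k_norm.csv into the configuration under CFILE, which I then retrieve in the mapper and reducer: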

FileSystem fs = FileSystem.get(conf); 
conf.set(CFILE, inPath); //inPath in this case is s3n://energydata/input/centers_200_10k_norm.csv 
FileInputFormat.addInputPath(job, new Path(inputDir)); 
FileOutputFormat.setOutputPath(job, new Path(outputDir));

The error above occurs in my mapper and reducer when I try to access CFILE (in this case s3n://energydata/input/centers_200_10k_norm.csv). This is how I try to get the path:

FileSystem fs = FileSystem.get(context.getConfiguration()); 
Path cFile = new Path(context.getConfiguration().get(CFILE)); 
DataInputStream d = new DataInputStream(fs.open(cFile)); // <-- the error occurs here
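Why the call fails: FileSystem.get(conf) returns the default file system (here HDFS at hdfs://10.254.37.109:9000), and a FileSystem instance only accepts paths whose scheme and authority match its own. The stdlib-only sketch below models that check with java.net.URI. PathCheck and supports are hypothetical names; this is a simplified illustration, not Hadoop's actual checkPath, which also canonicalizes URIs and resolves missing authorities against the configured default.

```java
import java.net.URI;

// Hypothetical, simplified model of the check behind the exception above.
// It is NOT Hadoop's implementation; it only mirrors the core idea.
class PathCheck {

    // Returns true if a file system rooted at fsUri could serve the given path.
    static boolean supports(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            // Scheme-less paths (e.g. "/tmp/x") are resolved against this file system.
            return true;
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            // e.g. an s3n:// path handed to an hdfs:// file system
            return false;
        }
        String authority = path.getAuthority();
        // Simplification: a missing authority is treated as this file system's own.
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://10.254.37.109:9000");
        URI s3Path = URI.create("s3n://energydata/input/centers_200_10k_norm.csv");
        URI hdfsPath = URI.create("hdfs://10.254.37.109:9000/user/hadoop/centers.csv");
        System.out.println(supports(hdfs, s3Path));   // different scheme: rejected
        System.out.println(supports(hdfs, hdfsPath)); // same scheme and authority: accepted
    }
}
```

In this model, as in the real error, the path is perfectly valid; it is simply being handed to the wrong file system object.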

s3n://energydata/input/centers_200_10k_norm.csv is one of the program's input arguments. When I started my EMR job, I specified my input and output directories as s3n://energydata/input and s3n://energydata/output.

I tried what was suggested in "file path in hdfs", but I still get the error. Any help would be greatly appreciated.

Thanks!

2013-04-25 Timnit Gebru

Try this instead:

Path cFile = new Path(context.getConfiguration().get(CFILE)); 
FileSystem fs = cFile.getFileSystem(context.getConfiguration()); 
DataInputStream d = new DataInputStream(fs.open(cFile));

2013-04-29 11:55:41

Thanks. I actually fixed it using the following code: uriStr = "s3n://energydata/output/"; URI uri = URI.create(uriStr); FileSystem fs = FileSystem.get(uri, context.getConfiguration()); ... DataInputStream d = new DataInputStream(fs.open(cFile)); – 2013-04-29 16:35:21

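Yes, that is a similar fix. The main problem in the OP's code is that the file system handle is the default one. Path.getFileSystem(conf) or FileSystem.get(uri, conf) obtains the file system for a specific path. – 2013-04-30 01:42:07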
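Both fixes work for the same reason: the file system is selected from the path's own scheme and authority, and only a scheme-less path falls back to the configured default (fs.default.name, or fs.defaultFS in newer Hadoop versions). The stdlib-only sketch below models that selection rule with java.net.URI. FsResolver and fileSystemRoot are hypothetical names; real Hadoop additionally maps the scheme to an implementation class and caches instances, none of which is modeled here.

```java
import java.net.URI;

// Hypothetical sketch of how a file system is picked for a path:
// the path's own scheme/authority win; only scheme-less paths use the default.
// It models the rule only and does not create any Hadoop FileSystem.
class FsResolver {

    // Returns the "scheme://authority/" root of the file system that should serve the path.
    static URI fileSystemRoot(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            // e.g. "/user/hadoop/centers.csv": fall back to the default file system
            return root(defaultFs.getScheme(), defaultFs.getAuthority());
        }
        return root(uri.getScheme(), uri.getAuthority());
    }

    // Builds "scheme://authority/", tolerating a missing authority (gives "scheme:///").
    private static URI root(String scheme, String authority) {
        return URI.create(scheme + "://" + (authority == null ? "" : authority) + "/");
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://10.254.37.109:9000");
        // The CFILE value from the question resolves to the S3 bucket, not to HDFS.
        System.out.println(fileSystemRoot("s3n://energydata/input/centers_200_10k_norm.csv", defaultFs));
        // A scheme-less path resolves to the default (HDFS) file system.
        System.out.println(fileSystemRoot("/user/hadoop/centers.csv", defaultFs));
    }
}
```

This also suggests why the hard-coded URI in the accepted fix works: only its scheme and bucket (s3n://energydata) matter for choosing the file system, so the directory part of the string is irrelevant.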

Thanks. I actually fixed it using the following code:

String uriStr = "s3n://energydata/centroid/"; 
URI uri = URI.create(uriStr); 
FileSystem fs = FileSystem.get(uri, context.getConfiguration());  
Path cFile = new Path(context.getConfiguration().get(CFILE)); 
DataInputStream d = new DataInputStream(fs.open(cFile));

2013-04-29 16:40:10
