How do I read multiple image files from HDFS as input to a MapReduce job?

public class image_mapreduce extends Configured implements Tool {

    private static String[] testFiles = new String[] {"img01.JPG", "img02.JPG", "img03.JPG", "img04.JPG", "img06.JPG", "img07.JPG", "img05.JPG"};
    // private static String testFilespath = "/home/student/Desktop/images";
    private static String testFilespath = "hdfs://localhost:54310/user/root/images";
    // private static String indexpath = "/home/student/Desktop/indexDemo";
    private static String testExtensive = "/home/student/Desktop/images";

    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {

        private Text input_image = new Text();
        private Text input_vector = new Text();

        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            System.out.println("CorrelogramIndex Method:");
            String featureString;
            int MAXIMUM_DISTANCE = 16;
            AutoColorCorrelogram.Mode mode = AutoColorCorrelogram.Mode.FullNeighbourhood;
            for (String identifier : testFiles) {
                try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
                    // Document doc = builder.createDocument(fis, identifier);
                    // FileInputStream imageStream = new FileInputStream(testFilespath + "/" + identifier);
                    BufferedImage bimg = ImageIO.read(fis);
                    AutoColorCorrelogram vd = new AutoColorCorrelogram(MAXIMUM_DISTANCE, mode);
                    vd.extract(bimg);
                    featureString = vd.getStringRepresentation();
                    double[] bytearray = vd.getDoubleHistogram();
                    System.out.println("image: " + identifier + " " + featureString);
                }
                System.out.println(" ------------- ");
                input_image.set(identifier);
                input_vector.set(featureString);
                output.collect(input_image, input_vector);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            // Accumulate every feature string emitted for this key
            StringBuilder out_vector = new StringBuilder();

            while (values.hasNext()) {
                out_vector.append(values.next().toString());
            }
            output.collect(key, new Text(out_vector.toString()));
        }
    }

    static int printUsage() {
        System.out.println("image_mapreduce [-m <maps>] [-r <reduces>] <input> <output>");
        ToolRunner.printGenericCommandUsage(System.out);
        return -1;
    }


    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), image_mapreduce.class);
        conf.setJobName("image_mapreduce");

        // the keys are image file names (strings)
        conf.setOutputKeyClass(Text.class);
        // the values are feature vectors (strings)
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(MapClass.class);
        // conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        List<String> other_args = new ArrayList<String>();
        for (int i = 0; i < args.length; ++i) {
            try {
                if ("-m".equals(args[i])) {
                    conf.setNumMapTasks(Integer.parseInt(args[++i]));
                } else if ("-r".equals(args[i])) {
                    conf.setNumReduceTasks(Integer.parseInt(args[++i]));
                } else {
                    other_args.add(args[i]);
                }
            } catch (NumberFormatException except) {
                System.out.println("ERROR: Integer expected instead of " + args[i]);
                return printUsage();
            } catch (ArrayIndexOutOfBoundsException except) {
                System.out.println("ERROR: Required parameter missing from " +
                        args[i - 1]);
                return printUsage();
            }
        }

        FileInputFormat.setInputPaths(conf, other_args.get(0));
        // FileInputFormat.setInputPaths(conf, new Path("hdfs://localhost:54310/user/root/images"));
        FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

        JobClient.runJob(conf);
        return 0;
    }


    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new image_mapreduce(), args);
        System.exit(res);
    }

}

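I am writing a program that takes multiple image files stored in HDFS as input and extracts features from them in the map function. How do I specify the path from which FileInputStream reads the images (as some kind of parameter)? Or is there another way to read multiple image files?

What I want to do is:
- take multiple image files in HDFS as input
- extract features in the map function
- reduce iteratively

Please help me with the code, or suggest a better way to do this.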

Source

2012-05-30 Amnesiac

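Look into the HIPI library. It stores a collection of images in an ImageBundle, which is more efficient than storing individual image files in HDFS. It also comes with some examples.

As for your code, you need to specify which input and output formats you intend to use. There is currently no built-in input format that passes a whole file to the mapper, but you can extend FileInputFormat and create a RecordReader that emits <Text, BytesWritable> pairs, where the key is the file name and the value is the bytes of the image file.

In fact, Hadoop: The Definitive Guide has an example of exactly this kind of input format.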
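A minimal sketch of such an input format, written against the old `org.apache.hadoop.mapred` API that the question's code already uses. The class names `WholeFileInputFormat` and `WholeFileRecordReader` are chosen here for illustration, and the sketch assumes each image fits comfortably in memory (well under 2 GB):

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/** Illustrative input format: one record per file, key = file name, value = file bytes. */
public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // an image must reach a single map() call intact
    }

    @Override
    public RecordReader<Text, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }

    /** Reads the whole file behind a split as a single record. */
    static class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
        private final FileSplit split;
        private final JobConf job;
        private boolean processed = false;

        WholeFileRecordReader(FileSplit split, JobConf job) {
            this.split = split;
            this.job = job;
        }

        @Override
        public boolean next(Text key, BytesWritable value) throws IOException {
            if (processed) {
                return false; // the single record was already returned
            }
            Path file = split.getPath();
            // Assumption: the file is small enough for an int-sized buffer
            byte[] contents = new byte[(int) split.getLength()];
            FileSystem fs = file.getFileSystem(job); // resolves hdfs:// as well as file://
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.readFully(in, contents, 0, contents.length);
            }
            key.set(file.getName());
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override
        public Text createKey() {
            return new Text();
        }

        @Override
        public BytesWritable createValue() {
            return new BytesWritable();
        }

        @Override
        public long getPos() {
            return processed ? split.getLength() : 0;
        }

        @Override
        public float getProgress() {
            return processed ? 1.0f : 0.0f;
        }

        @Override
        public void close() {
            // nothing to release: the stream is closed inside next()
        }
    }
}
```

In the driver this would be registered with `conf.setInputFormat(WholeFileInputFormat.class)`, and the mapper would then be declared as `Mapper<Text, BytesWritable, Text, Text>`, so each call to `map` receives one image instead of looping over a hard-coded file list.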

Source

2012-05-30 10:56:16

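@Chris ... thanks for your reply, but at the moment I am only using the LIRe and Lucene APIs. Can you tell me whether the code I wrote is correct? – Amnesiac

Updated with comments on your code –

@Chris ... thanks again. But I don't know how to specify the path. Is the path specified in the code above correct? It would be very helpful if you could edit the code above as required, so that I can follow it and carry on. Thanks. – Amnesiac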

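If you want to send all of the images as input to the MR job, just pass the input directory to FileInputFormat.setInputPaths(). If you want to send only selected images from a particular folder, you can pass multiple paths in the same call.

One way is to build a Path[] with one entry per image; another is to pass a single comma-separated string containing all of the paths. See the following documentation: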

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html
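The three variants described above could look roughly like this. It is only a sketch: the class name `InputPathsDemo` and its helper methods are made up for illustration, and the directory is simply the `testFilespath` value from the question.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputPathsDemo {
    // Assumption: the image directory from the question's testFilespath
    static final String IMAGE_DIR = "hdfs://localhost:54310/user/root/images";

    /** Every file in the directory becomes job input. */
    static JobConf wholeDirectory() {
        JobConf conf = new JobConf();
        FileInputFormat.setInputPaths(conf, new Path(IMAGE_DIR));
        return conf;
    }

    /** Only the named files become job input, passed as a Path[]. */
    static JobConf selectedFiles(String... names) {
        JobConf conf = new JobConf();
        Path[] paths = new Path[names.length];
        for (int i = 0; i < names.length; i++) {
            paths[i] = new Path(IMAGE_DIR, names[i]);
        }
        FileInputFormat.setInputPaths(conf, paths);
        return conf;
    }

    /** Same selection, passed as one comma-separated string. */
    static JobConf commaSeparated(String... names) {
        JobConf conf = new JobConf();
        StringBuilder joined = new StringBuilder();
        for (String name : names) {
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(IMAGE_DIR).append('/').append(name);
        }
        FileInputFormat.setInputPaths(conf, joined.toString());
        return conf;
    }

    public static void main(String[] args) {
        JobConf conf = selectedFiles("img01.JPG", "img02.JPG");
        for (Path p : FileInputFormat.getInputPaths(conf)) {
            System.out.println(p);
        }
    }
}
```

Setting the paths does not contact HDFS by itself; the paths are only resolved when the job computes its input splits.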

One more thing: set the map input types to Text and a byte array, and extract the image features from that byte array instead of creating a new FileInputStream.
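Decoding an image from a byte array needs only the standard library, so the feature extractor can keep working on a BufferedImage. The sketch below uses an illustrative class name, `ImageBytesDemo`, and builds a tiny PNG in memory to stand in for the bytes a mapper would receive. The explicit `length` argument matters because a buffer such as the one returned by `BytesWritable.getBytes()` can be longer than the valid data.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

import javax.imageio.ImageIO;

public class ImageBytesDemo {

    /**
     * Decodes an image from the first `length` bytes of `data`.
     * Returns null when no registered ImageIO reader recognises the format.
     */
    static BufferedImage decode(byte[] data, int length) throws IOException {
        try (ByteArrayInputStream in = new ByteArrayInputStream(data, 0, length)) {
            return ImageIO.read(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes of an image file read from HDFS
        BufferedImage source = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(source, "png", out);

        // Simulate a backing array that is larger than the valid data
        byte[] padded = Arrays.copyOf(out.toByteArray(), out.size() + 16);

        BufferedImage image = decode(padded, out.size());
        System.out.println(image.getWidth() + "x" + image.getHeight()); // prints 4x3
    }
}
```

Inside a mapper that receives a BytesWritable value, the call would be `decode(value.getBytes(), value.getLength())`, and the resulting BufferedImage can be handed to `vd.extract(...)` just as in the question's code.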

Source

2013-07-04 10:07:28 Siddhartha
