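Since the code above lacks the import statements and other pieces needed to compile it, here is a more complete dumpdict.java that works from the command line to read and dump the dictionary file: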
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import java.io.IOException;

class DumpDict {
    public static void main(String[] args) {
        // Fail with a usage message instead of an ArrayIndexOutOfBoundsException.
        if (args.length < 1) {
            System.err.println("usage: java DumpDict {path to dict}");
            return;
        }
        try {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Reader read = new SequenceFile.Reader(fs, new Path(args[0]), conf);
            // Each record is read as a Text key (the term) and an IntWritable value (its id).
            IntWritable dicKey = new IntWritable();
            Text text = new Text();
            // HashMap dictionaryMap = new HashMap();
            while (read.next(text, dicKey)) {
                // dictionaryMap.put(Integer.parseInt(dicKey.toString()), text.toString());
                System.out.println(dicKey.toString() + " " + text.toString());
            }
            read.close();
        } catch (IOException e) {
            System.err.println(e.toString());
        }
    }
}
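The commented-out dictionaryMap lines hint at loading the dictionary into a map instead of printing it. If you have already saved the printed output, a minimal sketch of reading those "id term" lines back into a map might look like the following. It uses only the standard library, and the class name DumpLineParser and its parse method are made up for this illustration; they are not part of Hadoop or Mahout.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hadoop or Mahout.
final class DumpLineParser {
    // Parses lines of the form "<id> <term>" (the format DumpDict prints)
    // into an id -> term map. Only the first space separates id from term,
    // so terms that themselves contain spaces are kept whole.
    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> dictionaryMap = new HashMap<>();
        for (String line : lines) {
            int space = line.indexOf(' ');
            if (space <= 0) {
                continue; // skip blank lines and lines without an id
            }
            int id = Integer.parseInt(line.substring(0, space));
            dictionaryMap.put(id, line.substring(space + 1));
        }
        return dictionaryMap;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = parse(Arrays.asList("0 apple", "1 new york"));
        System.out.println(map.get(1)); // prints "new york"
    }
}
```

Splitting on the first space only is a deliberate choice here, since the id is printed first and the term follows.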
I found it necessary to tell java explicitly where all the jar files are:
export CLASSPATH=`find /path/to/mahout /usr/share/java -name '*.jar' | perl -ne 'chomp; push @jars, $_; END { print "\".:",(join ":",@jars),"\$CLASSPATH\"\n"; }'`
Compile it like this:
javac dumpdict.java
Run it like this:
java -cp .:$CLASSPATH DumpDict {path to dict}
(This is probably overkill for anyone who uses Java regularly, but it may save some time for those of us who don't use it often.)
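For anyone who finds the find/perl one-liner hard to read, here is a rough sketch in plain Java of what it is meant to produce: the current directory followed by every *.jar under the given directories, joined with the platform's path separator. The class name JarClasspath and its build method are invented for this illustration, and unlike the shell version it does not append the existing $CLASSPATH.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class JarClasspath {
    // Returns "." plus every path ending in ".jar" found under the roots,
    // joined with File.pathSeparator (":" on Unix-like systems).
    static String build(List<Path> roots) throws IOException {
        List<String> entries = new ArrayList<>();
        entries.add("."); // current directory first, as in the shell version
        for (Path root : roots) {
            // Files.walk visits the whole tree below root; close it when done.
            try (Stream<Path> walk = Files.walk(root)) {
                entries.addAll(walk
                        .map(Path::toString)
                        .filter(name -> name.endsWith(".jar"))
                        .sorted()
                        .collect(Collectors.toList()));
            }
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        System.out.println(build(roots));
    }
}
```

Run with the same directories you would pass to find, the printed string could be used as the value of -cp.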
-a應該有一個完全合格的類名 – 2012-06-24 05:25:16