
Mahout LDA gives FileNotFoundException

I created my term vectors as described here, like this:

~/Scripts/Mahout/trunk/bin/mahout seqdirectory --input /home/ben/Scripts/eipi/files --output /home/ben/Scripts/eipi/mahout_out -chunk 1 
~/Scripts/Mahout/trunk/bin/mahout seq2sparse -i /home/ben/Scripts/eipi/mahout_out -o /home/ben/Scripts/eipi/termvecs -wt tf -seq 

Then I ran

~/Scripts/Mahout/trunk/bin/mahout lda -i /home/ben/Scripts/eipi/termvecs -o /home/ben/Scripts/eipi/lda_working -k 2 -v 100 

and I get:

MAHOUT-JOB: /home/ben/Scripts/Mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/04 16:28:59 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --input=/home/ben/Scripts/eipi/termvecs, --maxIter=-1, --numTopics=2, --numWords=100, --output=/home/ben/Scripts/eipi/lda_working, --startPhase=0, --tempDir=temp, --topicSmoothing=-1.0}
11/09/04 16:29:00 INFO lda.LDADriver: LDA Iteration 1
11/09/04 16:29:01 INFO input.FileInputFormat: Total input paths to process : 4
11/09/04 16:29:01 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-ben/mapred/staging/ben692167368/.staging/job_local_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/home/ben/Scripts/eipi/termvecs/tokenized-documents/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:919)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
    at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:426)
    at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:226)
    at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:174)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:90)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

That is correct: the file does not exist. How am I supposed to create it?
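To see what the two earlier steps actually wrote, the output directory from seq2sparse can be listed (a quick check, using the path from the commands above):

ls -lR /home/ben/Scripts/eipi/termvecs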

Answer


The vectors are probably empty because something likely went wrong when they were created. Check that your vectors were actually created in their folder (no files with a size of 0 bytes). This error can also occur if your input folder is missing some files; in that case the two preprocessing steps will run, but they won't produce valid output. See the sketch below for one way to check.
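A minimal shell sketch for that check, assuming the output paths used in the question:

# Flag any zero-byte files under the seqdirectory and seq2sparse output folders;
# any hits would mean the sequence files or vectors were not created correctly.
find /home/ben/Scripts/eipi/mahout_out /home/ben/Scripts/eipi/termvecs -type f -size 0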
