2014-01-06 34 views
12

Running MALLET in Java

I am trying to run Mallet from Java and I get the following error.

Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file. 
Perhaps the 'resources' directories weren't copied into the 'class' directory. 
Continuing. 

I am trying to run the example from the Mallet website (http://mallet.cs.umass.edu/topics-devel.php). My code is below. Any help is appreciated.

package scriptAnalyzer; 

import cc.mallet.util.*; 
import cc.mallet.types.*; 
import cc.mallet.pipe.*; 
import cc.mallet.pipe.iterator.*; 
import cc.mallet.topics.*; 

import java.util.*; 
import java.util.regex.*; 
import java.io.*; 

public class Mallet { 

    public static void main(String[] args) throws Exception { 

     String filePath = "C:/mallet/ap.txt"; 
     // Begin by importing documents from text to feature sequences 
     ArrayList<Pipe> pipeList = new ArrayList<Pipe>(); 

     // Pipes: lowercase, tokenize, remove stopwords, map to features 
     pipeList.add(new CharSequenceLowercase()); 
     pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}"))); 
     pipeList.add(new TokenSequenceRemoveStopwords(new File("stoplists/en.txt"), "UTF-8", false, false, false)); 
     pipeList.add(new TokenSequence2FeatureSequence()); 

     InstanceList instances = new InstanceList (new SerialPipes(pipeList)); 

     Reader fileReader = new InputStreamReader(new FileInputStream(new File(filePath)), "UTF-8"); 
     instances.addThruPipe(new CsvIterator (fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"), 
               3, 2, 1)); // data, label, name fields 

     // Create a model with 5 topics, alpha_t = 0.2 (alphaSum = 1.0), beta_w = 0.01 
     // Note that the first parameter is passed as the sum over topics, while 
     // the second is the parameter for a single dimension of the Dirichlet prior. 
     int numTopics = 5; 
     ParallelTopicModel model = new ParallelTopicModel(numTopics, 1.0, 0.01); 

     model.addInstances(instances); 

     // Use two parallel samplers, which each look at one half the corpus and combine 
     // statistics after every iteration. 
     model.setNumThreads(2); 

     // Run the model for 50 iterations and stop (this is for testing only, 
     // for real applications, use 1000 to 2000 iterations) 
     model.setNumIterations(50); 
     model.estimate(); 

     // Show the words and topics in the first instance 

     // The data alphabet maps word IDs to strings 
     Alphabet dataAlphabet = instances.getDataAlphabet(); 

     FeatureSequence tokens = (FeatureSequence) model.getData().get(0).instance.getData(); 
     LabelSequence topics = model.getData().get(0).topicSequence; 

     Formatter out = new Formatter(new StringBuilder(), Locale.US); 
     for (int position = 0; position < tokens.getLength(); position++) { 
      out.format("%s-%d ", dataAlphabet.lookupObject(tokens.getIndexAtPosition(position)), topics.getIndexAtPosition(position)); 
     } 
     System.out.println(out); 

     // Estimate the topic distribution of the first instance, 
     // given the current Gibbs state. 
     double[] topicDistribution = model.getTopicProbabilities(0); 

     // Get an array of sorted sets of word ID/count pairs 
     ArrayList<TreeSet<IDSorter>> topicSortedWords = model.getSortedWords(); 

     // Show top 5 words in topics with proportions for the first document 
     for (int topic = 0; topic < numTopics; topic++) { 
      Iterator<IDSorter> iterator = topicSortedWords.get(topic).iterator(); 

      out = new Formatter(new StringBuilder(), Locale.US); 
      out.format("%d\t%.3f\t", topic, topicDistribution[topic]); 
      int rank = 0; 
      while (iterator.hasNext() && rank < 5) { 
       IDSorter idCountPair = iterator.next(); 
       out.format("%s (%.0f) ", dataAlphabet.lookupObject(idCountPair.getID()), idCountPair.getWeight()); 
       rank++; 
      } 
      System.out.println(out); 
     } 

     // Create a new instance with high probability of topic 0 
     StringBuilder topicZeroText = new StringBuilder(); 
     Iterator<IDSorter> iterator = topicSortedWords.get(0).iterator(); 

     int rank = 0; 
     while (iterator.hasNext() && rank < 5) { 
      IDSorter idCountPair = iterator.next(); 
      topicZeroText.append(dataAlphabet.lookupObject(idCountPair.getID()) + " "); 
      rank++; 
     } 

     // Create a new instance named "test instance" with empty target and source fields. 
     InstanceList testing = new InstanceList(instances.getPipe()); 
     testing.addThruPipe(new Instance(topicZeroText.toString(), null, "test instance", null)); 

     TopicInferencer inferencer = model.getInferencer(); 
     double[] testProbabilities = inferencer.getSampledDistribution(testing.get(0), 10, 1, 5); 
     System.out.println("0\t" + testProbabilities[0]); 
    } 

} 
+1

Sounds like 'resources/logging.properties' is missing. Is it using Maven? Ant? Did you build it correctly? –

Answers

5

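You will get this error if you run Mallet either from a download of the 2.0.8 snapshot (https://github.com/mimno/Mallet) or from the latest current Maven release (2.0.7).

The reason is that Mallet expects to find the file logging.properties in the target\classes\cc\mallet\util\resources folder. When you build the project with Maven, this file is not created, so the exception is raised in MalletLogger.java.

Someone should configure Maven properly so that the logging.properties file is created in the target folder. A temporary workaround is to modify the Mallet code to set a different path for logging.properties.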
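To see whether the file is actually visible at run time, a small diagnostic like the one below may help. It is only a sketch: it assumes Mallet resolves the file as a classpath resource at cc/mallet/util/resources/logging.properties (which is what the folder above suggests), and the class name `LoggingResourceCheck` is made up for illustration and is not part of Mallet.

```java
import java.net.URL;

public class LoggingResourceCheck {

    // Assumed classpath location, derived from target/classes/cc/mallet/util/resources.
    static final String RESOURCE = "cc/mallet/util/resources/logging.properties";

    // Returns the URL of the resource if it is on the classpath, or null if it is not.
    static URL locate(String resourcePath) {
        return LoggingResourceCheck.class.getClassLoader().getResource(resourcePath);
    }

    // Builds a one-line, human-readable report for the given resource path.
    static String describe(String resourcePath) {
        URL url = locate(resourcePath);
        return url == null
                ? "NOT on classpath: " + resourcePath
                : "Found: " + url;
    }

    public static void main(String[] args) {
        System.out.println(describe(RESOURCE));
    }
}
```

Run it with the same classpath as your Mallet program. If it reports that the file is not on the classpath, the build did not copy it into the output folder.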

6

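For anyone else who is using Maven and trying to configure Mallet's logging, try this:

Create a new text file at src/mallet-resources/logging.properties. It doesn't actually need to specify anything; an empty file is enough to make Mallet quiet.

Then modify your pom.xml to make sure the file is copied to the location mentioned in the other answer. To do that, add the following to the <build><plugins> section: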

<!--Mallet logging is horrifically verbose and is not easy to configure--> 
<!--We have to use this complicated process to copy the logging.properties file to the right location --> 
<plugin> 
    <artifactId>maven-resources-plugin</artifactId> 
    <version>2.6</version> 
    <executions> 
     <execution> 
      <id>copy-resources</id> 
      <phase>validate</phase> 
      <goals> 
       <goal>copy-resources</goal> 
      </goals> 
      <configuration> 
       <outputDirectory> 
        ${basedir}/target/classes/cc/mallet/util/resources 
       </outputDirectory> 
       <resources> 
        <resource> 
         <directory>src/mallet-resources</directory> 
         <filtering>true</filtering> 
        </resource> 
       </resources> 
      </configuration> 
     </execution> 
    </executions> 
</plugin> 
0

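Regarding the error "Couldn't open edu.umass.cs.mallet.base.util.MalletLogger resources/logging.properties file", which you can run into, for example, when running run.sh (or other scripts or commands) in BANNER named entity recognition (which uses MALLET).

Solution:

Copy 'logging.properties' from src/main/java/edu/umass/cs/mallet/base/util/resources/logging.properties

to target/scala-2.11/classes/edu/umass/cs/mallet/base/util/resources/logging.properties

[I am using the BANNER provided at https://github.com/clulab/banner]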
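If you would rather script that copy than do it by hand, something along these lines should work. This is a minimal sketch, not part of BANNER or MALLET: the class name `CopyLoggingProperties` and the idea of passing the project root as an argument are my own assumptions, and the two relative paths are simply the ones given above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyLoggingProperties {

    // Package-relative location of the file, taken from the paths in this answer.
    static final String RELATIVE =
            "edu/umass/cs/mallet/base/util/resources/logging.properties";

    // Copies the file from the source tree into the compiled classes folder
    // under the given project root, creating missing directories first.
    static Path copy(Path projectRoot) throws IOException {
        Path source = projectRoot.resolve("src/main/java").resolve(RELATIVE);
        Path target = projectRoot.resolve("target/scala-2.11/classes").resolve(RELATIVE);
        Files.createDirectories(target.getParent());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the project root, or run from inside it.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Copied to " + copy(root));
    }
}
```

Keep in mind that a clean build will probably delete the target folder, so the copy has to be repeated afterwards.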

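At the same time I ran into another error (... Logging configuration class "edu.umass.cs.mallet.base.util.Logger.DefaultConfigurator" failed), which can safely be ignored:

https://osdir.com/ml/ai.mallet.devel/2007-11/msg00008.html >> "I think this is a distribution error, but it only affects logging. I have just been ignoring this warning."

http://comments.gmane.org/gmane.comp.ai.mallet.devel/200 >> "This error should not affect your output."

http://courses.washington.edu/ling572/winter09/teaching_slides/1_08_Mallet.pptx >> Slide 20: "Please ignore this message." [Fei Xia, January 2009, an introduction to Mallet, which comes from Andrew McCallum's group at UMass (https://people.cs.umass.edu/~mccallum/)]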

10

Mallet looks for a logging file if you haven't specified one in a system property. The way to provide one with Maven is to put a file at

src/main/resources/cc/mallet/util/resources/logging.properties 

As part of the standard Maven build process, it will then be copied automatically to

target/classes/cc/mallet/util/resources/logging.properties 

So you don't need any special configuration. The file can be empty; it is deliberately left out so that you configure your own logging.
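For the system property route, here is a rough sketch. `java.util.logging.config.file` is the standard java.util.logging property for pointing at a configuration file; that Mallet checks this particular property is my assumption, and the class name and the temporary file are only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class LoggingConfigSketch {

    // Writes a one-line logging configuration to a temporary file, points the
    // standard java.util.logging property at it, reloads the configuration,
    // and returns the resulting root logger level.
    static Level configure(String levelName) throws IOException {
        Path config = Files.createTempFile("logging", ".properties");
        Files.write(config, (".level=" + levelName + "\n").getBytes(StandardCharsets.UTF_8));

        System.setProperty("java.util.logging.config.file", config.toString());
        LogManager.getLogManager().readConfiguration();

        return Logger.getLogger("").getLevel();
    }

    public static void main(String[] args) throws IOException {
        // Raise the threshold so that INFO-level chatter is suppressed.
        System.out.println("Root logger level: " + configure("WARNING"));
    }
}
```

In a real program the property would have to be set before any Mallet class is loaded, or passed on the command line as -Djava.util.logging.config.file=<path to your file>.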

+0

Also works for Gradle. –