2016-10-05 42 views

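How to find the documents in which all 3 words appear (Java)

I have built an inverted index (wordTodocumentQueryMap) for a collection of files. For each word it stores the numbers of the documents in which the word appears, together with the word's frequency in each document, like this: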

experiment  1:1  17:1 30:1 39:1 52:1 109:2 
************* 
empirical  1:1  38:3 58:1 109:1 110:1 
************* 
flow:   1:1  2:6  3:2  4:3  6:1  7:3  9:3  16:1 17:1 
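
As an illustration of the data structure, the listing above can be read into a nested map of the shape word -> (document no. -> frequency), which matches the type Map<String, Map<Integer, Integer>> used for wordTodocumentQueryMap in the code further down. This is only a sketch: the class name IndexSample and the method parse are made up for this example, and the real indexing code is not shown in the question.

```java
import java.util.Map;
import java.util.TreeMap;

class IndexSample {

    /** Parses lines such as "flow: 1:1 2:6" into word -> (document no. -> frequency). */
    static Map<String, Map<Integer, Integer>> parse(String... lines) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            // The first token is the word; drop an optional trailing colon.
            String word = tokens[0].replaceAll(":$", "");
            Map<Integer, Integer> postings = new TreeMap<>();
            for (int i = 1; i < tokens.length; i++) {
                // "109:2" means document 109, frequency 2.
                String[] pair = tokens[i].split(":");
                postings.put(Integer.parseInt(pair[0]), Integer.parseInt(pair[1]));
            }
            index.put(word, postings);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> index = parse(
                "experiment  1:1  17:1 30:1 39:1 52:1 109:2",
                "empirical  1:1  38:3 58:1 109:1 110:1",
                "flow:   1:1  2:6  3:2  4:3  6:1  7:3  9:3  16:1 17:1");
        System.out.println(index.get("empirical")); // prints {1=1, 38=3, 58=1, 109=1, 110=1}
    }
}
```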

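Now I need to run a query (of about 3 words), and the result should be only the documents in which all of the words appear. For the query (experiment empirical flow) the result should be

1 : 3 

where 1 is the document number and 3 is the sum of the term frequencies of the query words in that document.

But my result is: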

1 : 3 2 : 6 3 : 2 4 : 3 6 : 1 7 : 3 9 : 3 16 : 1 17 : 2 

It enumerates every document of every word, not only the documents that contain all of the words.

This is what I have so far:

public static TreeMap<Integer, Integer> FileScore=new TreeMap<>(); 

In main:

for (Map.Entry<String, Map<Integer, Integer>> wordTodocument : wordTodocumentQueryMap.entrySet())
{
    Map<Integer, Integer> documentToFrecuency_value = wordTodocument.getValue();
    for (Map.Entry<Integer, Integer> documentToFrecuency : documentToFrecuency_value.entrySet())
    {
        int documentNo = documentToFrecuency.getKey();
        int wordCount = documentToFrecuency.getValue();
        int score = getScore(documentNo);

        FileScore.put(documentNo, score + wordCount);
    }
}

//print the score

for (Map.Entry<Integer, Integer> FileToScore : FileScore.entrySet())
{
    int documentNo = FileToScore.getKey();
    int Score = FileToScore.getValue();
    System.out.print(documentNo + " : " + Score + "\t");
}


public static int getScore(int fileno) {
    if (FileScore.containsKey(fileno))
        return FileScore.get(fileno);
    return 0;
}

Are you sure you want '17:2' in your result? If all three words must appear, how can the result include a count (score, frequency) of 2?


I corrected it, thanks.

Answer


The following method should do it.

import java.util.Map;
import java.util.TreeMap;

/**
 * Finds documents where all the given words appear.
 *
 * @param wordTodocumentQueryMap for each word, maps file no. to frequency > 0
 * @param firstWord the first query word
 * @param otherWords the remaining query words
 * @return a frequency map containing the file no. of each file that contains all of
 *         firstWord and otherWords, mapped to the sum of the counts for those words
 */
public static Map<Integer, Integer> docsWithAllWords(Map<String, Map<Integer, Integer>> wordTodocumentQueryMap,
        String firstWord, String... otherWords) {
    // result
    Map<Integer, Integer> fileScore = new TreeMap<>();
    Map<Integer, Integer> firstWordCounts = wordTodocumentQueryMap.get(firstWord);
    if (firstWordCounts == null) { // first word not found in any doc
        // return empty result
        return fileScore;
    }
    outer: for (Map.Entry<Integer, Integer> firstWordCountsEntry : firstWordCounts.entrySet()) {
        Integer docNo = firstWordCountsEntry.getKey();
        int sumOfCounts = firstWordCountsEntry.getValue();
        // find out if both/all other words are in doc, and sum counts
        for (String word : otherWords) {
            Map<Integer, Integer> wordCounts = wordTodocumentQueryMap.get(word);
            if (wordCounts == null) { // word not found in any doc, so no doc can match
                return fileScore;
            }
            Integer wordCount = wordCounts.get(docNo);
            if (wordCount == null) { // word not found in doc
                continue outer;
            }
            sumOfCounts += wordCount;
        }
        fileScore.put(docNo, sumOfCounts);
    }
    return fileScore;
}

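It uses a rarely used Java feature: a label, outer. If you find that too unusual (or just don't like the continue statement), you could rewrite the method to use a boolean instead. Now you can call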

docsWithAllWords(wordTodocumentQueryMap, "experiment", "empirical", "flow") 

It will give you 1 : 3 and nothing else.
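
For anyone who prefers to avoid the label, here is a rough sketch of the boolean variant. It is meant to behave the same as the method above, under the assumption that wordTodocumentQueryMap has the shape word -> (file no. -> frequency). The class name AllWordsNoLabel, the flag allWordsPresent and the helper counts are made up for this example, and the small index in main is only a cut-down version of the sample data from the question.

```java
import java.util.Map;
import java.util.TreeMap;

class AllWordsNoLabel {

    /** Same contract as docsWithAllWords above, using a boolean flag instead of a labeled continue. */
    static Map<Integer, Integer> docsWithAllWords(Map<String, Map<Integer, Integer>> wordTodocumentQueryMap,
            String firstWord, String... otherWords) {
        Map<Integer, Integer> fileScore = new TreeMap<>();
        Map<Integer, Integer> firstWordCounts = wordTodocumentQueryMap.get(firstWord);
        if (firstWordCounts == null) { // first word not found in any doc
            return fileScore;
        }
        for (Map.Entry<Integer, Integer> entry : firstWordCounts.entrySet()) {
            Integer docNo = entry.getKey();
            int sumOfCounts = entry.getValue();
            boolean allWordsPresent = true;
            for (String word : otherWords) {
                Map<Integer, Integer> wordCounts = wordTodocumentQueryMap.get(word);
                Integer wordCount = (wordCounts == null) ? null : wordCounts.get(docNo);
                if (wordCount == null) { // word missing from this doc (or from every doc)
                    allWordsPresent = false;
                    break;
                }
                sumOfCounts += wordCount;
            }
            if (allWordsPresent) {
                fileScore.put(docNo, sumOfCounts);
            }
        }
        return fileScore;
    }

    /** Hypothetical helper: builds a file no. -> frequency map from alternating (file no., frequency) values. */
    static Map<Integer, Integer> counts(int... pairs) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            result.put(pairs[i], pairs[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        index.put("experiment", counts(1, 1, 17, 1));
        index.put("empirical", counts(1, 1, 38, 3));
        index.put("flow", counts(1, 1, 2, 6, 17, 1));
        System.out.println(docsWithAllWords(index, "experiment", "empirical", "flow")); // prints {1=3}
    }
}
```

The flag costs a few more lines than the labeled continue, but the control flow stays within ordinary break and if statements.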
