2011-03-20
    int queryVector = 1; // weight of every query term (the query vector is all 1s)
    double similarity = 0.0;
    int wordPower;
    String[][] arrays = new String[filename][2];
    int row;
    int col;

    // a, filename (number of files), word2, array2 (query terms), totalCount,
    // wordCount, numDoc and numofDoc (document frequency per term) are
    // declared elsewhere in the program.
    for (a = 0; a < filename; a++) {
        int totalwordPower = 0; // sum of squared term counts, |d|^2
        int totalWords = 0;     // dot product of the document and query vectors
        similarity = 0.0;       // reset so a missing file does not reuse the previous score
        try {
            System.out
                    .println(" _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ");
            System.out.println("\n");
            System.out.println("The word inputted : " + word2);
            File file = new File(
                    "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                            + ".txt");
            System.out.println(" _________________");

            System.out.print("| File = abc" + a + ".txt | \t\t \n");

            for (int i = 0; i < array2.length; i++) {

                totalCount = 0;
                wordCount = 0;

                // try-with-resources closes the Scanner after each pass over the file
                try (Scanner s = new Scanner(file)) {
                    while (s.hasNext()) {
                        totalCount++;
                        if (s.next().equals(array2[i]))
                            wordCount++;
                    }
                }

                double termFrequency = (double) wordCount / totalCount;

                System.out.print(array2[i] + " --> Word count = "
                        + "\t " + "|" + wordCount + "|");
                System.out.print(" Total count = " + "\t " + "|"
                        + totalCount + "|");
                System.out.printf(" Term Frequency = | %8.4f |", termFrequency);

                System.out.println("\t ");

                double inverseTF = Math.log10((float) numDoc
                        / (numofDoc[i]));
                System.out.println(" --> IDF = " + inverseTF);

                double TFIDF = termFrequency * inverseTF;
                System.out.println(" --> TF/IDF = " + TFIDF + "\n");

                totalWords += wordCount;

                wordPower = wordCount * wordCount;

                totalwordPower += wordPower;

                System.out.println("Document Vector : " + wordPower);
            }

            // Computed once per file, after all query terms are counted.
            // The query norm is sqrt(number of terms * queryVector^2) instead of a
            // hard-coded 3. If no query word occurs in the file, this is 0 / 0.0 = NaN.
            similarity = (totalWords * queryVector)
                    / (Math.sqrt(totalwordPower)
                            * Math.sqrt(array2.length * queryVector * queryVector));
        } catch (FileNotFoundException e) {
            System.out.println("File is not found");
        }
        System.out.println("The total query frequency for this file is "
                + totalWords);
        System.out.println("The total document vector : " + totalwordPower);

        System.out.println("The similarity is " + similarity);
    }
}

}

Sorting the scores produced by the program

Hi, I want to sort the similarity scores calculated by the code above. Here is sample output for 2 text files; I have 10 text files in total.

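The word inputted : how are you


| File = abc0.txt |
how --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
 --> IDF = 1.0413926851582251
 --> TF/IDF = 0.0

Document Vector : 0
are --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
 --> IDF = 0.43933269383026263
 --> TF/IDF = 0.0

Document Vector : 0
you --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.0

Document Vector : 0
The total query frequency for this file is 0
The total document vector : 0
The similarity is NaN


The word inputted : how are you


| File = abc1.txt |
how --> Word count = |0| Total count = |426| Term Frequency = | 0.0000 |
 --> IDF = 1.0413926851582251
 --> TF/IDF = 0.0

Document Vector : 0
are --> Word count = |0| Total count = |426| Term Frequency = | 0.0000 |
 --> IDF = 0.43933269383026263
 --> TF/IDF = 0.0

Document Vector : 0
you --> Word count = |3| Total count = |426| Term Frequency = | 0.0070 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.0013823565896541458

Document Vector : 9
The total query frequency for this file is 3
The total document vector : 9
The similarity is 0.5773502691896257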

Note: this is a sample run over two of the text files. I have 10 text files in total.
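Reading the similarity expression in the code as cosine similarity (an interpretation; the question does not name it) explains both printed values. Take $d$ as the vector of word counts for the three query words and $q = (1, 1, 1)$ as the query vector, so the numerator is `totalWords` and $\sum_i d_i^2$ is `totalwordPower`:

$$
\operatorname{sim}(d, q) = \frac{\sum_i d_i q_i}{\sqrt{\sum_i d_i^2}\,\sqrt{\sum_i q_i^2}}
$$

$$
\operatorname{sim}(d_{\text{abc1}}, q) = \frac{3}{\sqrt{9}\,\sqrt{3}} = \frac{1}{\sqrt{3}} \approx 0.57735,
\qquad
\operatorname{sim}(d_{\text{abc0}}, q) = \frac{0}{\sqrt{0}\,\sqrt{3}} = \frac{0}{0}
$$

For abc1.txt, $d = (0, 0, 3)$, which matches the printed 0.5773502691896257. For abc0.txt, $d = (0, 0, 0)$, and Java's floating-point division $0 / 0.0$ yields NaN, which is why that file reports "The similarity is NaN".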

How can I sort the SIMILARITY scores from highest to lowest? Any suggestions?

Answer


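Add the SIMILARITY scores to a list and sort it with the library method `Collections.sort`. It sorts in ascending order, so read the list from the end to get highest to lowest.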

ArrayList<Double> arrayList = new ArrayList<Double>(); // call arrayList.add(similarity) once per file
Collections.sort(arrayList); // ascending: the highest score ends up last
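A self-contained sketch of the ascending approach, assuming the scores have already been collected. The class name `AscendingScores`, the helper `sortAscending`, and the value 0.25 are made up for illustration; the other two values are the abc0.txt and abc1.txt results from the question. One catch worth knowing: `Collections.sort` on `Double` uses `Double.compareTo`, which ranks NaN above every other value, so the NaN from a file with no matching words lands at the end of the ascending list and is the first thing you read when walking backwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AscendingScores {

    // Returns a sorted copy so the caller's list keeps its original file order.
    static List<Double> sortAscending(List<Double> scores) {
        List<Double> sorted = new ArrayList<Double>(scores);
        Collections.sort(sorted); // Double.compareTo: NaN sorts after every number
        return sorted;
    }

    public static void main(String[] args) {
        List<Double> arrayList = new ArrayList<Double>();
        // abc0.txt and abc1.txt scores from the question, plus one hypothetical value.
        arrayList.add(Double.NaN);
        arrayList.add(0.5773502691896257);
        arrayList.add(0.25);

        List<Double> sorted = sortAscending(arrayList);

        // Walk from the end to print highest to lowest.
        for (int i = sorted.size() - 1; i >= 0; i--) {
            System.out.println(sorted.get(i));
        }
    }
}
```

With these inputs the loop prints NaN, then 0.5773502691896257, then 0.25. If a file with no query words should rank last, replace NaN with 0.0 (or skip it) before adding it to the list.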

Alternatively, declare a comparator and pass it to `Collections.sort`, as below.

ArrayList<Double> arrayList = new ArrayList<Double>();
Comparator<Double> comparator = Collections.reverseOrder(); // reverses the natural (ascending) order
Collections.sort(arrayList, comparator); // highest score is now first
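Sorting a bare `List<Double>` loses track of which file produced which score. Below is a sketch that keeps the two together, assuming Java 8+ for `List.sort` and lambdas. `RankFiles`, `FileScore` and `sortDescending` are hypothetical names, and mapping NaN to 0.0 is a design choice so that files with none of the query words rank last instead of first.

```java
import java.util.ArrayList;
import java.util.List;

public class RankFiles {

    // Hypothetical holder pairing a file name with its similarity score.
    static final class FileScore {
        final String fileName;
        final double score;

        FileScore(String fileName, double score) {
            this.fileName = fileName;
            // NaN means no query word occurred in the file; treat it as 0.0.
            this.score = Double.isNaN(score) ? 0.0 : score;
        }
    }

    // Sorts in place, highest score first.
    static void sortDescending(List<FileScore> results) {
        results.sort((x, y) -> Double.compare(y.score, x.score)); // y before x: descending
    }

    public static void main(String[] args) {
        List<FileScore> results = new ArrayList<FileScore>();
        // In the question's loop this would be, after the similarity is printed:
        // results.add(new FileScore("abc" + a + ".txt", similarity));
        results.add(new FileScore("abc0.txt", Double.NaN));
        results.add(new FileScore("abc1.txt", 0.5773502691896257));

        sortDescending(results);
        for (FileScore r : results) {
            System.out.println(r.fileName + " -> " + r.score);
        }
    }
}
```

Run as is, this prints `abc1.txt -> 0.5773502691896257` followed by `abc0.txt -> 0.0`.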

HTH