2016-07-18 30 views
0

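So basically this is a parser and cosine-similarity matrix calculator, but I keep getting a compile error. I think I have the correct input path for reading the text file, but it still won't work. I suspect my input file path is wrong, but I can't work out what I did wrong.

This is my main class: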

import java.io.FileNotFoundException;
import java.io.IOException;

public class TfIdfMain {

    public static void main(String args[]) throws FileNotFoundException, IOException {
        DocumentParser dp = new DocumentParser();
        dp.parseFiles("C:/Users/dachen/Documents/doc1.txt"); // give the location of source file
        dp.tfIdfCalculator(); // calculates tfidf
        dp.getCosineSimilarity(); // calculates cosine similarity
    }
}

My parser class:

import java.io.BufferedReader; 
import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileReader; 
import java.io.IOException; 
import java.util.ArrayList; 
import java.util.List; 

public class DocumentParser { 

    //This variable will hold all terms of each document in an array. 
    private List<String[]> termsDocsArray = new ArrayList<String[]>(); 
    private List<String> allTerms = new ArrayList<String>(); //to hold all terms 
    private List<double[]> tfidfDocsVector = new ArrayList<double[]>(); 

    /** 
    * Method to read files and store in array. 
    */ 
    public void parseFiles(String filePath) throws FileNotFoundException, IOException { 
     File[] allfiles = new File(filePath).listFiles(); 
     BufferedReader in = null; 
     for (File f : allfiles) { 
      if (f.getName().endsWith(".txt")) { 
       in = new BufferedReader(new FileReader(f)); 
       StringBuilder sb = new StringBuilder(); 
       String s = null; 
       while ((s = in.readLine()) != null) { 
        sb.append(s); 
       } 
       String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").split("\\W+"); //to get individual terms 
       for (String term : tokenizedTerms) { 
        if (!allTerms.contains(term)) { //avoid duplicate entry 
         allTerms.add(term); 
        } 
       } 
       termsDocsArray.add(tokenizedTerms); 
      } 
     } 

    } 

    /** 
    * Method to create termVector according to its tfidf score. 
    */ 
    public void tfIdfCalculator() { 
     double tf; //term frequency 
     double idf; //inverse document frequency 
     double tfidf; //term frequency-inverse document frequency
     for (String[] docTermsArray : termsDocsArray) { 
      double[] tfidfvectors = new double[allTerms.size()]; 
      int count = 0; 
      for (String terms : allTerms) { 
       tf = new TfIdf().tfCalculator(docTermsArray, terms); 
       idf = new TfIdf().idfCalculator(termsDocsArray, terms); 
       tfidf = tf * idf; 
       tfidfvectors[count] = tfidf; 
       count++; 
      } 
      tfidfDocsVector.add(tfidfvectors); //storing document vectors;    
     } 
    } 

    /** 
    * Method to calculate cosine similarity between all the documents. 
    */ 
    public void getCosineSimilarity() { 
     for (int i = 0; i < tfidfDocsVector.size(); i++) { 
      for (int j = 0; j < tfidfDocsVector.size(); j++) { 
       System.out.println("between " + i + " and " + j + " = " 
            + new CosineSimilarity().cosineSimilarity 
             (
             tfidfDocsVector.get(i), 
             tfidfDocsVector.get(j) 
             ) 
           ); 
      } 
     } 
    } 
} 

This is my error:

Exception in thread "main" java.lang.NullPointerException 
    at DocumentParser.parseFiles(DocumentParser.java:22) 
    at TfIdfMain.main(TfIdfMain.java:7) 

Is the path to the text file in my Documents folder wrong?

+2

I'm confused: you say you're getting a compile error, but then you show a runtime exception instead. Can you clarify? – ruakh

+0

Sorry, I don't know where the runtime error is. It shows: Exception in thread "main" java.lang.NullPointerException at DocumentParser.parseFiles(DocumentParser.java:22) at TfIdfMain.main(TfIdfMain.java:7) –

Answers

1

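Windows file paths conventionally use \ rather than /, although java.io.File accepts either separator. The actual bug is that the code expects a directory path, not the full path to a single file. So instead of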

dp.parseFiles("C:/Users/dachen/Documents/doc1.txt"); 

it should be

dp.parseFiles("C:\\Users\\dachen\\Documents"); 
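
If the caller might pass either a single file or a directory, one option is to resolve the path to a list of .txt files before parsing. The following is only a sketch under that assumption: `TxtFileResolver` and `resolveTxtFiles` are hypothetical names, not part of the original `DocumentParser`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original DocumentParser.
class TxtFileResolver {

    /**
     * Returns the .txt files denoted by path: the file itself if path is a
     * .txt file, or the .txt files directly inside it if path is a directory.
     * Returns an empty list when the path is neither.
     */
    static List<File> resolveTxtFiles(String path) {
        File target = new File(path);
        List<File> result = new ArrayList<>();
        if (target.isFile()) {
            if (target.getName().endsWith(".txt")) {
                result.add(target);
            }
            return result;
        }
        // listFiles() returns null if target is not a directory or an I/O error occurs.
        File[] children = target.listFiles();
        if (children == null) {
            return result;
        }
        for (File child : children) {
            if (child.isFile() && child.getName().endsWith(".txt")) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no argument is given.
        String path = args.length > 0 ? args[0] : ".";
        for (File f : resolveTxtFiles(path)) {
            System.out.println(f.getPath());
        }
    }
}
```

With a helper like this, `parseFiles` could loop over the returned list instead of calling `listFiles()` directly, so both call styles shown above would work.
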
+1

You could show the correct string... –

+0

@mursaleen Ahmed Sorry, Mr. Ahmed, I'm still getting the error: Exception in thread "main" java.lang.NullPointerException at DocumentParser.parseFiles(DocumentParser.java:22) at TfIdfMain.main(TfIdfMain.java:7) –

+0

Should you be passing the whole file name in the path? In the 'parseFiles' function you already do this: 'File[] allfiles = new File(filePath).listFiles();' –

0

The documentation for listFiles() states that it:

Returns null if this abstract pathname does not denote a directory

The path you are passing is not a directory.
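
To make this failure easier to diagnose, the result of listFiles() can be checked before it is iterated. This is a minimal sketch, assuming a fail-fast style is acceptable; `ListFilesCheck` and `listOrFail` are hypothetical names introduced for illustration.

```java
import java.io.File;

// Hypothetical helper illustrating a null check on File.listFiles().
class ListFilesCheck {

    /**
     * Lists the entries of dir, throwing a descriptive exception instead of
     * letting a later for-each loop fail with a NullPointerException.
     */
    static File[] listOrFail(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // Reached when dir is not a directory or cannot be read.
            throw new IllegalArgumentException("Not a readable directory: " + dir.getPath());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no argument is given.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(listOrFail(dir).length + " entries in " + dir.getPath());
    }
}
```

Called with a path to a regular file such as doc1.txt, this would report the offending path in the exception message rather than failing inside the loop.
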
