2015-10-16 64 views
0

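I have a small project to write a Twitter crawler, and I have run into a problem analyzing the collected tweets: I am unable to read the words from a txt file and count them.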

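The collected tweets are stored in a txt file. What I want to do is count how many words the txt file contains, how many of those words contain 'engineering', and how many hashtags there are. Here is what I have tried so far: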

import java.io.*; 
import java.util.StringTokenizer; 

public class TwitterAnalyzer { 

public static void main(String args[]){ 
    try{ 

     String keyword = "Engineering"; 
     FileInputStream fInstream = new FileInputStream("C:\\Users\\Alan\\Documents\\NetBeansProjects\\TwitterCrawler\\"+keyword+"-data.txt"); 
     DataInputStream in = new DataInputStream(fInstream); 
     BufferedReader br = new BufferedReader(new InputStreamReader(in)); 
     String strLine; 


     int numberOfKeywords = 0; 
     int numberOfWords = 0; 
     int numberOfHashtags = 0; 

     while((strLine = br.readLine()) != null){ 

      strLine = br.readLine(); 
      System.out.println(strLine); 
      StringTokenizer st = new StringTokenizer(strLine, " \t\n\r\f.,;:!?\""); 
      while(st.hasMoreTokens()){ 
       String word = st.nextToken(); 
       numberOfWords++; 
       if(word.contains(keyword)){ 
        numberOfKeywords++; 
       } 
       if(word.contains("#")){ 
        numberOfHashtags++; 
       } 
      } 
     } 



     System.out.println(numberOfWords); 
     System.out.println(numberOfKeywords); 
     System.out.println(numberOfHashtags); 
     br.close(); 

    }catch (FileNotFoundException fe){ 
     fe.printStackTrace(); 
     System.out.println("Unable to locate file"); 
     System.exit(-1); 
    }catch (IOException ie){ 
     ie.printStackTrace(); 
     System.out.println("Unable to read file"); 
     System.exit(-1); 
    }   


} 
} 

Here is a link to the txt file.

Any help here is much appreciated!

+2

`while((strLine = br.readLine()) != null){ strLine = br.readLine();` You are calling readLine() twice on every iteration. – Natalia
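To illustrate the comment above: because the loop condition already reads a line, the second `readLine()` inside the body throws that line away, so only every other line is counted, and at the end of the file the second call can return null. A minimal sketch of the loop with a single read per iteration follows. The class name `TweetStats`, the `count` helper, and the sample text are made up for illustration, and the case-insensitive keyword match is an assumption about what the question wants (the original compares against "Engineering" case-sensitively).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

class TweetStats {

    // Returns {total words, words containing the keyword, words containing '#'}.
    static int[] count(BufferedReader br, String keyword) throws IOException {
        int words = 0;
        int keywords = 0;
        int hashtags = 0;
        String needle = keyword.toLowerCase();
        String line;
        // readLine() is called exactly once per iteration, in the loop condition.
        while ((line = br.readLine()) != null) {
            // Same delimiter set as in the question.
            StringTokenizer st = new StringTokenizer(line, " \t\n\r\f.,;:!?\"");
            while (st.hasMoreTokens()) {
                String word = st.nextToken();
                words++;
                // Assumption: the keyword match should ignore case.
                if (word.toLowerCase().contains(needle)) {
                    keywords++;
                }
                if (word.contains("#")) {
                    hashtags++;
                }
            }
        }
        return new int[] {words, keywords, hashtags};
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample text standing in for the tweet file.
        String sample = "Studying #Engineering today\nengineering is fun, right?";
        int[] r = count(new BufferedReader(new StringReader(sample)), "Engineering");
        System.out.println(r[0] + " words, " + r[1] + " keyword hits, " + r[2] + " hashtags");
    }
}
```

To run it against the real file, pass a `BufferedReader` wrapped around the file instead of the `StringReader`.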

+0

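What do you mean by 'unable to read words'? Is there a specific error message or an unexpected result? Also, if you are looking for the occurrences of individual words, a `Map` would be a better choice. – sam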
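As a rough sketch of the `Map` suggestion above: one way to keep a count per distinct word is to use the word as the key and its number of occurrences as the value. The class name `WordFrequency`, the `count` helper, and the sample sentence are hypothetical, and lower-casing the text before splitting is an assumption, not something the question specifies.

```java
import java.util.Map;
import java.util.TreeMap;

class WordFrequency {

    // Builds a word -> occurrence count map; TreeMap keeps the keys sorted.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        // Split on runs of whitespace and the punctuation used in the question.
        for (String word : text.toLowerCase().split("[\\s.,;:!?\"]+")) {
            if (word.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            // merge() inserts 1 for a new word, otherwise adds 1 to the old count.
            freq.merge(word, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Hypothetical sample text.
        System.out.println(count("Engineering is fun. #engineering is hard, engineering pays"));
    }
}
```

With such a map, the count for a single keyword is a lookup (`freq.getOrDefault("engineering", 0)`) rather than a separate pass over the file.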

Answers

1

The following code returns: 202, 14, 22

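import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwitterAnalyzer {

public static void main(String args[]){
    try{
     // Lower-case keyword, because each line is lower-cased before matching.
     String keyword = "engineering";
     Pattern keywordPattern = Pattern.compile(keyword);

     // '#' followed by a word character, so each hashtag is matched once.
     Pattern hashTagPattern = Pattern.compile("#[a-zA-Z0-9_]");

     FileInputStream fInstream = new FileInputStream("E:\\t.txt");
     BufferedReader br = new BufferedReader(new InputStreamReader(fInstream));
     String strLine;


     int numberOfKeywords = 0;
     int numberOfWords = 0;
     int numberOfHashtags = 0;

     // readLine() is called only once per iteration, in the loop condition.
     while((strLine = br.readLine()) != null){
      Matcher matcher = keywordPattern.matcher(strLine.toLowerCase());
      while (matcher.find())
       numberOfKeywords++;
      // Splits on every single whitespace character, so consecutive
      // whitespace characters and blank lines inflate the count.
      numberOfWords += strLine.split("\\s").length;
      matcher = hashTagPattern.matcher(strLine);
      while (matcher.find())
       numberOfHashtags++;
     }

     System.out.println(numberOfWords);
     System.out.println(numberOfKeywords);
     System.out.println(numberOfHashtags);
     br.close();

    }catch (FileNotFoundException fe){
     fe.printStackTrace();
     System.out.println("Unable to locate file");
     System.exit(-1);
    }catch (IOException ie){
     ie.printStackTrace();
     System.out.println("Unable to read file");
     System.exit(-1);
    }
}
}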
+0

Hi Sayed, thank you very much for your help! You saved my day! But could you explain the difference between FileReader and FileInputStream? – Alan1

+0

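Alan1, as defined in the Oracle Java documentation, FileReader is meant for reading streams of characters. For reading streams of raw bytes, consider using a FileInputStream. I hope this helps as well: http://stackoverflow.com/questions/20927278/filereader-advantages-vs-fileinputstream-advantages#20927429 –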
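To make the byte versus character distinction above concrete, here is a small sketch that reads the same file both ways. It writes a throwaway temp file containing one non-ASCII character, counts raw bytes with `FileInputStream`, and counts decoded characters with an `InputStreamReader` set to UTF-8. The class name `ByteVsCharDemo`, both helper methods, and the sample text are invented for illustration. The explicit charset is a deliberate choice here: a plain `new FileReader(path)` decodes with the platform default charset, which may or may not match the file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ByteVsCharDemo {

    // Counts raw bytes: FileInputStream does no decoding at all.
    static int countBytes(Path file) throws IOException {
        int n = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            while (in.read() != -1) {
                n++;
            }
        }
        return n;
    }

    // Counts decoded characters: the reader turns UTF-8 bytes into chars.
    static int countChars(Path file) throws IOException {
        int n = 0;
        try (Reader r = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            while (r.read() != -1) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: "\u00e9" (e with acute accent) takes two bytes in UTF-8.
        Path tmp = Files.createTempFile("tweets", ".txt");
        Files.write(tmp, "caf\u00e9 #engineering".getBytes(StandardCharsets.UTF_8));
        System.out.println(countBytes(tmp) + " bytes, " + countChars(tmp) + " chars");
        Files.delete(tmp);
    }
}
```

For tweet text, which often contains non-ASCII characters, a character reader with a known charset is usually the safer of the two.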

1

Try it this way; it should help:

import java.io.BufferedReader; 
import java.io.FileReader; 

public class CountWords { 

    public static void main (String args[]) throws Exception { 

     System.out.println ("Engineering");  
     FileReader fr = new FileReader ("c:\\Customer1.txt");   
     BufferedReader br = new BufferedReader (fr);  
     String line = br.readLine(); 
     int count = 0; 
     while (line != null) { 
      String []parts = line.split(" "); 
      for(String w : parts) 
      { 
      count++;   
      } 
      line = br.readLine(); 
     }   
     System.out.println(count); 
    } 
} 
+0

Hi Lakhan, thanks! It works. Could you tell me how to check whether a word contains "engineering" and "#"? – Alan1

+0

@Alan1 Glad it helped. See here: http://stackoverflow.com/questions/17134773/to-check-if-string-contains-particular-word –
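For the follow-up question above, a minimal sketch of both checks using only `String` methods. The class name `WordChecks` and the two helpers are hypothetical. Treating a word as a hashtag only when it starts with '#' and has at least one more character is an assumption, as is ignoring case for the keyword.

```java
import java.util.Locale;

class WordChecks {

    // True if the word contains the keyword, ignoring case.
    static boolean mentionsKeyword(String word, String keyword) {
        return word.toLowerCase(Locale.ROOT).contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Assumption: a hashtag starts with '#' and has at least one more character.
    static boolean isHashtag(String word) {
        return word.length() > 1 && word.startsWith("#");
    }

    public static void main(String[] args) {
        // Hypothetical sample words.
        for (String w : new String[] {"#Engineering", "engineering", "science", "#"}) {
            System.out.println(w + ": keyword=" + mentionsKeyword(w, "engineering")
                    + ", hashtag=" + isHashtag(w));
        }
    }
}
```

Inside the word loop of the answer, these would replace the bare `count++` with one counter per check.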
