2012-11-25
0

Counting word frequency across multiple files: a question about data structures in Java

I have these files:

a1.txt = {aaa, aaa, aaa} 
a2.txt = {aaa} 
a3.txt = {aaa, bbb} 

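So for these words the result must be aaa = 3, bbb = 1 (the number of files each word appears in).

I have defined the following data structures: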

LinkedHashMap<String, Integer> wordCount = new LinkedHashMap<String, Integer>(); 
Map<String, LinkedHashMap<String, Integer>> 
fileToWordCount = new HashMap<String,LinkedHashMap<String, Integer>>(); 

Then I read the words from the files and put them into wordCount and fileToWordCount:

/* lineWords[i] is a word from a line in the file */
if (wordCount.containsKey(lineWords[i])) {
    System.out.println("1111111::" + lineWords[i]);
    wordCount.put(lineWords[i], wordCount.get(lineWords[i]).intValue() + 1);
} else {
    System.out.println("222222::" + lineWords[i]);
    wordCount.put(lineWords[i], 1);
}
fileToWordCount.put(filename, wordCount); // here we map the filename to the occurrences of its words

Finally, I print fileToWordCount with the following code:

Collection<LinkedHashMap<String, Integer>> a;
Set<String> filenameset;

filenameset = fileToWordCount.keySet();
a = fileToWordCount.values();
for (String filenameFromMap : filenameset) {
    System.out.println("FILENAMEFROMAP::" + filenameFromMap);
    System.out.println("VALUES::" + a);
}

and it prints:

FILENAMEFROMAP::a3.txt 
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}] 
FILENAMEFROMAP::a1.txt 
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}] 
FILENAMEFROMAP::a2.txt 
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}] 

So, how can I use the fileToWordCount map to find the word frequency across the files?

+3

Why not just keep a `Map<String, Set<String>>` that maps each word to the set of files it appears in?

+1

@Itay .. Why not post that as an answer? It looks like a valid answer. :)

+0

@Rohit - Because the question is how to use `fileToWordCount`, and my answer doesn't use `fileToWordCount` :)

Answers

1

You're making this harder than it needs to be. Here's how I would do it:

// Counter is your own small class wrapping an int; readWordsFromFile is a helper you write yourself
Map<String, Counter> wordCounts = new HashMap<String, Counter>();
for (File file : files) {
    // to avoid counting the same word in the same file twice
    Set<String> wordsInFile = new HashSet<String>();
    for (String word : readWordsFromFile(file)) {
        // add() returns false if the word was already seen in this file
        if (wordsInFile.add(word)) {
            Counter counter = wordCounts.get(word);
            if (counter == null) {
                counter = new Counter();
                wordCounts.put(word, counter);
            }
            counter.increment();
        }
    }
}
+0

At counter.increment(); it shows me an error, because the increment() method doesn't exist – chkontog

+0

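Counter would be your own class, wrapping an int value that can be incremented. Sorry if that wasn't clear. You could use Integer instead, but since it is immutable, you would have to replace it in the map every time it needs updating.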
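For illustration, here is a minimal sketch of what such a Counter could look like, next to the Integer variant mentioned above. The names CounterSketch, countWithCounter and countWithInteger are made up for this example, and Counter is not a JDK class; treat it as one possible shape rather than the only way to write it.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterSketch {

    /** Hypothetical mutable wrapper around an int (not part of the JDK). */
    static final class Counter {
        private int value;

        void increment() {
            value++;
        }

        int get() {
            return value;
        }

        @Override
        public String toString() {
            return Integer.toString(value);
        }
    }

    /** Counts with a mutable Counter: look it up once, then update it in place. */
    static Map<String, Counter> countWithCounter(String[] words) {
        Map<String, Counter> counts = new HashMap<String, Counter>();
        for (String word : words) {
            Counter counter = counts.get(word);
            if (counter == null) {
                counter = new Counter();
                counts.put(word, counter);
            }
            counter.increment();
        }
        return counts;
    }

    /** Counts with Integer: it is immutable, so every update needs a new put(). */
    static Map<String, Integer> countWithInteger(String[] words) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : words) {
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"aaa", "bbb", "aaa"};
        System.out.println(countWithCounter(words));
        System.out.println(countWithInteger(words));
    }
}
```

Both variants produce the same counts; the Counter version just avoids putting a new value into the map on every update.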

0

If I may suggest another approach :)

Use a Map<String, Set<String>> map:

foreach file f in files
    foreach word w in f
        if w in map.keys()
            map[w].add(f)
        else
            map[w] = new set containing only f
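The pseudocode above could be written in Java roughly as follows. This is only a sketch: the class name WordToFiles, the methods index and sampleFiles, and the hardcoded word arrays (standing in for actually reading a1.txt, a2.txt and a3.txt) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class WordToFiles {

    /** Maps each word to the set of file names it occurs in. */
    static Map<String, Set<String>> index(Map<String, String[]> wordsByFile) {
        Map<String, Set<String>> wordToFiles = new HashMap<String, Set<String>>();
        for (Map.Entry<String, String[]> entry : wordsByFile.entrySet()) {
            String filename = entry.getKey();
            for (String word : entry.getValue()) {
                Set<String> files = wordToFiles.get(word);
                if (files == null) {
                    // first time this word is seen: start a new set for it
                    files = new HashSet<String>();
                    wordToFiles.put(word, files);
                }
                // a Set ignores duplicates, so repeats within one file count once
                files.add(filename);
            }
        }
        return wordToFiles;
    }

    /** The sample data from the question, hardcoded instead of read from disk. */
    static Map<String, String[]> sampleFiles() {
        Map<String, String[]> wordsByFile = new LinkedHashMap<String, String[]>();
        wordsByFile.put("a1.txt", new String[] {"aaa", "aaa", "aaa"});
        wordsByFile.put("a2.txt", new String[] {"aaa"});
        wordsByFile.put("a3.txt", new String[] {"aaa", "bbb"});
        return wordsByFile;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Set<String>> e : index(sampleFiles()).entrySet()) {
            // the size of the set is the number of files containing the word
            System.out.println(e.getKey() + " = " + e.getValue().size());
        }
    }
}
```

The number of files a word appears in is then just the size of its set, and the set also tells you which files those are.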