I want to count word frequencies across multiple files, and I have a question about which data structures to use in Java.
For example, I have these files:
a1.txt = {aaa, aaa, aaa}
a2.txt = {aaa}
a3.txt = {aaa, bbb}
For these files, the result must be aaa = 3, bbb = 1, that is, the number of files each word appears in.
Then I defined the following data structures:
LinkedHashMap<String, Integer> wordCount = new LinkedHashMap<String, Integer>();
Map<String, LinkedHashMap<String, Integer>>
fileToWordCount = new HashMap<String,LinkedHashMap<String, Integer>>();
Then I read the words from each file and put them into wordCount and fileToWordCount:
/* lineWords[i] is a word from a line in the file */
if (wordCount.containsKey(lineWords[i])) {
    System.out.println("1111111::" + lineWords[i]);
    wordCount.put(lineWords[i], wordCount.get(lineWords[i]).intValue() + 1);
} else {
    System.out.println("222222::" + lineWords[i]);
    wordCount.put(lineWords[i], 1);
}
fileToWordCount.put(filename, wordCount); // here we map filename and occurrences of words
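For comparison, here is a minimal sketch of per-file counting in which every file gets its own, freshly created map. It assumes Java 9+ (for List.of), keeps the file contents in memory instead of reading a1.txt to a3.txt, and uses hypothetical names (PerFileWordCount, countWords):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only; file contents are held in memory
// instead of being read from a1.txt, a2.txt, a3.txt.
public class PerFileWordCount {

    // Counts occurrences of each word in ONE file's words.
    // A new LinkedHashMap is created on every call, so no two files share a map.
    static LinkedHashMap<String, Integer> countWords(List<String> words) {
        LinkedHashMap<String, Integer> wordCount = new LinkedHashMap<>();
        for (String word : words) {
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the three sample files
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("a1.txt", List.of("aaa", "aaa", "aaa"));
        files.put("a2.txt", List.of("aaa"));
        files.put("a3.txt", List.of("aaa", "bbb"));

        // LinkedHashMap (instead of HashMap) only to keep the printed order stable
        Map<String, LinkedHashMap<String, Integer>> fileToWordCount = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            fileToWordCount.put(file.getKey(), countWords(file.getValue()));
        }
        // prints {a1.txt={aaa=3}, a2.txt={aaa=1}, a3.txt={aaa=1, bbb=1}}
        System.out.println(fileToWordCount);
    }
}
```

Applied to the question's code, the analogous change would presumably be to create a new wordCount map for each file instead of declaring it once, since put() stores a reference to the map rather than a copy.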
Finally, I print fileToWordCount with this code:
Collection<LinkedHashMap<String, Integer>> a;
Set<String> filenameset;
filenameset = fileToWordCount.keySet();
a = fileToWordCount.values();
for (String filenameFromMap : filenameset) {
    System.out.println("FILENAMEFROMAP::" + filenameFromMap);
    System.out.println("VALUES::" + a);
}
and it prints:
FILENAMEFROMAP::a3.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]
FILENAMEFROMAP::a1.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]
FILENAMEFROMAP::a2.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]
So how can I use the fileToWordCount map to find the document frequency of each word?
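Assuming each file has its own map in fileToWordCount, the document frequency of a word is the number of per-file maps whose key set contains it. A minimal sketch with hypothetical names, using a TreeMap only to get sorted output (Java 9+ for Map.of):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class and method names, for illustration only.
public class DocumentFrequency {

    // For each word, counts how many files' maps contain it (document frequency).
    // Assumes every file has its OWN map in fileToWordCount, not one shared map.
    static Map<String, Integer> documentFrequency(Map<String, LinkedHashMap<String, Integer>> fileToWordCount) {
        Map<String, Integer> df = new TreeMap<>(); // sorted by word for stable output
        for (Map<String, Integer> counts : fileToWordCount.values()) {
            // keySet() yields each word once per file, however often it occurred there
            for (String word : counts.keySet()) {
                df.merge(word, 1, Integer::sum);
            }
        }
        return df;
    }

    public static void main(String[] args) {
        // Per-file counts for the sample files a1.txt, a2.txt, a3.txt
        Map<String, LinkedHashMap<String, Integer>> fileToWordCount = new LinkedHashMap<>();
        fileToWordCount.put("a1.txt", new LinkedHashMap<>(Map.of("aaa", 3)));
        fileToWordCount.put("a2.txt", new LinkedHashMap<>(Map.of("aaa", 1)));
        fileToWordCount.put("a3.txt", new LinkedHashMap<>(Map.of("aaa", 1, "bbb", 1)));
        // prints {aaa=3, bbb=1}
        System.out.println(documentFrequency(fileToWordCount));
    }
}
```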
Why not just hold a 'Map<String, Set>' that maps each word to the set of files it appears in? –
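The approach suggested in the comment could look roughly like this: a Map<String, Set<String>> from each word to the files it occurs in, where the document frequency is simply the size of the set. The names are hypothetical, file contents are held in memory, and Java 9+ is assumed for List.of:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class and method names, for illustration only.
public class WordToFiles {

    // Maps each word to the set of files it appears in; the Set ignores repeats within a file.
    static Map<String, Set<String>> index(Map<String, List<String>> files) {
        Map<String, Set<String>> wordToFiles = new HashMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String word : file.getValue()) {
                wordToFiles.computeIfAbsent(word, w -> new TreeSet<>()).add(file.getKey());
            }
        }
        return wordToFiles;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the three sample files
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("a1.txt", List.of("aaa", "aaa", "aaa"));
        files.put("a2.txt", List.of("aaa"));
        files.put("a3.txt", List.of("aaa", "bbb"));

        Map<String, Set<String>> wordToFiles = index(files);
        // Document frequency is the size of each word's file set
        for (String word : List.of("aaa", "bbb")) {
            System.out.println(word + " = " + wordToFiles.get(word).size());
        }
        // aaa = 3
        // bbb = 1
    }
}
```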
@Itay .. why not post that as an answer? It looks like a valid answer. :) –
@Rohit - because the question is how to use 'fileToWordCount', and my answer doesn't use 'fileToWordCount' :) –
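Separately from the counting problem, the printing loop in the question prints the entire values() collection on every iteration, which is why each VALUES line lists three maps. Iterating over entrySet() pairs each filename with its own map. A minimal sketch with hypothetical names, using the question's Map<String, LinkedHashMap<String, Integer>> type:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class and method names, for illustration only.
public class PrintFileCounts {

    // One line per file: each filename is paired with ITS OWN map via entrySet(),
    // instead of printing the whole values() collection on every iteration.
    static String describe(Map<String, LinkedHashMap<String, Integer>> fileToWordCount) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, LinkedHashMap<String, Integer>> entry : fileToWordCount.entrySet()) {
            sb.append(entry.getKey()).append(" -> ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample per-file counts, each file with its own map
        Map<String, LinkedHashMap<String, Integer>> fileToWordCount = new LinkedHashMap<>();
        LinkedHashMap<String, Integer> a1 = new LinkedHashMap<>();
        a1.put("aaa", 3);
        LinkedHashMap<String, Integer> a3 = new LinkedHashMap<>();
        a3.put("aaa", 1);
        a3.put("bbb", 1);
        fileToWordCount.put("a1.txt", a1);
        fileToWordCount.put("a3.txt", a3);
        // prints:
        // a1.txt -> {aaa=3}
        // a3.txt -> {aaa=1, bbb=1}
        System.out.print(describe(fileToWordCount));
    }
}
```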