這是我的實現在50毫秒以下搜索。
首先,您必須加載文件並將其保存在內存中。
不管你想不想加載它,但是如果你將它加載到大塊中會更容易。
我的輸入是在byte into python book(下載HTML單文件版本)和Java language specification(下載HTML和創建一個單獨的文件了所有的HTML頁面)
創建列表進入我用一個大文件這個相同的程序(參見注釋代碼)。
一旦我有30萬左右的話大的文件,我跑了此輸出方案:
C:\Users\oreyes\langs\java\search>dir singlelineInput.txt
El volumen de la unidad C no tiene etiqueta.
El número de serie del volumen es: 22A8-203B
Directorio de C:\Users\oreyes\langs\java\search
04/03/2011 09:37 p.m. 3,898,345 singlelineInput.txt
1 archivos 3,898,345 bytes
C:\Users\oreyes\langs\java\search>javac WordSearch.java
C:\Users\oreyes\langs\java\search>java WordSearch singlelineInput.txt "great"
Loaded 377381 words in 2844 ms
true
in 31 ms
C:\Users\oreyes\langs\java\search>java WordSearch singlelineInput.txt "great"
Loaded 377381 words in 2812 ms
true
in 31 ms
C:\Users\oreyes\langs\java\search>java WordSearch singlelineInput.txt "awesome"
Loaded 377381 words in 2813 ms
false
in 47 ms
C:\Users\oreyes\langs\java\search>gvim singlelineInput.txt
C:\Users\oreyes\langs\java\search>java WordSearch singlelineInput.txt "during"
Loaded 377381 words in 2813 ms
true
in 15 ms
C:\Users\oreyes\langs\java\search>java WordSearch singlelineInput.txt "specification"
Loaded 377381 words in 2875 ms
true
in 47 ms
C:\Users\oreyes\langs\java\search>java WordSearch singlelineInput.txt "<href"
Loaded 377381 words in 2844 ms
false
in 47 ms
C:\Users\oreyes\langs\java\search>java WordSearch singlelineInput.txt "<br>"
Loaded 377381 words in 2829 ms
true
in 15 ms
始終低於50毫秒。
下面的代碼:
import java.io.*;
import java.util.*;
class WordSearch {
String inputFile;
List<String> words;
public WordSearch(String file) {
inputFile = file;
}
public void initialize() throws IOException {
long start = System.currentTimeMillis();
File file = new File(inputFile);
ByteArrayOutputStream baos = new ByteArrayOutputStream((int) file.length());
FileInputStream in = new FileInputStream(file);
copyLarge(in, baos, (int)file.length());
Scanner scanner = new Scanner(new ByteArrayInputStream( baos.toByteArray()));
words = new LinkedList<String>();
while(scanner.hasNextLine()) {
String l = scanner.nextLine().trim();
//for(String s : l.split("\\s+")){
//System.out.println(s);
words.add(l.toLowerCase());
//}
}
Collections.sort(words);
for(String s : words) {
//System.out.println(s);
}
System.out.println("Loaded " + words.size() + " words in "+ (System.currentTimeMillis() - start) + " ms" );
}
public boolean contains(String aWord) {
return words.contains(aWord.toLowerCase());
}
// taken from: http://stackoverflow.com/questions/326390/how-to-create-a-java-string-from-the-contents-of-a-file/326413#326413
public static long copyLarge(InputStream input, OutputStream output, int size)
throws IOException {
byte[] buffer = new byte[size];// something biggie
long count = 0;
int n = 0;
while (-1 != (n = input.read(buffer))) {
output.write(buffer, 0, n);
count += n;
}
return count;
}
public static void main(String ... args) throws IOException {
WordSearch ws = new WordSearch(args[0]);
ws.initialize();
long start = System.currentTimeMillis();
System.out.println(ws.contains(args[1]));
System.out.println("in "+ (System.currentTimeMillis() - start) +" ms ");
}
}
難的是得到一個樣本輸入:P
所有編輯器(包括SO的magic textarea)中的空格端口的縮進比製表符要好。 –
有多少個不同的單詞? –
我們在哪裏可以得到一長串單詞?我設法模擬15k,我正在運行一個ms – OscarRyz