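Before you start processing chunks, you can make a first pass over the file to find offsets that lie on line boundaries. Start at the offset you get by dividing the file size by the number of chunks, then scan forward until you find a line boundary. Then feed those offsets into your multi-threaded file processor. Here is a complete example that uses the number of available processors as the number of chunks: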
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReadFileByChunks {
    public static void main(String[] args) throws IOException, InterruptedException {
        int chunks = Runtime.getRuntime().availableProcessors();
        long[] offsets = new long[chunks];
        File file = new File("your.file");

        // Determine line boundaries for the number of chunks: offsets[0] stays 0,
        // and each later offset is the position just past the first '\n' found
        // at or after i * length / chunks (or the end of the file).
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            for (int i = 1; i < chunks; i++) {
                raf.seek(i * file.length() / chunks);
                while (true) {
                    int read = raf.read();
                    if (read == '\n' || read == -1) {
                        break;
                    }
                }
                offsets[i] = raf.getFilePointer();
            }
        }

        // Process each chunk using a thread for each one.
        ExecutorService service = Executors.newFixedThreadPool(chunks);
        for (int i = 0; i < chunks; i++) {
            long start = offsets[i];
            long end = i < chunks - 1 ? offsets[i + 1] : file.length();
            service.execute(new FileProcessor(file, start, end));
        }
        service.shutdown();
        // Block until every chunk has been processed.
        service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    }

    static class FileProcessor implements Runnable {
        private final File file;
        private final long start;
        private final long end;

        public FileProcessor(File file, long start, long end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }

        @Override
        public void run() {
            try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
                raf.seek(start);
                // This chunk owns every line that starts before 'end'.
                while (raf.getFilePointer() < end) {
                    // Note: RandomAccessFile.readLine() reads one byte at a time
                    // (slow) and maps each byte to one char, so it is not UTF-8 aware.
                    String line = raf.readLine();
                    if (line == null) {
                        break; // end of file reached
                    }
                    // do what you need per line here
                    // (output from different threads will interleave)
                    System.out.println(line);
                }
            } catch (IOException e) {
                // deal with exception; here it is only reported
                e.printStackTrace();
            }
        }
    }
}
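To check that the boundary search neither splits a line nor drops one, the same seek-then-scan-to-`'\n'` logic can be run against an in-memory byte array. This is a minimal sketch, not part of the answer above: `ChunkBoundaries`, `lineAlignedOffsets` and `split` are hypothetical names, and the byte array stands in for the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkBoundaries {
    // Hypothetical helper mirroring the RandomAccessFile loop on a byte[].
    // offsets[0] is 0; offsets[i] is the index just past the first '\n' at or
    // after i * length / count, or length if no '\n' follows.
    static long[] lineAlignedOffsets(byte[] data, int count) {
        long[] offsets = new long[count];
        long length = data.length;
        for (int i = 1; i < count; i++) {
            int pos = (int) (i * length / count);
            while (pos < data.length) {
                if (data[pos++] == '\n') {
                    break;
                }
            }
            offsets[i] = pos;
        }
        return offsets;
    }

    // Cuts data into [offsets[i], offsets[i + 1]); the last chunk runs to the end.
    static String[] split(byte[] data, int count) {
        long[] offsets = lineAlignedOffsets(data, count);
        String[] parts = new String[count];
        for (int i = 0; i < count; i++) {
            int start = (int) offsets[i];
            int end = i < count - 1 ? (int) offsets[i + 1] : data.length;
            parts[i] = new String(data, start, end - start, StandardCharsets.UTF_8);
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\ngamma\ndelta\nepsilon\n";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(lineAlignedOffsets(data, 3))); // [0, 11, 23]
        // The chunks cover the input exactly once, in order.
        System.out.println(String.join("", split(data, 3)).equals(text)); // true
    }
}
```

A line that spans several split points simply leaves the chunks in between empty (start equals end), which the `FileProcessor` loop tolerates because its `while` condition is false from the start.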
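It seems unlikely that reading a single file with multiple threads will actually be faster than reading it with one thread. Disks are very good at sequential access and much worse at random access. If the bottleneck is in the processing rather than the IO (again, that seems unlikely), then read all the data in one thread and hand off blocks of it to worker threads for processing. I'd suggest limiting parallelism to processing several files at once, with only one thread per file. – 2011-04-01 10:03:03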
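If processing rather than IO does turn out to be the bottleneck, the comment's suggestion (read sequentially on one thread, hand blocks to workers) might be sketched as follows. Assumptions: the `SingleReaderWorkers` class, the batch size, and the placeholder `process` method (which just returns the line length) are illustrative only; in real use the `Reader` would come from something like `Files.newBufferedReader(path, StandardCharsets.UTF_8)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleReaderWorkers {
    // Hypothetical per-line work; replace with the real processing.
    static int process(String line) {
        return line.length();
    }

    // Reads sequentially on the calling thread and submits batches of lines
    // to a worker pool. Returns the sum of process() over all lines.
    static long run(Reader source, int batchSize, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> results = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    results.add(pool.submit(work(batch)));
                    batch = new ArrayList<>(batchSize); // never reuse a handed-off list
                }
            }
            if (!batch.isEmpty()) {
                results.add(pool.submit(work(batch)));
            }
        } finally {
            pool.shutdown(); // already-submitted batches still run to completion
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get(); // waits for each batch
        }
        return total;
    }

    // Wraps one batch as a task for the pool.
    static Callable<Long> work(List<String> batch) {
        return () -> {
            long sum = 0;
            for (String line : batch) {
                sum += process(line);
            }
            return sum;
        };
    }

    public static void main(String[] args) throws Exception {
        String text = "alpha\nbeta\ngamma\ndelta\nepsilon\n";
        System.out.println(run(new StringReader(text), 2, 4)); // prints 26
    }
}
```

Because `Executors.newFixedThreadPool` queues tasks without bound, a reader that outpaces the workers will let batches pile up in memory; a `ThreadPoolExecutor` with a bounded `ArrayBlockingQueue` and `ThreadPoolExecutor.CallerRunsPolicy` is one way to add back-pressure.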