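Which is faster: reading a process's output from the console, or writing it to a file and reading the file back?

I have a directory containing a list of very large (~300 MB) files that need to be filtered with an awk script several times, each time with different search parameters. I have written a program that spawns multiple threads using a fixedThreadPool executor. The task implementation in each thread obtains the Runtime object (Runtime.getRuntime()) and runs the awk script in a new Process executed through the bash shell.
Here is some sample code:
Class MultiThreadingImpl: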
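import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiThreadingImpl {
    static List<File> filesList = new ArrayList<File>();

    public static void main(String[] args) throws InterruptedException {
        int numThreads = Runtime.getRuntime().availableProcessors();
        // Pool with one thread per available processor
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        File logsDir = new File("TestFilesDir");
        getLogFiles(logsDir);
        String[] searchKeys = {"123456", "PAT1"};
        // File i is paired with searchKeys[i], so this sample assumes there are
        // no more input files than search keys.
        for (int i = 0; i < filesList.size(); i++) {
            Runnable worker = new WorkerThread(filesList.get(i), searchKeys[i]);
            executor.execute(worker); // hand the task to the pool
        }
        executor.shutdown();
        // Block until every task has finished instead of busy-spinning on isTerminated()
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        System.out.println("Finished all threads");
    }

    private static void getLogFiles(File logsDir) {
        if (!logsDir.isDirectory()) {
            throw new IllegalArgumentException(logsDir + " is not a directory");
        }
        // Collect every input file, skipping result files written by earlier runs
        for (File f : logsDir.listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return !name.endsWith("_result.txt");
            }
        })) {
            filesList.add(f);
        }
    }
}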
Class WorkerThread:
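import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

class WorkerThread implements Runnable {
    private final String outputFile;
    private final String searchKey;
    private final File logFile;

    public WorkerThread(File logFile, String searchKey) {
        this.logFile = logFile;
        this.searchKey = searchKey;
        // e.g. "app.txt" -> "app_result.txt"
        this.outputFile = logFile.getName().replace(".txt", "") + "_result.txt";
    }

    public void run() {
        int res = 0;
        Runtime runtime = Runtime.getRuntime();
        // awk program: a date-like line (e.g. "7 March 2015") resets n, a line matching
        // searchKey sets n, and every line is printed while n is set.
        String awkRegex = new StringBuilder("'/([0-9]{1}|[0-9]{2})[[:space:]][[:alpha:]]+[[:space:]][0-9]{4}/{n=0}")
                .append("/" + searchKey + "/").append("{n=1} n' ").toString();
        // The result file goes next to the input file, e.g. TestFilesDir/app_result.txt
        File resultFile = new File(logFile.getParentFile(), outputFile);
        // ">" sends awk's stdout to the result file; stderr stays on the pipe for error reporting
        String awkCommand = new StringBuilder("/usr/bin/awk ").append(awkRegex)
                .append(logFile.getAbsolutePath()).append(" > ").append(resultFile.getAbsolutePath()).toString();
        System.out.println(Thread.currentThread().getName() + ":: Command : " + awkCommand);
        String[] cmdList = {"/bin/bash", "-c", awkCommand};
        try {
            final Process process = runtime.exec(cmdList);
            BufferedReader stdInput = new BufferedReader(new InputStreamReader(process.getInputStream()));
            BufferedReader stdError = new BufferedReader(new InputStreamReader(process.getErrorStream()));
            // Drain the pipes BEFORE waitFor(): a child that fills a pipe buffer blocks,
            // and waitFor() would then never return.
            while (stdInput.readLine() != null) {
                // Emptying stream (normally empty, since stdout goes to the result file)
            }
            StringBuilder strerror = new StringBuilder();
            String serror;
            while ((serror = stdError.readLine()) != null) {
                strerror.append(serror).append('\n');
            }
            res = process.waitFor();
            System.out.println(Thread.currentThread().getName() + ":: Process Exit value: " + res);
            if (strerror.length() > 0) {
                System.err.print(Thread.currentThread().getName() + ":: awk stderr:\n" + strerror);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            e.printStackTrace();
        }
    }
}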
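For reference, the awk program built in WorkerThread.run() works line by line: a line containing a date-like header (1 or 2 digits, whitespace, letters, whitespace, 4 digits) resets n, a line matching the search key sets n, and the bare pattern n prints the line whenever n is set. The following is a rough pure-Java port of that logic, offered only as an illustration; the class name AwkFilterPort is hypothetical, and it assumes the search key is a simple pattern (POSIX ERE and java.util.regex differ in corner cases, and awk's [[:alpha:]] is locale-dependent while Java's \p{Alpha} is ASCII-only).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: a Java port of the awk program
//   /([0-9]{1}|[0-9]{2})[[:space:]][[:alpha:]]+[[:space:]][0-9]{4}/{n=0} /KEY/{n=1} n
class AwkFilterPort {
    // Date-like header such as "7 March 2015". Like awk's /.../, find() matches
    // anywhere in the line rather than requiring a full-line match.
    static final Pattern HEADER =
            Pattern.compile("([0-9]{1}|[0-9]{2})\\s\\p{Alpha}+\\s[0-9]{4}");

    static List<String> filter(List<String> lines, String searchKey) {
        Pattern key = Pattern.compile(searchKey); // awk also treats the key as a regex
        List<String> out = new ArrayList<>();
        boolean n = false; // awk's n starts unset, which is false in a boolean context
        for (String line : lines) {
            if (HEADER.matcher(line).find()) n = false; // /header/{n=0}
            if (key.matcher(line).find()) n = true;     // /key/{n=1}
            if (n) out.add(line);                        // n: print while n is set
        }
        return out;
    }
}
```

Because the key rule runs after the header rule, a header line that itself contains the key is printed, just as in the awk version.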
One option is to write the output for each input file to its own output file, merge those files with cat, and finally read the merged file.
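If the per-file result files are kept, the merge step does not necessarily need a separate cat process. A minimal sketch using java.nio.file, with the hypothetical class ResultMerger, assuming the result files are passed in the order in which they should be concatenated:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: roughly `cat part1 part2 ... > merged` in plain Java.
class ResultMerger {
    static void merge(List<Path> parts, Path merged) throws IOException {
        // With no options, newOutputStream creates the file or truncates an existing one.
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                Files.copy(part, out); // streams the bytes without loading the whole part into memory
            }
        }
    }
}
```

Both this and cat mostly just stream bytes, so any speed difference may well be small compared with the awk passes over ~300 MB inputs; that is an expectation, not a measured result.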
Alternatively, I could read the output stream of each Process into a String and merge all the strings.
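The second option could look roughly like the sketch below, which uses ProcessBuilder and Callable tasks instead of Runtime.exec and Runnable. The class CaptureOutput and its methods are hypothetical; it assumes Java 9+ (for InputStream.readAllBytes), that the caller builds each command as an argument list such as /usr/bin/awk, the awk program, and the file path, and that the combined output is small enough to hold in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CaptureOutput {
    // Runs one command and returns its stdout (with stderr merged in) as a String.
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true) // one stream to drain, so a full stderr pipe cannot block the child
                .start();
        String output;
        try (InputStream in = p.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8); // drain before waitFor()
        }
        int exit = p.waitFor();
        if (exit != 0) throw new IOException("exit code " + exit + " for " + command);
        return output;
    }

    // Runs every command on a fixed pool and concatenates the outputs in submission order.
    static String runAll(List<List<String>> commands, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (List<String> cmd : commands) {
                futures.add(pool.submit(() -> runAndCapture(cmd))); // Callable<String>
            }
            StringBuilder merged = new StringBuilder();
            for (Future<String> f : futures) {
                merged.append(f.get()); // waits for that task; rethrows its failure wrapped
            }
            return merged.toString();
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing the command as a list also avoids bash -c and shell quoting of the awk program. The trade-off is memory: every filtered result is held as a String until it is merged.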
Which mechanism is faster?
Also, is there a way to make the whole thing faster?
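One possible direction, offered as an untested idea rather than a known improvement: since each file is filtered several times with different keys, the file could be read once and the awk logic evaluated for every key on each line. That avoids re-reading ~300 MB once per key and spawning one bash+awk process per run. The sketch below is a Java port of the same header/key/print logic; the class name MultiKeyFilter is hypothetical, it assumes distinct, simple regex keys, and whether it actually beats awk has to be measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class MultiKeyFilter {
    // Same date-like header as the awk program, matched anywhere in the line.
    static final Pattern HEADER =
            Pattern.compile("([0-9]{1}|[0-9]{2})\\s\\p{Alpha}+\\s[0-9]{4}");

    // Reads the input once and applies the awk logic independently for each key.
    // The caller owns (and closes) the Reader.
    static Map<String, List<String>> filter(Reader input, List<String> keys) throws IOException {
        int k = keys.size();
        Pattern[] patterns = new Pattern[k];
        boolean[] n = new boolean[k]; // one awk-style "n" flag per key
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (int i = 0; i < k; i++) {
            patterns[i] = Pattern.compile(keys.get(i));
            results.put(keys.get(i), new ArrayList<>());
        }
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            boolean header = HEADER.matcher(line).find(); // evaluated once, shared by all keys
            for (int i = 0; i < k; i++) {
                if (header) n[i] = false;                          // /header/{n=0}
                if (patterns[i].matcher(line).find()) n[i] = true; // /key/{n=1}
                if (n[i]) results.get(keys.get(i)).add(line);      // n
            }
        }
        return results;
    }
}
```

Each file could then be one task on the existing fixed thread pool, for example by wrapping Files.newBufferedReader(path) in a try-with-resources block inside the task.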
Why not try it yourself and see which one is faster? – Cristina
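Following that suggestion, a small timing helper is enough for a rough comparison of the two mechanisms. The class Timing is hypothetical, and this is not a rigorous benchmark: with warm-up runs, the ~300 MB inputs are likely to be served from the OS page cache, so both variants should be timed under the same conditions and with the same thread-pool size.

```java
// Hypothetical helper for a rough wall-clock comparison.
class Timing {
    // Runs the task `warmups` times untimed, then returns the fastest of `runs` timed runs, in ms.
    static long bestOfMillis(Runnable task, int warmups, int runs) {
        if (runs < 1) throw new IllegalArgumentException("runs must be >= 1");
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // monotonic clock, suited to measuring elapsed time
            task.run();
            best = Math.min(best, (System.nanoTime() - start) / 1_000_000);
        }
        return best;
    }
}
```

Usage would be something like Timing.bestOfMillis(() -> runFileVariant(), 1, 5) versus Timing.bestOfMillis(() -> runStringVariant(), 1, 5), where runFileVariant and runStringVariant are placeholders for the two approaches described above.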