Parallel file processing in Java

I have the following code:

import java.io.*;
import java.util.concurrent.*;

public class Example {
    public static void main(String[] args) {
        try {
            // Create two sample files: 1.dat holds 0..199999, 2.dat holds 200000..399999
            try (DataOutputStream dos = new DataOutputStream(new FileOutputStream("1.dat"))) {
                for (int i = 0; i < 200000; i++) {
                    dos.writeInt(i);
                }
            }
            try (DataOutputStream dos1 = new DataOutputStream(new FileOutputStream("2.dat"))) {
                for (int i = 200000; i < 400000; i++) {
                    dos1.writeInt(i);
                }
            }

            Exampless.createArray(200000); // fill the shared array with 200000..219999
            Exampless ex1 = new Exampless("1.dat");
            Exampless ex2 = new Exampless("2.dat");

            // Count the matches in the two files in parallel
            ExecutorService executor = Executors.newFixedThreadPool(2);
            long startTime = System.nanoTime();
            Future<Integer> future1 = executor.submit(ex1);
            Future<Integer> future2 = executor.submit(ex2);
            int count1 = future1.get();
            int count2 = future2.get();
            long endTime = System.nanoTime();
            long duration = endTime - startTime;
            System.out.println("duration with threads:" + duration);
            executor.shutdown();
            System.out.println("Matches: " + (count1 + count2));

            // Same work, one file after the other on the main thread
            startTime = System.nanoTime();
            ex1.call();
            ex2.call();
            endTime = System.nanoTime();
            duration = endTime - startTime;
            System.out.println("duration without threads:" + duration);
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

class Exampless implements Callable<Integer> {

    public static int[] arr = new int[20000]; // shared array
    public String _name;

    public Exampless(String name) {
        this._name = name;
    }

    // Fill the shared array with z, z+1, ..., z+19999
    static void createArray(int z) {
        for (int i = z; i < z + 20000; i++) {
            arr[i - z] = i;
        }
    }

    // Read the first 20000 ints of the file and count how many equal arr[i]
    @Override
    public Integer call() {
        try (DataInputStream din = new DataInputStream(new FileInputStream(_name))) {
            int cnt = 0;
            for (int i = 0; i < 20000; i++) {
                int c = din.readInt();
                if (c == arr[i]) {
                    cnt++;
                }
            }
            return cnt;
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
        return -1;
    }
}

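I am trying to count the number of matches against the array using two files. Although I run it on two threads, the result is not what I expected, because:

(time to read file 1 + file 2 on a single thread) < (time to read file 1 || file 2 with multiple threads).

Can anyone help me fix this? (I have a 2-core CPU and the files are about 1.5 GB.)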

2012-07-31 Arpssss

@SurajChandran, most of the time. It really makes no difference. :) Just run the test. – Arpssss 2012-07-31 16:33:24

The files aren't 1.5 GB, they're only ~80K. – 2012-07-31 16:33:42

@KeithRandall, I was just giving an example. – Arpssss 2012-07-31 16:36:29

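In the first case you read one file sequentially, byte by byte. This is as fast as disk I/O gets, as long as the file is not badly fragmented. When you finish the first file, the disk/OS finds the start of the second file and continues reading the disk very efficiently.

In the second case you constantly switch between the first and the second file, forcing the disk to seek from one place to another. This extra seek time (~10 ms per seek) is the source of your confusion.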
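A minimal Java sketch of the first (fast) access pattern, assuming the question's file format (big-endian ints written with `DataOutputStream.writeInt`) and an `expected` array playing the role of the question's shared `arr`. `SequentialMatchCounter`, `countMatches` and the 1 MiB buffer size are hypothetical choices for illustration. The files are processed strictly one after another on a single thread, and the `BufferedInputStream` turns the many 4-byte `readInt()` calls into large sequential reads of the underlying file.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class SequentialMatchCounter {

    // Hypothetical buffer size: each refill asks the OS for up to 1 MiB at once.
    static final int BUFFER_SIZE = 1 << 20;

    // Reads the first n ints of each file in turn and counts positions where
    // the value equals expected[i]. One thread, one file at a time, so each
    // file is read front to back without interleaving with the other.
    static int countMatches(List<Path> files, int[] expected, int n) throws IOException {
        int total = 0;
        for (Path file : files) {
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(file), BUFFER_SIZE))) {
                for (int i = 0; i < n; i++) {
                    if (in.readInt() == expected[i]) {
                        total++;
                    }
                }
            }
        }
        return total;
    }
}
```

No thread pool is involved here on purpose: when a single disk is the bottleneck, keeping the reads in file order is what matters.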

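Oh, and did you know that disk access is single-threaded and your task is I/O-bound, so there is no way splitting it across multiple threads can help as long as you read from the same physical disk? Your approach can only be justified if:

each thread, apart from reading from a file, also performs some CPU-intensive or blocking operation that is an order of magnitude slower than the I/O;
the files are on different physical drives (different partitions are not enough), or on certain RAID configurations;
you are using an SSD.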

2012-07-31 16:32:59

+1. This is a fundamental point many people miss: you can only improve performance by increasing the limiting reagent. – RedGreasel 2012-07-31 16:53:50

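As Tomasz pointed out, you won't get any benefit from multithreading the reading of data from disk. You might gain some speed if you multithread the checking instead, i.e. load the data from the files sequentially into arrays and then have threads perform the checks in parallel. But given the small size of your files (~80 KB) and the fact that you are only comparing integers, I doubt the performance gain would be worth it.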
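A sketch of the "load sequentially, check in parallel" idea, assuming the file contents have already been loaded into `int[]` arrays one file at a time (for example with a bulk read). `ParallelMatchCounter`, `countRange` and the chunking scheme are hypothetical. Only the in-memory comparison is split across a fixed thread pool; no thread touches the disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelMatchCounter {

    // Counts positions in [from, to) where data[i] == expected[i].
    static int countRange(int[] data, int[] expected, int from, int to) {
        int cnt = 0;
        for (int i = from; i < to; i++) {
            if (data[i] == expected[i]) {
                cnt++;
            }
        }
        return cnt;
    }

    // Splits the comparison into roughly equal chunks, one task per chunk,
    // and sums the partial counts. `threads` must be at least 1.
    static int countMatches(int[] data, int[] expected, int threads)
            throws InterruptedException, ExecutionException {
        int n = Math.min(data.length, expected.length);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads; // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                parts.add(pool.submit(() -> countRange(data, expected, from, to)));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

As the answer notes, comparing a few tens of thousands of ints is cheap, so pool creation and task submission overhead may well outweigh any gain; the pattern only pays off when the per-element work is expensive.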

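One thing that will definitely speed things up is not using readInt(). Since you know you are comparing 20000 integers, you should read all 20000 ints of each file into an array at once (or at least in chunks), instead of calling readInt() 20000 times.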
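A minimal sketch of reading a block of ints in one go, assuming the big-endian layout produced by `DataOutputStream.writeInt`. `BulkIntReader.readInts` and `countMatches` are hypothetical helpers: the first fills a byte array with `DataInputStream.readFully` and decodes it through a `ByteBuffer` int view, whose default byte order is big-endian and therefore matches the file format. For very large files the same idea would be applied chunk by chunk rather than to one huge array.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

class BulkIntReader {

    // Reads the first n big-endian ints of the file with one bulk read into a
    // byte array, instead of n separate readInt() calls on an unbuffered stream.
    static int[] readInts(Path file, int n) throws IOException {
        byte[] bytes = new byte[Math.multiplyExact(4, n)]; // throws if 4*n overflows int
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            // readFully keeps reading until the array is full, or throws EOFException
            in.readFully(bytes);
        }
        int[] ints = new int[n];
        // ByteBuffer is big-endian by default, matching DataOutputStream.writeInt
        ByteBuffer.wrap(bytes).asIntBuffer().get(ints);
        return ints;
    }

    // Counts positions where data[i] == expected[i].
    static int countMatches(int[] data, int[] expected) {
        int cnt = 0;
        int n = Math.min(data.length, expected.length);
        for (int i = 0; i < n; i++) {
            if (data[i] == expected[i]) {
                cnt++;
            }
        }
        return cnt;
    }
}
```

With the question's values, `countMatches(readInts(Paths.get("2.dat"), 20000), arr)` would stand in for the 20000-iteration `readInt()` loop inside `call()`.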

2012-07-31 16:54:20 onit
