Multithreading speed problem

I added a multithreaded part to my code.

public class ThreadClassSeqGroups 
    { 
     public Dictionary<string, string> seqGroup; 
     public Dictionary<string, List<SearchAlgorithm.CandidateStr>> completeModels; 
     public Dictionary<string, List<SearchAlgorithm.CandidateStr>> partialModels; 
     private Thread nativeThread; 

     public ThreadClassSeqGroups(Dictionary<string, string> seqs) 
     { 
      seqGroup = seqs; 
      completeModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>(); 
      partialModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>(); 
     } 

     public void Run(DescrStrDetail dsd, DescrStrDetail.SortUnit primarySeedSu, 
      List<ushort> secondarySeedOrder, double partialCutoff) 
     { 
      nativeThread = new Thread(() => this._run(dsd, primarySeedSu, secondarySeedOrder, partialCutoff)); 
      nativeThread.Priority = ThreadPriority.Highest; 
      nativeThread.Start(); 
     } 

     public void _run(DescrStrDetail dsd, DescrStrDetail.SortUnit primarySeedSu, 
      List<ushort> secondarySeedOrder, double partialCutoff) 
     { 
      int groupSize = this.seqGroup.Count; 
      int seqCount = 0; 
      foreach (KeyValuePair<string, string> p in seqGroup) 
      { 
       Console.WriteLine("ThreadID {0} (priority:{1}):\t#{2}/{3} SeqName: {4}", 
        nativeThread.ManagedThreadId, nativeThread.Priority.ToString(), ++seqCount, groupSize, p.Key); 
       List<SearchAlgorithm.CandidateStr> tmpCompleteModels, tmpPartialModels; 
       SearchAlgorithm.SearchInBothDirections(
         p.Value.ToUpper().Replace('T', 'U'), dsd, primarySeedSu, secondarySeedOrder, partialCutoff, 
         out tmpCompleteModels, out tmpPartialModels); 
       completeModels.Add(p.Key, tmpCompleteModels); 
       partialModels.Add(p.Key, tmpPartialModels); 
      } 
     } 

     public void Join() 
     { 
      nativeThread.Join(); 
     } 

    } 

class Program 
{ 
    public static int _paramSeqGroupSize = 2000; 
    static void Main(Dictionary<string, string> rawSeqs) 
    { 
     // Split the whole rawSeqs (Dict<name, seq>) into several groups 
     Dictionary<string, string>[] rawSeqGroups = SplitSeqFasta(rawSeqs, _paramSeqGroupSize); 


     // Create a thread for each seqGroup and run 
     var threadSeqGroups = new MultiThreading.ThreadClassSeqGroups[rawSeqGroups.Length]; 
     for (int i = 0; i < rawSeqGroups.Length; i++) 
     { 
      threadSeqGroups[i] = new MultiThreading.ThreadClassSeqGroups(rawSeqGroups[i]); 
      //threadSeqGroups[i].SetPriority(); 
      threadSeqGroups[i].Run(dsd, primarySeedSu, secondarySeedOrder, _paramPartialCutoff); 
     } 

     // Merge results from threads after the thread finish 
     var allCompleteModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>(); 
     var allPartialModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>(); 
     foreach (MultiThreading.ThreadClassSeqGroups t in threadSeqGroups) 
     { 
      t.Join(); 
      foreach (string name in t.completeModels.Keys) 
      { 
       allCompleteModels.Add(name, t.completeModels[name]); 
      } 
      foreach (string name in t.partialModels.Keys) 
      { 
       allPartialModels.Add(name, t.partialModels[name]); 
      } 
     } 
    } 
}

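However, the multithreaded version is much slower than the single-threaded one, and the CPU load is usually around 10%.

For example:

The input file contains 2500 strings.

_paramGroupSize = 3000: main thread + 1 computing thread takes 200 seconds.

_paramGroupSize = 400: main thread + 7 computing threads takes far longer (I killed it after more than 10 minutes).

Is there something wrong with my implementation? How can I speed it up?

Thanks.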

Source

2012-07-27 Mavershang

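What does SearchAlgorithm.SearchInBothDirections do? – 2012-07-27 15:24:31

Use a profiler such as dotTrace; it will tell you where the time is being spent. – 2012-07-27 15:26:17

@Bryan: SearchAlgorithm.SearchInBothDirections performs a deep search on the given string and returns two lists of candidates as out parameters. – Mavershang 2012-07-27 15:29:30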

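What did the code look like before you added multithreading? It is hard to tell what this code is doing, because much of the "working" code seems to be hidden inside the search algorithm. Still, a few thoughts:

You mention an "input file", but the code does not show where it is read. If your file access is done from the threads, threading will not improve performance, because file access will become the bottleneck.
Creating more threads than you have CPU cores will ultimately reduce performance (unless each thread is blocked waiting on a different resource). In your case, I would suggest that 8 threads in total is too many.
A lot of data (memory) access appears to go through your class DescrStrDetail, which is passed from the variable dsd in your Main method to every child thread. However, the declaration of this variable is missing, so its usage and implementation are unknown. If this variable uses locks that prevent several threads from accessing it at the same time, your threads may be locking each other out of that data, which would slow things down further.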
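As a rough illustration of the second point, the group count could be tied to the number of logical cores instead of a fixed group size. This is only a sketch: `GroupSplitter` and `SplitIntoGroups` are hypothetical names, not part of the question's code, and the body of the asker's `SplitSeqFasta` is unknown.

```csharp
using System;
using System.Collections.Generic;

public static class GroupSplitter
{
    // Hypothetical helper (not from the original post): distributes the input
    // over at most maxGroups dictionaries, so that starting one thread per
    // group never creates more threads than maxGroups.
    public static Dictionary<string, string>[] SplitIntoGroups(
        Dictionary<string, string> seqs, int maxGroups)
    {
        if (maxGroups < 1) throw new ArgumentOutOfRangeException(nameof(maxGroups));

        // Never create more groups than there are sequences; always at least one.
        int groupCount = Math.Max(1, Math.Min(maxGroups, seqs.Count));
        var groups = new Dictionary<string, string>[groupCount];
        for (int i = 0; i < groupCount; i++)
            groups[i] = new Dictionary<string, string>();

        int index = 0;
        foreach (KeyValuePair<string, string> p in seqs)
        {
            // Round-robin assignment keeps group sizes within one item of each other.
            groups[index % groupCount].Add(p.Key, p.Value);
            index++;
        }
        return groups;
    }
}
```

Calling `GroupSplitter.SplitIntoGroups(rawSeqs, Environment.ProcessorCount)` in place of the fixed-size split would cap the thread count at the core count. Whether that helps here still depends on what the search routine and `dsd` do internally.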

Source

2012-07-27 15:38:13

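When threads run, they are given time on a particular processor. If there are more threads than processors, the system context-switches between threads so that every active thread gets processed over some period of time. Context switching is really expensive. If you have more threads than processors, much of the CPU time can be taken up by context switching, which can make a single-threaded solution look faster than a multithreaded one.

Your example starts an indeterminate number of threads. If SplitSeqFasta returns more groups than there are cores, you will create more threads than cores and introduce a lot of context switching.

I suggest throttling the number of threads manually, or using something like the Task Parallel Library and the Parallel class to throttle it for you automatically.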
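A minimal sketch of the Parallel-class approach, under some assumptions: `ParallelSearch.Run` and the `search` delegate are hypothetical stand-ins, since the real signature of the per-sequence work is only partly visible in the question, and the delegate must be safe to call from several threads at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelSearch
{
    // 'search' is a placeholder for the per-sequence work, for example a wrapper
    // around SearchAlgorithm.SearchInBothDirections. It is an assumption of this
    // sketch that the delegate is thread-safe.
    public static Dictionary<string, TResult> Run<TResult>(
        IDictionary<string, string> seqs, Func<string, TResult> search)
    {
        // Thread-safe container, so workers can store results without explicit locks.
        var results = new ConcurrentDictionary<string, TResult>();
        var options = new ParallelOptions
        {
            // Never run more workers at once than there are logical cores.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(seqs, options, p =>
        {
            // Same normalization as in the question: upper-case, then T -> U.
            results[p.Key] = search(p.Value.ToUpper().Replace('T', 'U'));
        });

        return new Dictionary<string, TResult>(results);
    }
}
```

With this shape there is no manual grouping, thread creation, or Join: the library partitions the input and the options object caps the concurrency.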

Source

2012-07-27 16:00:44

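It looks to me like you are trying to process a file in parallel with multiple threads. Assuming you have a mechanical disk, this is a bad idea.

Basically, the disk head has to seek to the next read position for every read request. This is a costly operation, and because several threads are issuing read commands, the head gets bounced around every time a different thread gets its turn to run. This reduces performance significantly compared with a single thread reading the data.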
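If file reads really do happen inside the threads (the question does not show this), one way to avoid competing seeks is to read everything on a single thread first and only parallelize the in-memory work. The sketch below assumes a line-oriented file that fits in memory; `SequentialLoad` and the `process` delegate are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SequentialLoad
{
    // Reads the whole file on the calling thread in one sequential pass, then
    // runs the CPU-bound 'process' delegate over the in-memory lines in parallel.
    public static TResult[] LoadThenProcess<TResult>(
        string path, Func<string, TResult> process)
    {
        // Single reader: no other thread issues reads against the disk.
        string[] lines = File.ReadAllLines(path);

        return lines
            .AsParallel()
            .AsOrdered()      // keep results in the same order as the file
            .Select(process)
            .ToArray();
    }
}
```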

Source

2012-07-27 18:30:05 Tudor
