Efficient way to do a set difference across multiple threads

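I have a set of consumer threads that each take a job. Once a thread has processed its job, it has a list of the sub-jobs that were listed in the consumed job. From that list I need to add the sub-jobs that I don't already have in the database. There are 3 million jobs in the database, so working out which ones are not already there is slow. I don't mind each thread blocking on that call, but because I have a race condition (see the code) I have to lock all of them around the slow call, so only one thread at a time can run that section and my program crawls. What can I do so that the threads don't slow down on that call? I tried a queue, but the threads push out job lists faster than the computer can determine which ones should be added to the database, so I end up with a queue that keeps growing and never drains.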

My code:

IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID) 
{ 
    return subJobs.Select(el => el.name) 
     .Except(db.jobs.Where(el => el.set_ID==setID).Select(el => el.name)); 
} 

//...consumer thread i 
lock(lockObj) 
{ 
    var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID); 
    //if there was a context switch here to some thread i+1 
    // and that thread found uniqueJobs that also were found in thread i 
    // then there will be multiple copies of the same job added in the database. 
    // So I put this section in a lock to prevent that. 
    saveJobsToDatabase(uniqueJobNames, consumerSetID); 
} 
//continue consumer thread i...

Source

2012-03-12 brandon

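It's not clear to me what you are trying to do. Could you explain again what you are trying to achieve, but without the details of how you currently do it, so the actual task becomes clearer? – ntziolis 2012-03-12 18:54:29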

Can't you first get the list of existing jobs, then compile the list of "new" sub-jobs in parallel, and finally save the new jobs? – 2012-03-12 19:18:11

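The problem is that I don't know which ones are new unless I compare them against the database using Except. I can compile a list of all the sub-jobs that come up, but when I finally compare that list against the database, the comparison won't have finished by the time the next list arrives. Whether I cache the lists for later or run the check immediately, they build up faster than I can run the Except method. In fact, deferring the check only lets the consumers run faster, which compounds the problem. I'm guessing there is some data structure that can help, or simply a different algorithm. – brandon 2012-03-12 19:21:21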

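Instead of going to the database to check whether a job name is unique, you can load the relevant information into an in-memory lookup structure, which lets you check for existence much faster: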

// Built once: maps each set_ID to the names of the jobs already stored for that set.
Dictionary<int, HashSet<string>> jobLookup = db.jobs.GroupBy(j => j.set_ID) 
    .ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(j => j.name)));

You only need to do this once. After that, use the lookup every time you need to check for uniqueness:

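IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID) 
{ 
    // Fall back to an empty set when nothing has been stored for this set yet.
    var existingJobs = jobLookup.ContainsKey(setID) ? jobLookup[setID] : new HashSet<string>(); 

    // Keep only the names that are not already known for this set.
    return subJobs.Select(el => el.name) 
     .Except(existingJobs); 
}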
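
To try the lookup in isolation, here is a minimal self-contained sketch that swaps the database for an in-memory collection. The `Job` class, the `LookupDemo` class and its method names are hypothetical stand-ins for `db.jobs` and the code above, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of db.jobs.
public class Job
{
    public string name;
    public int set_ID;
}

public static class LookupDemo
{
    // Builds the per-set lookup once from all existing rows.
    public static Dictionary<int, HashSet<string>> BuildLookup(IEnumerable<Job> existing)
    {
        return existing
            .GroupBy(j => j.set_ID)
            .ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(j => j.name)));
    }

    // Returns the sub-job names that are not yet known for the given set.
    public static List<string> GetUniqueJobNames(
        Dictionary<int, HashSet<string>> lookup, IEnumerable<Job> subJobs, int setID)
    {
        HashSet<string> known;
        if (!lookup.TryGetValue(setID, out known))
        {
            known = new HashSet<string>();
        }
        // Except has set semantics, so repeated names in subJobs appear once.
        // ToList() materializes the result so later changes to the lookup cannot affect it.
        return subJobs.Select(j => j.name).Except(known).ToList();
    }
}
```

With existing rows "a" and "b" in set 1, asking for the unique names among sub-jobs "b" and "c" in set 1 returns only "c", and each membership test is a hash lookup rather than a database round trip.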

Whenever you save a new sub-job, also add it to the lookup:

lock(lockObj) 
{ 
    // ToList() runs the Except query once, before the lookup is modified below.
    var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID).ToList(); 
    //if there was a context switch here to some thread i+1 
    // and that thread found uniqueJobs that also were found in thread i 
    // then there will be multiple copies of the same job added in the database. 
    // So I put this section in a lock to prevent that. 
    saveJobsToDatabase(uniqueJobNames, consumerSetID); 

    // Record the newly saved names so that later checks see them.
    if(!jobLookup.ContainsKey(consumerSetID)) 
    { 
     jobLookup.Add(consumerSetID, new HashSet<string>(uniqueJobNames)); 
    } 
    else 
    { 
     jobLookup[consumerSetID].UnionWith(uniqueJobNames); 
    } 
}
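
One possible refinement, sketched here under stated assumptions rather than taken from the answer: once the lookup lives in memory, only the check-and-record step has to be serialized, so the slow database write could move outside the lock. `JobRegistry` and `ClaimNewNames` are hypothetical names. The sketch also assumes that a name recorded in memory is then saved successfully; if the save can fail, the claimed names would have to be removed from the lookup again.

```csharp
using System.Collections.Generic;

// Hypothetical helper: tracks known job names per set and hands out each new name once.
public class JobRegistry
{
    private readonly object lockObj = new object();
    private readonly Dictionary<int, HashSet<string>> jobLookup;

    public JobRegistry(Dictionary<int, HashSet<string>> initialLookup)
    {
        jobLookup = initialLookup;
    }

    // Atomically records the names that are new for setID and returns them.
    // HashSet<T>.Add returns false when the item is already present,
    // so a given name is returned to at most one caller.
    public List<string> ClaimNewNames(IEnumerable<string> names, int setID)
    {
        var claimed = new List<string>();
        lock (lockObj)
        {
            HashSet<string> known;
            if (!jobLookup.TryGetValue(setID, out known))
            {
                known = new HashSet<string>();
                jobLookup.Add(setID, known);
            }
            foreach (var name in names)
            {
                if (known.Add(name))
                {
                    claimed.Add(name);
                }
            }
        }
        return claimed;
    }
}
```

A consumer thread could then call `registry.ClaimNewNames(consumedJob.subJobs.Select(el => el.name), consumerSetID)` and pass the result to `saveJobsToDatabase` without holding the lock, so the lock only covers the in-memory hash operations.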

Source

2012-03-12 19:30:22 ntziolis

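Nice solution. I would rather use memory like this than pay for an N log N lookup every time. I'll write a custom version of this data structure that syncs newly added data with the database. – brandon 2012-03-12 19:36:56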

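My advice would be not to overcomplicate the data structure. Handling the DB and the in-memory lookup separately makes debugging problems simpler. – ntziolis 2012-03-12 19:51:41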
