Running part of Python code in parallel on two different GPUs

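I have a PyTorch script similar to the one shown below. To run it with two different methods, I execute the following commands in two different terminals, each on its own GPU: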

CUDA_VISIBLE_DEVICES=0 python program.py --method method1 
CUDA_VISIBLE_DEVICES=1 python program.py --method method2
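The two-terminal workflow can also be scripted from a single Python file. The following is a minimal sketch, assuming that program.py accepts `--method` and also a hypothetical `--seed` flag (the seed answer further down reads `args.seed`). It starts one process per GPU with `subprocess.Popen`, pinning each process to its GPU through `CUDA_VISIBLE_DEVICES`, and then waits for both.

```python
import os
import subprocess
import sys

# (GPU id, method) pairs, mirroring the two terminal commands.
JOBS = [("0", "method1"), ("1", "method2")]


def build_job(gpu_id, method, seed=0, script="program.py"):
    """Return (argv, env) for one run that only sees a single GPU."""
    # Copy the current environment and restrict the visible GPUs for the child.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)
    # Assumption: program.py accepts a --seed flag in addition to --method.
    argv = [sys.executable, script, "--method", method, "--seed", str(seed)]
    return argv, env


def launch_all(jobs=JOBS, seed=0):
    """Start every job concurrently, then wait and return exit codes in job order."""
    procs = []
    for gpu_id, method in jobs:
        argv, env = build_job(gpu_id, method, seed=seed)
        procs.append(subprocess.Popen(argv, env=env))
    return [p.wait() for p in procs]
```

Calling `launch_all(seed=0)` runs both methods side by side. Passing the same seed to both processes only helps if program.py actually seeds its random number generators with it.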

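The problem is that the data loader function (someDataLoaderFunction) contains some randomness, which means the two methods end up being trained on two different training sets. I want them to train on exactly the same data, so I modified the script as follows: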

# Loading data 
train_loader, test_loader = someDataLoaderFunction() 

# Define the architecture 
model = ResNet18() 
model = model.cuda() 

## Run for the first method 
method = 'method1' 

# Training 
train(method, model, train_loader, test_loader) 

## Run for the second method 
method = 'method2' 

# Must re-initialize the network first 
model = ResNet18() 
model = model.cuda() 

# Training 
train(method, model, train_loader, test_loader)
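In the script above, the second `ResNet18()` receives fresh random initial weights, so the two methods also start from different points. If identical starting weights are wanted, one option is to build the network once and give each method its own `copy.deepcopy` of it. This is a minimal sketch that uses a small hypothetical `nn.Sequential` as a stand-in for `ResNet18`:

```python
import copy

import torch
import torch.nn as nn


def build_model():
    # Hypothetical small stand-in for ResNet18(); any nn.Module behaves the same way.
    return nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))


# Initialise once, so there is exactly one set of random starting weights.
initial_model = build_model()

# Each method gets an independent copy: same values, separate storage,
# so training one copy never changes the other.
model_method1 = copy.deepcopy(initial_model)
model_method2 = copy.deepcopy(initial_model)

# On a GPU machine each copy would then be moved with .cuda() and passed to
# train(method, model, train_loader, test_loader).
```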

Is it possible to run the two methods in parallel in this script? Thanks a lot for your help!
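Because the difficulty is the randomness inside the data loader function, one way to make two separate processes see exactly the same data is to drive every random step from an explicitly seeded `torch.Generator`. The sketch below is a hypothetical stand-in for `someDataLoaderFunction` (toy tensors, an assumed 80/20 split). It seeds the data itself, the train/test split via `random_split`, and the shuffling via the `generator` argument of `DataLoader`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_loaders(seed, n=100, batch_size=10):
    """Hypothetical someDataLoaderFunction with all of its randomness seeded."""
    # Toy dataset: n samples with 3 features each, labelled by their index.
    x = torch.randn(n, 3, generator=torch.Generator().manual_seed(seed))
    y = torch.arange(n)
    dataset = TensorDataset(x, y)

    # The random train/test split draws from its own seeded generator.
    n_train = int(0.8 * n)
    train_set, test_set = random_split(
        dataset, [n_train, n - n_train],
        generator=torch.Generator().manual_seed(seed),
    )

    # Shuffling uses a dedicated seeded generator, so the batch order does
    # not depend on whatever else has consumed the global RNG.
    train_loader = DataLoader(
        train_set, batch_size=batch_size, shuffle=True,
        generator=torch.Generator().manual_seed(seed),
    )
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader
```

Two processes that both call `make_loaders(seed=0)` get the same split and the same batch order, independently of which GPU they run on.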

2017-09-27 Khue

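Well, parallel computing needs a completely different code architecture. Have you done this before? The least I can do is point you to the built-in 'queue' library in Python 3, which you would need to orchestrate the parallel execution. Also read up on race conditions and thread locking, or you may end up frustrated with your code – aim100k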
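The 'queue' plus thread-locking idea from this comment can be sketched as follows. This is a toy example, assuming a hypothetical `fake_train` in place of the real `train(method, model, train_loader, test_loader)`. It only shows how worker threads pull jobs from a `queue.Queue` and write into shared state under a `threading.Lock`:

```python
import queue
import threading

# Each job pairs a method name with the device it should run on.
jobs = queue.Queue()
for job in [("method1", "cuda:0"), ("method2", "cuda:1")]:
    jobs.put(job)

results = {}
results_lock = threading.Lock()  # guards the shared results dict


def fake_train(method, device):
    """Hypothetical placeholder for the real training call."""
    return f"{method} finished on {device}"


def worker():
    while True:
        try:
            method, device = jobs.get_nowait()
        except queue.Empty:
            return  # no jobs left, so this thread exits
        outcome = fake_train(method, device)
        with results_lock:  # only one thread writes to results at a time
            results[method] = outcome
        jobs.task_done()


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Pure-Python code in threads does not execute in parallel because of the GIL, so for real training runs separate processes, as in the two-terminal setup, are usually the simpler route.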

@aim100k Thanks. I have only done some basic things, like parallel loops in C++ or Matlab :( – Khue

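I saw your website. I think what you're doing is really great, and I like these topics too, but I couldn't afford that much education. You'll find your answer here – aim100k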

I think the simplest way is to fix the seeds, as follows:

import numpy as np
import torch

myseed = args.seed  # seed parsed from the command line, e.g. --seed 0
np.random.seed(myseed)
torch.manual_seed(myseed)
torch.cuda.manual_seed(myseed)

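This should force the data loader to get the same samples every time. The parallel way would be to use multithreading, but I hardly think it's worth the trouble for the problem you posted.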
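The seeding above can be wrapped in a small helper that each of the two processes calls once at startup, before building its data loaders. This is a sketch: the name `seed_everything` is hypothetical, and it also seeds Python's built-in `random` module in case the loading code uses it.

```python
import random

import numpy as np
import torch


def seed_everything(seed):
    """Hypothetical helper: seed every global RNG the loading code might use."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy's global RNG
    torch.manual_seed(seed)           # PyTorch's default generator
    torch.cuda.manual_seed_all(seed)  # all CUDA devices; ignored if CUDA is unavailable
```

With the same seed, both runs draw identical random numbers and therefore load the same samples. Some cuDNN kernels are nondeterministic, which can still make the training results differ slightly, but it does not change which samples are loaded.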

2017-09-28 09:45:05

Thanks, I got the same answer here: https://discuss.pytorch.org/t/how-to-run-two-training-methods-in-parallel-on-exactly-the-same-data/7796/2 – Khue
