MPI error in many-to-many communication between processors

I am writing code in which each processor has to interact with several other processors.

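For example, I have 12 processors, and processor 0 has to communicate with, say, 1, 2, 10 and 9; let's call these the neighbours of processor 0. Similarly, processor 1 has to communicate with, say, 5 and 3, processor 2 with 5, 1, 0, 10 and 11, and so on. The data flow is two-way: processor 0 has to send data to 1, 2, 10 and 9 and also receive data from them. There is no problem in the tag computation. I wrote code like this: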

for(all neighbours) 
{ 
store data in vector<double> x; 

    MPI_Send(x) 
} 
MPI_Barrier(); 
for(all neighbours) 
{ 
MPI_Recv(x); 
do work with x 
}

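I have tested this algorithm with different sizes of x and different arrangements of neighbours. The code works for some of them but not for others, where it simply deadlocks. I have also tried: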

for(all neighbours) 
{ 
store data in vector<double> x; 

    MPI_Isend(x) 
} 
MPI_Test(); 
for(all neighbours) 
{ 
MPI_Recv(x); 
do work with x 
}

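The result is the same, except that the deadlock is replaced by NaN results: MPI_Test() tells me that some of the MPI_Isend() operations are not complete, and the code jumps straight to MPI_Recv().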

Can anyone guide me on this? What am I doing wrong? Or is my basic approach itself incorrect?

Edit: I am attaching a code snippet to make the problem easier to understand. I am basically parallelizing an unstructured 3D CFD solver.

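I have attached the file with some explanation. I am not broadcasting; I am looping over the neighbours of the parent processor to send data across the interfaces (an interface can be defined as the boundary between two zones).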

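So if I have 12 processors and processor 0 has to communicate with, say, 1, 2, 10 and 9, then 0 is the parent processor and 1, 2, 10 and 9 are its neighbours. Because the file is very long and is part of the solver, I have kept only the MPI functions for simplicity.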

void Reader::MPI_InitializeInterface_Values() { 
double nbr_interface_id; 
Interface *interface; 
MPI_Status status; 
MPI_Request send_request, recv_request; 
int err, flag; 
int err2; 
char buffer[MPI_MAX_ERROR_STRING]; 
int len; 
int count; 


for (int zone_no = 0; zone_no<this->GetNumberOfZones(); zone_no++) { // Number of zone per processor is 1, so basically each zone is an independent processor 
    UnstructuredGrid *zone = this->ZoneList[zone_no]; 
    int no_of_interface = zone->GetNumberOfInterfaces(); 
    // int count; 
    long int count_send = 0; 
    long int count_recv = 0; 
    long int max_size = 10000; // can be set from test case later 
    int max_size2 = 199; 

    int proc_no = FlowSolution::processor_number; 
    for (int interface_no = 0; interface_no < no_of_interface; interface_no++) { // interface is defined as a boundary between two zones 


     interface = zone->GetInterface(interface_no); 
     int no_faces = interface->GetNumberOfFaces(); 
     if (no_faces != 0) { 

      std::vector<double> Variable_send; // The vector which stores the data to be sent across the interface 
      std::vector<double> Variable_recieve; 
      int total_size = FlowSolution::VariableOrder.size() * no_faces; 
      Variable_send.resize(total_size); 
      Variable_recieve.resize(total_size); 
      int nbr_proc_no = zone->GetInterface(interface_no)->GetNeighborZoneId(); // neighbour of parent processor 

       int j = 0; 
       nbr_interface_id = interface->GetShared_Interface_ID(); 

       for (std::map<VARIABLE, int>::iterator iterator = FlowSolution::VariableOrder.begin(); iterator != FlowSolution::VariableOrder.end(); iterator++) { 

        for (int face_no = 0; face_no < no_faces; face_no++) { 
         Face *face = interface->GetFace(face_no); 
         int owner_id = face->Getinterface_Original_face_owner_id(); 
         double value_send = zone->GetInterface(interface_no)->GetFace(face_no)->GetCell(owner_id)->GetPresentFlowSolution()->GetVariableValue((*iterator).first); 
         Variable_send[j] = value_send; 
         j++; 
        } 
       } 
       count_send = nbr_proc_no * max_size + nbr_interface_id; // tag for data to be sent 
       err2 = MPI_Isend(&Variable_send.front(), total_size, MPI_DOUBLE, nbr_proc_no, count_send, MPI_COMM_WORLD, &send_request); 
     }// end of sending 

    } // all the processors have sent data to their corresponding neighbours 

    MPI_Barrier(MPI_COMM_WORLD); 

    for (int interface_no = 0; interface_no < no_of_interface; interface_no++) { // loop over of neighbours of the current processor to receive data 

     interface = zone->GetInterface(interface_no); 
     int no_faces = interface->GetNumberOfFaces(); 
     if (no_faces != 0) { 
      std::vector<double> Variable_recieve; // The vector which collects the data sent across the interface from 
      int total_size = FlowSolution::VariableOrder.size() * no_faces; 
      Variable_recieve.resize(total_size); 
      count_recv = proc_no * max_size + interface_no; // tag to receive data 
      int nbr_proc_no = zone->GetInterface(interface_no)->GetNeighborZoneId(); 
      nbr_interface_id = interface->GetShared_Interface_ID(); 
       MPI_Irecv(&Variable_recieve.front(), total_size, MPI_DOUBLE, nbr_proc_no, count_recv, MPI_COMM_WORLD, &recv_request); 

       /* Now some work is done using received data */ 
       int j = 0; 
       for (std::map<VARIABLE, int>::iterator iterator = FlowSolution::VariableOrder.begin(); iterator != FlowSolution::VariableOrder.end(); iterator++) { 
        for (int face_no = 0; face_no < no_faces; face_no++) { 
         double value_recieve = Variable_recieve[j]; 
         j++; 
         Face *face = interface->GetFace(face_no); 
         int owner_id = face->Getinterface_Original_face_owner_id(); 
         interface->GetFictitiousCell(face_no)->GetPresentFlowSolution()->SetVariableValue((*iterator).first, value_recieve); 
         double value1 = face->GetCell(owner_id)->GetPresentFlowSolution()->GetVariableValue((*iterator).first); 
         double face_value = 0.5 * (value1 + value_recieve); 
         interface->GetFace(face_no)->GetPresentFlowSolution()->SetVariableValue((*iterator).first, face_value); 
        } 
       } 
       // Variable_recieve.clear(); 

     } 

    }// end of receiving

}
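The question states that the tags are fine. One hedged way to double-check this is to write the two tag formulas from the snippet as standalone functions. A message from rank p to rank q only matches when the sender's nbr_interface_id equals the interface_no that rank q uses for the same interface, which assumes GetShared_Interface_ID() returns the neighbour's local interface index. With max_size = 10000 and 12 ranks, tags also reach 110000, above the 32767 that the MPI standard guarantees for MPI_TAG_UB. The function names below are illustrative, not part of the solver, and the interface IDs are treated as integers.

```cpp
#include <cassert>

// Tag formulas copied from the snippet. On the sending rank:
//   count_send = nbr_proc_no * max_size + nbr_interface_id
// On the receiving rank:
//   count_recv = proc_no * max_size + interface_no
long send_tag(int nbr_proc_no, long nbr_interface_id, long max_size) {
    assert(max_size > 0);
    return nbr_proc_no * max_size + nbr_interface_id;
}

long recv_tag(int proc_no, long interface_no, long max_size) {
    assert(max_size > 0);
    return proc_no * max_size + interface_no;
}

// The MPI standard only guarantees MPI_TAG_UB >= 32767. The actual limit of
// an implementation can be queried at run time with
// MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &ptr, &flag).
bool tag_within_guaranteed_range(long tag) {
    return tag >= 0 && tag <= 32767;
}
```

Because MPI already matches messages on the source rank given to MPI_Irecv, the tag only has to distinguish interfaces shared by the same pair of ranks, so a small per-pair interface index would keep tags well inside the guaranteed range.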

2017-03-13 samurai_01

This could also be described as simultaneous MPI communication between overlapping processes. –

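The first approach cannot work reliably, because MPI_Send can block until a matching receive is posted, so none of your processes may ever reach the Recv calls: they are stuck in MPI_Send or, if they did not call MPI_Send, in MPI_Barrier. The second approach can work, but could you edit in a snippet of your actual code? You may be doing something wrong with the data buffers, but that is not clear from the pseudocode. – timdykes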

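Even assuming the second version uses separate buffers that live long enough, you are still missing an MPI_Wait (or MPI_Waitall). MPI_Isend obviously does not complete immediately; it completes at the latest when you call MPI_Wait(all) on its request. So MPI_Test may or may not succeed. I suggest you read the documentation and then ask about whatever you don't understand. – overseas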
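The buffer-lifetime point can be made concrete. A minimal sketch, assuming one send and one receive buffer per interface (the struct and function names are hypothetical, not part of the solver): every buffer is allocated before any request is posted and is kept alive until the final MPI_Waitall. The packing index follows the order used in the question's snippet, with the outer loop over variables and the inner loop over faces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one send and one receive buffer per interface, all
// allocated before any MPI_Irecv/MPI_Isend is posted. The object must stay
// alive (and the inner vectors must not be resized) until MPI_Waitall has
// completed every request that points into it.
struct InterfaceBuffers {
    std::vector<std::vector<double>> send;
    std::vector<std::vector<double>> recv;
};

// sizes[i] plays the role of total_size = VariableOrder.size() * no_faces
// for interface i in the question's snippet.
InterfaceBuffers make_interface_buffers(const std::vector<std::size_t>& sizes) {
    InterfaceBuffers b;
    for (std::size_t s : sizes) {
        b.send.emplace_back(s, 0.0);
        b.recv.emplace_back(s, 0.0);
    }
    return b;
}

// Position of (variable, face) in a buffer, matching the snippet's packing
// order: outer loop over variables, inner loop over faces.
std::size_t buffer_index(std::size_t var, std::size_t face, std::size_t no_faces) {
    assert(face < no_faces);
    return var * no_faces + face;
}
```

With such buffers, the exchange would post MPI_Irecv(b.recv[i].data(), ...) into request slot i and MPI_Isend(b.send[i].data(), ...) into slot n + i, call MPI_Waitall on all 2n requests, and only then read b.recv or let b go out of scope. In the posted snippet, by contrast, Variable_send is destroyed at the end of each loop iteration while its MPI_Isend may still be in flight, send_request is overwritten for every interface, and Variable_recieve is read right after MPI_Irecv without any wait.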

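I solved the problem by using MPI_Allgatherv. All I did was set the send counts so that only the processors I want to communicate with have a nonzero count; the other processors get a send count of 0. This solved my problem.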
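Per-destination send counts, with zeros for non-neighbours, are what the sendcounts and sdispls arguments of MPI_Alltoallv express; MPI_Allgatherv takes a single send count per process and delivers that block to every rank. If per-destination counts are the goal, a sketch of building the count and displacement arrays could look like this, assuming each neighbour receives msg_size[i] doubles (all names here are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout for MPI_Alltoallv: data for rank r is packed at
// displs[r] and occupies counts[r] elements; non-neighbours get a count of 0.
struct AlltoallvLayout {
    std::vector<int> counts;
    std::vector<int> displs;
    int total = 0;  // number of elements in the packed buffer
};

// neighbors[i] is a rank this process exchanges msg_size[i] doubles with.
// Several interfaces to the same rank are simply added together.
AlltoallvLayout make_alltoallv_layout(int nprocs, const std::vector<int>& neighbors,
                                      const std::vector<int>& msg_size) {
    assert(neighbors.size() == msg_size.size());
    AlltoallvLayout l;
    l.counts.assign(nprocs, 0);
    l.displs.assign(nprocs, 0);
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        assert(neighbors[i] >= 0 && neighbors[i] < nprocs);
        l.counts[neighbors[i]] += msg_size[i];
    }
    for (int r = 0; r < nprocs; ++r) {  // exclusive prefix sum of counts
        l.displs[r] = l.total;
        l.total += l.counts[r];
    }
    return l;
}
```

These arrays would be passed as MPI_Alltoallv(sendbuf, l.counts.data(), l.displs.data(), MPI_DOUBLE, recvbuf, ...). In the question's setting both sides of an interface exchange the same number of values, so the same layout could plausibly serve as recvcounts and rdispls as well.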

Thanks everyone for the answers!

2017-05-17 07:42:34

Working from the problem statement:

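Processor 0 has to send to 1, 2, 9 and 10, and receive from them.
Processor 1 has to send to 5 and 3, and receive from them.
Processor 2 has to send to 0, 1, 5, 10 and 11, and receive from them.
There are 12 processors in total.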
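In a two-way exchange every send needs a matching receive, so the neighbour lists have to be symmetric: if p lists q, then q must list p. Taken literally, the example lists are not symmetric (processor 0 lists 1, but processor 1's list, 5 and 3, does not include 0). That may only be an artifact of the "say" in the question, but a one-sided pair is an easy way to end up with a send or receive that never completes. A small check, assuming a hypothetical adjacency list nbrs[p] per rank:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// nbrs[p] lists the ranks that rank p exchanges data with (hypothetical input).
// Returns every (p, q) where p lists q but q does not list p; each such pair
// leaves a send or a receive without a partner in a two-way exchange.
std::vector<std::pair<int, int>> missing_reverse_links(const std::vector<std::vector<int>>& nbrs) {
    std::vector<std::pair<int, int>> missing;
    const int n = static_cast<int>(nbrs.size());
    for (int p = 0; p < n; ++p) {
        for (int q : nbrs[p]) {
            assert(q >= 0 && q < n);
            const std::vector<int>& back = nbrs[q];
            if (std::find(back.begin(), back.end(), p) == back.end())
                missing.emplace_back(p, q);
        }
    }
    return missing;
}
```

Running this once on the real interface data, before any communication, separates a mismatched neighbour table from a genuine MPI ordering problem.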

Life may be easier if you just run a 12-step plan:

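Step 1: Processor 0 sends, the others receive as needed, and then the reverse happens.
Step 2: Processor 1 sends, the others receive as needed, and then the reverse happens.
...
Step 12: Profit. There is nothing left to do (because every other processor has already interacted with processor 11).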

Each step can be implemented as an MPI_Scatterv (some sendcounts will be zero) followed by an MPI_Gatherv. That is 22 calls in total, and you're done.
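The plan above skips pairs that were already handled: in the step where rank r is the root, it only needs to talk to neighbours with a higher rank, which is why the last step is empty. A sketch of the per-step counts, assuming symmetric neighbour lists and a uniform message size msg_size (both simplifications; the function names are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// sendcounts the root would pass to MPI_Scatterv in its step. Only neighbours
// of higher rank are included, because every pair with a lower-ranked partner
// was already handled when that partner was the root.
std::vector<int> root_sendcounts(int root, const std::vector<std::vector<int>>& nbrs, int msg_size) {
    const int n = static_cast<int>(nbrs.size());
    assert(root >= 0 && root < n);
    std::vector<int> counts(n, 0);
    for (int q : nbrs[root])
        if (q > root) counts[q] = msg_size;
    return counts;
}

// recvcount that rank `me` would pass to MPI_Scatterv in the step of `root`.
// It is nonzero only if `root` is a neighbour of lower rank; for the root
// itself it is 0, matching counts[root] == 0 above.
int member_recvcount(int me, int root, const std::vector<std::vector<int>>& nbrs, int msg_size) {
    const std::vector<int>& mine = nbrs[me];
    const bool linked = std::find(mine.begin(), mine.end(), root) != mine.end();
    return (linked && root < me) ? msg_size : 0;
}
```

The same numbers, read in the other direction, would serve the MPI_Gatherv that returns data to the root: the root's recvcounts equal its sendcounts, and each member's sendcount equals its recvcount.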

2017-03-13 12:17:58

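Thanks for the reply. I think your approach works if the same data has to be transferred between the parent and its neighbours, but the problem I have is different. Please see my edited snippet. –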

@samurai_01: Not sure what you mean by "the same data". –

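There are several possible causes of deadlock, so you have to be more specific. For example, the standard says: "When standard send operations are used, a deadlock situation may occur where both processes are blocked because buffer space is not available."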

You should use both Isend and Irecv. The general structure should be:

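std::vector<MPI_Request> req(2 * n); // n receives + n sends, each needs its own request

MPI_Irecv(..., &req[0]); 
// ... 
MPI_Irecv(..., &req[n-1]); 
MPI_Isend(..., &req[n]); 
// ... 
MPI_Isend(..., &req[2*n-1]); 

// Do not touch the send or receive buffers until this returns.
MPI_Waitall(2 * n, req.data(), MPI_STATUSES_IGNORE);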

2017-03-13 15:54:44 mcsim

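Thanks for the reply. As you can see in the snippet I attached, I have to transfer data from a processor to its neighbours. I haven't tried MPI_Waitall() yet, but I will try it and let you know. –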
