Mapping MPI processes to particular nodes

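I suspect this question is trivial, but I could not work it out on my own. Suppose I have a cluster of 100 nodes, each with 16 cores. I have an MPI application whose communication pattern is known, and I also know the cluster topology (i.e., the hop distance between nodes). I have worked out a process-to-node mapping that reduces network contention. For example, the process-to-node mapping is 10 -> 20, 30 -> 90. How do I map the process with rank 10 to node-20? Please help.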

Source

2013-01-19 Srini

If you are not constrained by a queuing system, you can control the rank-to-node mapping by creating your own machinefile.

For example, suppose the file my_machine_file contains the following 1600 lines:

node001
node002
node003
....
node100
node001
node002
node003
....
node100
...
[repeat 13 more times]
...
node001
node002
node003
....
node100

This file corresponds to the following rank-to-node mapping:

0 -> node001, 1 -> node002, ... 99 -> node100, 100 -> node001, ...

Run your application with

mpirun -machinefile my_machine_file -n 1600 my_app

If your application needs fewer than 1600 processes, edit the machinefile accordingly.
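A machinefile like this can be generated rather than typed by hand. The following is a minimal sketch, assuming (as this answer does) that the launcher assigns rank i to the host named on line i; whether it does so depends on the MPI implementation and its mapping options. The helper names NodeName, BuildMachinefile and RoundRobin are illustrative and not part of any MPI library, and the three-digit node naming is taken from the listing above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: format a 1-based node number as "node001", "node020", ...
std::string NodeName(int node_number) {
    assert(node_number >= 1 && node_number <= 999);
    char buf[16];
    std::snprintf(buf, sizeof buf, "node%03d", node_number);
    return buf;
}

// Build machinefile text in which line i names the node intended for rank i.
std::string BuildMachinefile(const std::vector<int>& rank_to_node) {
    std::string text;
    for (int node_number : rank_to_node) {
        text += NodeName(node_number);
        text += '\n';
    }
    return text;
}

// Round-robin placement over num_nodes nodes, matching the listing above.
std::vector<int> RoundRobin(int num_ranks, int num_nodes) {
    assert(num_nodes > 0);
    std::vector<int> rank_to_node(num_ranks);
    for (int rank = 0; rank < num_ranks; ++rank)
        rank_to_node[rank] = rank % num_nodes + 1;
    return rank_to_node;
}
```

To apply the mapping from the question, one could start from RoundRobin(1600, 100), set entry 10 to 20 and entry 30 to 90, and write the result of BuildMachinefile to my_machine_file. Overriding entries this way changes how many ranks land on each node, so the per-node count should be checked against the 16 available cores.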

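Keep in mind that the cluster administrator has probably numbered the nodes with some regard to the interconnect topology. Even so, there are reports of significant performance improvements (10%-20%) from carefully exploiting the cluster topology. (References to follow.)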

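Note: launching an MPI program with mpirun is neither standardized nor portable. However, the question here is clearly about a specific computing cluster and a specific implementation (Open MPI), so a portable solution is not required.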

Source

2013-01-19 07:51:55

Thanks for the quick response. – Srini

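@srini Correct. All the cores reside on the same node and cannot be distinguished with mpirun. The OS scheduler maps processes to cores. Process affinity to cores is a [separate issue](http://blogs.cisco.com/performance/open-mpi-v1-5-processor-affinity-options/). –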

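This may be out of context, but Open MPI does in fact allow each individual rank on a given node to be mapped to a specific core. This is done by passing a "rankfile" to 'mpirun' with the '-rf' option. –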
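As a rough illustration of the rankfile approach, the sketch below emits one line per rank in the "rank N=host slot=S" form that Open MPI's mpirun documentation describes; the exact syntax accepted can differ between Open MPI versions, so check the man page of the installed release. The Placement struct and BuildRankfile function are hypothetical names introduced here, and the host names are only examples.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One placement: the host and the slot (core index) a rank should be pinned to.
struct Placement {
    std::string host;
    int slot;
};

// Emit one "rank <N>=<host> slot=<slot>" line per rank, in rank order.
std::string BuildRankfile(const std::vector<Placement>& placements) {
    std::string text;
    for (std::size_t rank = 0; rank < placements.size(); ++rank) {
        assert(placements[rank].slot >= 0);
        text += "rank " + std::to_string(rank) + "=" + placements[rank].host +
                " slot=" + std::to_string(placements[rank].slot) + "\n";
    }
    return text;
}
```

The resulting text would be saved to a file (say my_rankfile, a made-up name) and passed to mpirun with the -rf option mentioned above.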

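A bit late to this party, but here is a subroutine in C++ that will give you a node communicator and a master communicator (only for the masters of the nodes), along with the size and rank in each. It is clunky, but unfortunately I have not found a better way to do this. Fortunately, it only adds about 0.1 seconds to the run time. Maybe you or someone else will get some use out of it.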

#include <mpi.h> 
#include <iostream> 
#include <sstream> 
#include <string> 

#define MASTER 0 

using namespace std; 

/* 
* Make a communicator for each node and another for just 
* the masters of the nodes. Upon completion, everyone is 
* in a new node communicator, knows its size and their rank, 
* and the rank of their master in the master communicator, 
* which can be useful to use for indexing. 
*/ 
bool CommByNode(MPI::Intracomm &NodeComm, 
       MPI::Intracomm &MasterComm, 
       int &NodeRank, int &MasterRank, 
       int &NodeSize, int &MasterSize, 
       string &NodeNameStr) 
{ 
    bool IsOk = true; 

    int Rank = MPI::COMM_WORLD.Get_rank(); 
    int Size = MPI::COMM_WORLD.Get_size(); 

    /* 
    * ====================================================================== 
    * What follows is my best attempt at creating a communicator 
    * for each node in a job such that only the cores on that 
    * node are in the node's communicator, and each core groups 
    * itself and the node communicator is made using the Split() function. 
    * The end of this (lengthy) process is indicated by another comment. 
    * ====================================================================== 
    */ 
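    char *NodeName, *NodeNameList = NULL; 
    NodeName = new char [MPI_MAX_PROCESSOR_NAME]; // large enough for any name the library returns 
    int NodeNameLen, 
     *NodeNameCountVect = NULL, // only allocated on master; Gather ignores it elsewhere 
     *NodeNameOffsetVect = NULL, 
     NodeNameTotalLen = 0; 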
    // Get the name and name character count of each core's node 
    MPI::Get_processor_name(NodeName, NodeNameLen); 

    // Prepare a vector for character counts of node names 
    if (Rank == MASTER) 
     NodeNameCountVect = new int [Size]; 

    // Gather node name lengths to master to prepare c-array 
    MPI::COMM_WORLD.Gather(&NodeNameLen, 1, MPI::INT, NodeNameCountVect, 1, MPI::INT, MASTER); 

    if (Rank == MASTER){ 
     // Need character count information for navigating node name c-array 
     NodeNameOffsetVect = new int [Size]; 
     NodeNameOffsetVect[0] = 0; 
     NodeNameTotalLen = NodeNameCountVect[0]; 

     // build offset vector and total char count for all node names 
     for (int i = 1 ; i < Size ; ++i){ 
      NodeNameOffsetVect[i] = NodeNameCountVect[i-1] + NodeNameOffsetVect[i-1]; 
      NodeNameTotalLen += NodeNameCountVect[i]; 
     } 
     // char-array for all node names 
     NodeNameList = new char [NodeNameTotalLen]; 
    } 

    // Gatherv node names to char-array in master 
    MPI::COMM_WORLD.Gatherv(NodeName, NodeNameLen, MPI::CHAR, NodeNameList, NodeNameCountVect, NodeNameOffsetVect, MPI::CHAR, MASTER); 

    string *FullStrList, *NodeStrList; 
    // Each core keeps its node's name in a str for later comparison 
    stringstream ss; 
    ss << NodeName; 
    ss >> NodeNameStr; 

    delete[] NodeName; // node name in str, so delete c-array (new[] pairs with delete[]) 

    int *NodeListLenVect, NumUniqueNodes = 0, NodeListCharLen = 0; 
    string NodeListStr; 

    if (Rank == MASTER){ 
     /* 
     * Need to prepare a list of all unique node names, so first 
     * need all node names (incl duplicates) as strings, then 
     * can make a list of all unique node names. 
     */ 
     FullStrList = new string [Size]; // full list of node names, each will be checked 
     NodeStrList = new string [Size]; // list of unique node names, used for checking above list 
     // i loops over node names, j loops over characters for each node name. 
     for (int i = 0 ; i < Size ; ++i){ 
      stringstream ss; 
      for (int j = 0 ; j < NodeNameCountVect[i] ; ++j) 
       ss << NodeNameList[NodeNameOffsetVect[i] + j]; // each char into the stringstream 
      ss >> FullStrList[i]; // stringstream into string for each node name 
      ss.str(""); // This and below clear the contents of the stringstream, 
      ss.clear(); // since the >> operator doesn't clear as it extracts 
      //cout << FullStrList[i] << endl; // for testing 
     } 
     delete[] NodeNameList; // master is done with full c-array 
     bool IsUnique; // flag for breaking from for loop 
     stringstream ss; // used for a full c-array of unique node names 
     for (int i = 0 ; i < Size ; ++i){ // Loop over EVERY name 
      IsUnique = true; 
      for (int j = 0 ; j < NumUniqueNodes ; ++j) 
       if (FullStrList[i].compare(NodeStrList[j]) == 0){ // check against list of uniques 
        IsUnique = false; 
        break; 
       } 
      if (IsUnique){ 
       NodeStrList[NumUniqueNodes] = FullStrList[i]; // add unique names so others can be checked against them 
       ss << NodeStrList[NumUniqueNodes].c_str(); // build up a string of all unique names back-to-back 
       ++NumUniqueNodes; // keep a tally of number of unique nodes 
      } 
     } 
     ss >> NodeListStr; // make a string of all unique node names 
     NodeListCharLen = NodeListStr.size(); // char length of all unique node names 
     NodeListLenVect = new int [NumUniqueNodes]; // list of unique node name lengths 
     /* 
     * Because Bcast simply duplicates the buffer of the Bcaster to all cores, 
     * the buffer needs to be a char* so that the other cores can have a similar 
     * buffer prepared to receive. This wouldn't work if we passed string.c_str() 
     * as the buffer, because the receiving cores don't have string.c_str() to 
     * receive into, and even if they did, c_str() is a method and can't be used 
     * that way. 
     */ 
     // Point at NodeListStr's own buffer rather than allocating a second one that would leak. 
     // c_str() returns const char*, so a cast is needed; Bcast only reads this buffer on master. 
     NodeNameList = const_cast<char*>(NodeListStr.c_str()); 
     for (int i = 0 ; i < NumUniqueNodes ; ++i) // fill list of unique node name char lengths 
      NodeListLenVect[i] = NodeStrList[i].size(); 
     /*for (int i = 0 ; i < NumUnique ; ++i) 
      cout << UniqueNodeStrList[i] << endl; 
     MPI::COMM_WORLD.Abort(1);*/ 
     delete[] NodeStrList; // arrays created with new[] must be released with delete[] 
     delete[] FullStrList; 
     delete[] NodeNameCountVect; 
     delete[] NodeNameOffsetVect; 
    } 
    /* 
    * Now we send the list of node names back to all cores 
    * so they can group themselves appropriately. 
    */ 

    // Bcast the number of nodes in use 
    MPI::COMM_WORLD.Bcast(&NumUniqueNodes, 1, MPI::INT, MASTER); 
    // Bcast the full length of all node names 
    MPI::COMM_WORLD.Bcast(&NodeListCharLen, 1, MPI::INT, MASTER); 

    // prepare buffers for node name Bcast's 
    if (Rank > MASTER){ 
     NodeListLenVect = new int [NumUniqueNodes]; 
     NodeNameList = new char [NodeListCharLen]; 
    } 

    // Lengths of node names for navigating c-string 
    MPI::COMM_WORLD.Bcast(NodeListLenVect, NumUniqueNodes, MPI::INT, MASTER); 
    // The actual full list of unique node names 
    MPI::COMM_WORLD.Bcast(NodeNameList, NodeListCharLen, MPI::CHAR, MASTER); 

    /* 
    * Similar to what master did before, each core (incl master) 
    * needs to build an actual list of node names as strings so they 
    * can compare the c++ way. 
    */ 
    int Offset = 0; 
    NodeStrList = new string[NumUniqueNodes]; 
    for (int i = 0 ; i < NumUniqueNodes ; ++i){ 
     stringstream ss; 
     for (int j = 0 ; j < NodeListLenVect[i] ; ++j) 
      ss << NodeNameList[Offset + j]; 
     ss >> NodeStrList[i]; 
     ss.str(""); 
     ss.clear(); 
     Offset += NodeListLenVect[i]; 
     //cout << FullStrList[i] << endl; 
    } 
    // Now since everyone has the same list, just check your node and find your group. 
    int CommGroup = -1; 
    for (int i = 0 ; i < NumUniqueNodes ; ++i) 
     if (NodeNameStr.compare(NodeStrList[i]) == 0){ 
      CommGroup = i; 
      break; 
     } 
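    delete[] NodeListLenVect; // allocated on every rank 
    delete[] NodeStrList; // no longer needed once the group is known 
    if (Rank > MASTER) 
     delete[] NodeNameList; // on master this aliases NodeListStr's buffer, so only other ranks free it 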
    // If the lookup failed, print an error and flag the failure for the caller. 
    if (CommGroup < 0){ 
     cout << "**ERROR** Rank " << Rank << " didn't identify comm group correctly." << endl; 
     IsOk = false; 
    } 

    /* 
    * ====================================================================== 
    * The above method uses c++ strings wherever possible so that things 
    * like node name comparisons can be done the c++ way. I'm sure there's 
    * a better way to do this because that was way too many lines of code... 
    * ====================================================================== 
    */ 

    // Create node communicators 
    NodeComm = MPI::COMM_WORLD.Split(CommGroup, 0); 
    NodeSize = NodeComm.Get_size(); 
    NodeRank = NodeComm.Get_rank(); 

    // Group for master communicator 
    int MasterGroup; 
    if (NodeRank == MASTER) 
     MasterGroup = 0; 
    else 
     MasterGroup = MPI_UNDEFINED; 

    // Create master communicator 
    MasterComm = MPI::COMM_WORLD.Split(MasterGroup, 0); 
    MasterRank = -1; 
    MasterSize = -1; 
    if (MasterComm != MPI::COMM_NULL){ 
     MasterRank = MasterComm.Get_rank(); 
     MasterSize = MasterComm.Get_size(); 
    } 

    MPI::COMM_WORLD.Bcast(&MasterSize, 1, MPI::INT, MASTER); 
    NodeComm.Bcast(&MasterRank, 1, MPI::INT, MASTER); 

    return IsOk; 
}
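The heart of the subroutine above is turning each rank's node name into a color for Split(). That step can be sketched without MPI, assuming the names have already been gathered into one vector indexed by rank. ColorsByNode and its parameter name are illustrative and do not appear in the answer; the hash map is a swapped-in replacement for the answer's nested loop over unique names. On MPI-3 implementations, MPI_Comm_split_type with MPI_COMM_TYPE_SHARED may make the name exchange unnecessary altogether, though that is a suggestion rather than something this answer uses.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Give each rank a color equal to the order in which its node name first appears.
std::vector<int> ColorsByNode(const std::vector<std::string>& node_name_of_rank) {
    std::unordered_map<std::string, int> color_of_node;
    std::vector<int> colors;
    colors.reserve(node_name_of_rank.size());
    for (const std::string& name : node_name_of_rank) {
        // insert() leaves an existing entry untouched, so a repeated name keeps its first color.
        auto result = color_of_node.insert({name, static_cast<int>(color_of_node.size())});
        colors.push_back(result.first->second);
    }
    return colors;
}
```

Ranks that share a node name receive the same color, which is the value the subroutine passes as the first argument to Split().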

Source

2013-12-12 20:34:50 twilsonco
