2012-05-17 43 views

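C++ boost MPI & threads - serialization error: address not mapped

I'm stumped. all_gather works for primitives (e.g. int), but fails even for simple STL containers. valgrind says the container is not allocated/initialized, but that does not look right.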

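Summary:

  • I do some multithreading using OpenMP, then rejoin the threads.
  • In the serial region, I attempt a simple all_gather using boost::mpi::all_gather. The MPI ranks are not the threads. (There are 2 MPI ranks, and each MPI rank has 4 threads.)
  • I then intend to do some more (isolated) multithreading.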

It seems straightforward... what could be going on here?

main.cpp

#include <openmpi/mpi.h> 
#include <omp.h> 
#include <boost/mpi.hpp>  
#include "globals.h" 

int main(int argc, char* argv[]) 
{   

    int provided_MPI; 
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided_MPI); 

    boost::mpi::environment my_boost_mpi_env(argc, argv); 
    boost::mpi::communicator world_MPI_boost;   
    world_MPI_boost_ptr = &world_MPI_boost; 
     // ^^^ global variable of type boost::mpi::communicator * 

    perform_complete_variable_elimination_schedule(); 
    //... 

} 

Conn_Comp.cpp

#include <boost/mpi.hpp>  
#include <boost/mpi/collectives.hpp> 
#include <boost/serialization/serialization.hpp> 
#include <boost/serialization/vector.hpp> 
#include <boost/serialization/map.hpp> 

#include "globals.h" 

... 

void perform_complete_variable_elimination_schedule() 
{ 

    // isolated work in parallel using OpenMP 
    #pragma omp parallel 
    { 
    //work 
    }  

    // SERIAL REGION (with respect to threading). 

    std::map<uint,uint> my_map; 
    std::vector< std::map<uint,uint> > vec_of_my_maps; 

    boost::mpi::all_gather< std::map<uint,uint> > 
        (*world_MPI_boost_ptr, 
         my_map, 
         vec_of_my_maps); // <--- line 293 (referenced by valgrind) 


    // more isolated work in parallel using OpenMP 
    #pragma omp parallel 
    { 
    //work 
    } 

} 

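valgrind complains about invalid reads involving the map/vector. But that vector is created immediately before the call to all_gather, so it is clearly in scope and not inside a parallel threaded region. Selected valgrind error output: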

==12665== Use of uninitialised value of size 4 
==12665== at 0x41C8D7A: boost::archive::detail::basic_iarchive::get_library_version() const (basic_iarchive.cpp:575) 
==12665== by 0x41C92C6: boost::archive::detail::basic_iarchive::load_object(void*, boost::archive::detail::basic_iserializer const&) (basic_iarchive.cpp:399) 
==12665== by 0x80F5696: void boost::mpi::all_gather<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > >(boost::mpi::communicator const&, std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > const&, std::vector<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >, std::allocator<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > > >&) (iserializer.hpp:387) 
==12665== by 0x80DEC83: Conn_Comp::perform_complete_variable_elimination_schedule() (Conn_Comp.cpp:**293**) 
==12665== by 0x80C840A: main (main.cpp:695) 
==12665== 
==12665== Invalid read of size 2 
==12665== at 0x41C8D7A: boost::archive::detail::basic_iarchive::get_library_version() const (basic_iarchive.cpp:575) 
==12665== by 0x41C92C6: boost::archive::detail::basic_iarchive::load_object(void*, boost::archive::detail::basic_iserializer const&) (basic_iarchive.cpp:399) 
==12665== by 0x80F5696: void boost::mpi::all_gather<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > >(boost::mpi::communicator const&, std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > const&, std::vector<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >, std::allocator<std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > > > >&) (iserializer.hpp:387) 
==12665== by 0x80DEC83: Conn_Comp::perform_complete_variable_elimination_schedule() (main.cpp:**293**) 
==12665== by 0x80C840A: main (main.cpp:695) 
==12665== Address 0x3580bece is not stack'd, malloc'd or (recently) free'd 
==12665== 
[drosphila:12665] *** Process received signal *** 
[drosphila:12665] Signal: Segmentation fault (11) 
[drosphila:12665] Signal code: Address not mapped (1) 
[drosphila:12665] Failing at address: 0x3580bece 
[drosphila:12665] [ 0] /lib/i686/cmov/libpthread.so.0(+0xe500) [0x44f8500] 
[drosphila:12665] [ 1] /usr/lib/libboost_serialization.so.1.42.0(_ZN5boost7archive6detail14basic_iarchive11load_objectEPvRKNS1_17basic_iserializerE+0x1b7) [0x41c92c7] 
[drosphila:12665] [ 2] ./detect_NAHR(_ZN5boost3mpi10all_gatherISt3mapIjjSt4lessIjESaISt4pairIKjjEEEEEvRKNS0_12communicatorERKT_RSt6vectorISD_SaISD_EE+0x587) [0x80f5697] 
[drosphila:12665] [ 3] ./detect_NAHR(_ZN9Conn_Comp46perform_complete_variable_elimination_scheduleEv+0x534) [0x80dec84] 
[drosphila:12665] [ 4] ./detect_NAHR(main+0xf5b) [0x80c840b] 
[drosphila:12665] [ 5] /lib/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x4519ca6] 
[drosphila:12665] [ 6] ./detect_NAHR() [0x80c73e1] 
[drosphila:12665] *** End of error message *** 

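I use MPI_Init_thread based on the advice on the Boost help page.

As I said at the top, if I use a primitive (i.e. just uint) instead of a map, all_gather works fine. Why does it fail for the map? Boost.Serialization already provides methods for serializing STL containers, so that should not be the problem...

Note also that the vector that will hold all the values is resized automatically inside all_gather (I checked the all_gather implementation) to be large enough to hold everything. Regardless, it still fails even if I size it myself beforehand.

Finally, I hit the same problem even if I use a plain old array (properly allocated), e.g. std::map<uint,uint> *.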


I hope 'boost::mpi' is built on top of the C MPI API and not the C++ one, which is deprecated and will be removed in MPI v3.0. –

Answer


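Well, this is embarrassing. I'll leave the question up in case anyone else runs into the same strange error.

The problem with my code was actually in the makefile. I had forgotten to link against the Boost libraries for MPI.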

Incorrect makefile flags:

-I$(BOOST_INCLUDE)  -lboost_serialization -lboost_mpi 

Apparently that line contains just enough information to let the program compile and run, but it then misbehaves at run time.

Correct makefile flags:

-L$(BOOST_LIB) -ldl -Wl,-rpath,$(BOOST_LIB) -lboost_serialization -lboost_mpi 

(Note the added library linking flags.)


How do I set the Boost library location in the LIBRARY_PATH environment variable? Could you explain what each flag does? – William