2011-11-05 36 views
1

INTRO

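I have a TCP/HTTP server that supports plugins in the form of shared libraries (DLL, .so). It is built via the premake system into make and .sln files. When I start my application, I feed it a config file like this, which describes which libraries the server should use as plugins and which arguments it should pass to them. For some time I had 2 plugins, and both worked just fine. They still work fine if I feed my server a config file like this. But now I have a new plugin in development, and with the new config file I get a SIGSEGV on Linux that does not occur on Windows.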

SETUP

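The steps needed to set up my server on Linux are few and simple:

  • Download the build script (from here, described here)
  • Run ./cloud_server_net_setup.sh. No superuser rights are needed; it requires curl, make and g++. In the regular (non-development) case this is enough: it fetches Boost and the other libraries it needs into a local folder, builds all of them in release form, and builds the server.
  • Now you can cd to cloud_server/install-dir/
  • Call export LD_LIBRARY_PATH=./:./lib_boost
  • And run the server with ./CloudServer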

But we need the debug version, so after running the script we do the following:

  • cd cloud_server/CloudServer/projects/linux-gmake/
  • make
  • cd bin/debug
  • export LD_LIBRARY_PATH=./:(place from where we called our script)/cloud_server/install-dir/lib_boost

PROBLEM

  • Now we can finally call GDB.

So we call it like this, and this is what we see:

gdb ./CloudServer 

GNU gdb (GDB) 7.0.1-debian 
Copyright (C) 2009 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 
This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. Type "show copying" 
and "show warranty" for details. 
This GDB was configured as "x86_64-linux-gnu". 
For bug reporting instructions, please see: 
<http://www.gnu.org/software/gdb/bugs/>... 
Reading symbols from /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer...done. 
(gdb) r 
Starting program: /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer 
[Thread debugging using libthread_db enabled] 
Cloud Server v0.5 
Copyright (c) 2011 Cloud Forever. All rights reserved. 

Type 'help' to see help messages. 
Config file path: config.xml 
[New Thread 0x7ffff5967700 (LWP 11516)] 
[New Thread 0x7ffff5166700 (LWP 11517)] 
[New Thread 0x7ffff4965700 (LWP 11518)] 
[New Thread 0x7ffff4164700 (LWP 11519)] 
[New Thread 0x7ffff3963700 (LWP 11520)] 
[New Thread 0x7ffff3162700 (LWP 11521)] 
[New Thread 0x7ffff2961700 (LWP 11522)] 
[New Thread 0x7ffff2160700 (LWP 11523)] 
[New Thread 0x7ffff195f700 (LWP 11524)] 
[New Thread 0x7ffff115e700 (LWP 11525)] 
[New Thread 0x7ffff095d700 (LWP 11526)] 
[New Thread 0x7fffebfff700 (LWP 11527)] 
[New Thread 0x7fffeb7fe700 (LWP 11528)] 
[New Thread 0x7fffeaffd700 (LWP 11529)] 
[New Thread 0x7fffea7fc700 (LWP 11530)] 
[New Thread 0x7fffe9ffb700 (LWP 11531)] 
Library libFileService.so opened. 
[New Thread 0x7fffe953c700 (LWP 11532)] 
Library libUsersFilesService.so opened. 

Program received signal SIGSEGV, Segmentation fault. 
0x0000000000000000 in ??() 
(gdb) x/i $pc 
0x0: Cannot access memory at address 0x0 

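I'm a Linux noob, and what I know about segmentation faults I know from Wikipedia. But I do know one thing about my server and this new service I created: it compiles and runs on Windows without any errors (VS2008 and VS2010 solutions can be created from the same premake script).

So I wonder how and where in these 2 files (.cpp, .h) I made an error that does not show up on Windows but shows up so spectacularly on Linux. Is it fixable, and can a fresh pair of eyes spot it?

UPDATE: Valgrind output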

[email protected]:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$ valgrind ./CloudServer 
==11682== Memcheck, a memory error detector 
==11682== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. 
==11682== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info 
==11682== Command: ./CloudServer 
==11682== 
Cloud Server v0.5 
Copyright (c) 2011 Cloud Forever. All rights reserved. 

Type 'help' to see help messages. 
Config file path: config.xml 
Library libFileService.so opened. 
Library libUsersFilesService.so opened. 
==11682== Jump to the invalid address stated on the next line 
==11682== at 0x0: ??? 
==11682== by 0x4D49BE: sqlite3_free (sqlite3.c:18155) 
==11682== by 0x102242D5: sqlite3OsInit (sqlite3.c:14162) 
==11682== by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299) 
==11682== by 0x102A159F: openDatabase (sqlite3.c:108909) 
==11682== by 0x102A1B29: sqlite3_open (sqlite3.c:109156) 
==11682== by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89) 
==11682== by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74) 
==11682== by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171) 
==11682== by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38) 
==11682== by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156) 
==11682== by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208) 
==11682== Address 0x0 is not stack'd, malloc'd or (recently) free'd 
==11682== 
==11682== 
==11682== Process terminating with default action of signal 11 (SIGSEGV) 
==11682== Bad permissions for mapped region at address 0x0 
==11682== at 0x0: ??? 
==11682== by 0x4D49BE: sqlite3_free (sqlite3.c:18155) 
==11682== by 0x102242D5: sqlite3OsInit (sqlite3.c:14162) 
==11682== by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299) 
==11682== by 0x102A159F: openDatabase (sqlite3.c:108909) 
==11682== by 0x102A1B29: sqlite3_open (sqlite3.c:109156) 
==11682== by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89) 
==11682== by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74) 
==11682== by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171) 
==11682== by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38) 
==11682== by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156) 
==11682== by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208) 
==11682== 
==11682== HEAP SUMMARY: 
==11682==  in use at exit: 124,050 bytes in 1,083 blocks 
==11682== total heap usage: 1,814 allocs, 731 frees, 183,517 bytes allocated 
==11682== 
==11682== LEAK SUMMARY: 
==11682== definitely lost: 0 bytes in 0 blocks 
==11682== indirectly lost: 0 bytes in 0 blocks 
==11682==  possibly lost: 46,248 bytes in 799 blocks 
==11682== still reachable: 77,802 bytes in 284 blocks 
==11682==   suppressed: 0 bytes in 0 blocks 
==11682== Rerun with --leak-check=full to see details of leaked memory 
==11682== 
==11682== For counts of detected and suppressed errors, rerun with: -v 
==11682== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4) 
Убито 
[email protected]:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$ 
+0

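The first thing to do is download [valgrind](http://valgrind.org/) (you should be able to get it through your Linux distribution) and then run `valgrind ./CloudServer`. It should be able to give you the call stack at the moment the segfault happens. Also, just because your Windows build doesn't segfault doesn't mean it has no bugs. It may be suffering in silence. –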

+1

Also, just a quick guess from your output, but you are probably dereferencing a NULL pointer. –

+0

Added Valgrind output – Rella

Answers

2

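This is a nasty one. I'm not sure of the exact root cause, but it appears to be a multi-threading related issue. The immediate cause of the problem is that the sqlite3Config.m.xSize function pointer is NULL at the place and time the fault occurs.

That pointer is supposed to be initialized to point to the proper function on the first call to sqlite3_initialize(), which normally happens the first time you open an SQLite database file. By setting breakpoints and watchpoints in GDB, I was able to verify that the pointer is set successfully, yet its value is NULL at the time of the segmentation fault.

This could mean one of two things:

  • The new pointer value is not propagated correctly to all threads. SQLite3 is supposed to be thread-safe, but threads can be nasty little things...

  • Something resets the pointer after initialization. I consider this unlikely, because the sqlite3Config structure is normally not modified after it has been initialized.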

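I performed a simple test, which can also serve as a temporary workaround: I added an explicit call to sqlite3_initialize() as the first statement in main(), so that it runs before any thread is started. As a result the segmentation fault went away and I got your server's shell prompt, which points to the first of the two options. Note that this is only a workaround, since sqlite3_initialize() should not need to be called explicitly. The root of the problem may still be there and make itself known in some other way, or worse, it may break things in subtle, hard-to-detect ways.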

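Since sqlite3 is supposed to be thread-safe (and the source code of sqlite3_initialize() seems correct in that respect), it is not clear to me what is going on. It could be an issue with the sqlite3pp wrapper or with the way the threads are started.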

+0

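Well, thread safety turned out to be irrelevant...) Anyway, we found a way to solve this problem...) See the [commit](http://code.google.com/p/cloudobserver/source/detail?r=1579). The problem was simple: we have the main application and the .so, and both are statically linked against SQLite. When we first call some SQLite function from the main application rather than from the .so, everything works; the other way round gives SIGSEGV. (Note: the .so is loaded at runtime, and all of this happens in a separate thread.) – Rella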

0

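Here are my suggestions.

  1. Turn off optimization. Sometimes optimization causes bugs. For example, use -O0.
  2. Remove the dynamic loading, try linking your code statically, and see whether the problem persists.
  3. Reduce the size of the problem. Make the smallest program that still reproduces the error, then post it here.

Thanks, Mike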

+3

It is unlikely that optimization causes bugs. More likely, optimization exposes undefined behavior in your code. –

+0

Yes Loki, good point. It exposes bugs, it does not cause them. – h4ck3rm1k3
