Tips for debugging a segmentation fault when no leaks are found

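I wrote a C-based application that seems to run fine, except when it is given very large datasets as input.

With large input, I get a segmentation fault in the final steps of the binary's work.

I ran the binary (with the test input) under valgrind: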

valgrind --tool=memcheck --leak-check=yes /foo/bar/baz inputDataset > outputAnalysis

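This job usually takes a few hours, but under valgrind it took seven days.

Unfortunately, at this point I don't know how to read the results I got from this run.

I got a lot of these warnings: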

... 
==4074== Conditional jump or move depends on uninitialised value(s)                             
==4074== at 0x435900: ??? (in /foo/bar/baz)                     
==4074== by 0x439CC5: ??? (in /foo/bar/baz)                     
==4074== by 0x400BF2: ??? (in /foo/bar/baz)                     
==4074== by 0x402086: ??? (in /foo/bar/baz)                     
==4074== by 0x402A0F: ??? (in /foo/bar/baz)                     
==4074== by 0x41684F: ??? (in /foo/bar/baz)                     
==4074== by 0x4001B8: ??? (in /foo/bar/baz)                     
==4074== by 0x7FEFFFF57: ???                                      
==4074== Uninitialised value was created                                    
==4074== at 0x461D3A: ??? (in /foo/bar/baz)                     
==4074== by 0x43F926: ??? (in /foo/bar/baz)                     
==4074== by 0x416B9B: ??? (in /foo/bar/baz)                     
==4074== by 0x416725: ??? (in /foo/bar/baz)                     
==4074== by 0x4001B8: ??? (in /foo/bar/baz)                     
==4074== by 0x7FEFFFF57: ??? 
...

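There is no hint of which parts of the code are involved, no variable names, and so on. What can I do with this information?

At the end, I finally got the error below. However, just as with the smaller datasets that do not crash, valgrind found no leaks: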

... 
==4074== Process terminating with default action of signal 11 (SIGSEGV)                            
==4074== Access not within mapped region at address 0x7158E7F7                              
==4074== at 0x7158E7F7: ???                                      
==4074== by 0x4020B8: ??? (in /foo/bar/baz)                     
==4074== by 0x6322203A22656D6E: ???                                    
==4074== by 0x306C675F6E557267: ???                                    
==4074== by 0x202C22373232302F: ???                                    
==4074== by 0x6D616E656C696621: ???                                    
==4074== by 0x72686322203A2264: ???                                    
==4074== by 0x3030306C675F6E54: ???                                    
==4074== by 0x346469702E373231: ???                                    
==4074== by 0x646469662E34372F: ???                                    
==4074== by 0x722E64616568656B: ???                                    
==4074== by 0x63656D6F6C756764: ???                                    
==4074== If you believe this happened as a result of a stack                               
==4074== overflow in your program's main thread (unlikely but                              
==4074== possible), you can try to increase the size of the                               
==4074== main thread stack using the --main-stacksize= flag.                               
==4074== The main thread stack size used in this run was 10485760.                             
==4074==                                            
==4074== HEAP SUMMARY:                                        
==4074==  in use at exit: 0 bytes in 0 blocks                                  
==4074== total heap usage: 0 allocs, 0 frees, 0 bytes allocated                              
==4074==                                            
==4074== All heap blocks were freed -- no leaks are possible                               
==4074==                                            
==4074== For counts of detected and suppressed errors, rerun with: -v                             
==4074== ERROR SUMMARY: 1603141870 errors from 86 contexts (suppressed: 0 from 0) 
Segmentation fault

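Everything I allocate space for has a matching free statement, after which I set the pointer to NULL.

At this point, how can I best debug this application to determine what else is causing the segmentation fault?

22 December 2011 - Edit

I compiled a debug version of my binary, called debug-binary, using the following compile flags: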

-D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1 -DUSE_ZLIB -g -O0 -Wformat -Wall -pedantic -std=gnu99

When I run it under valgrind, I don't get much more information:

valgrind -v --tool=memcheck --leak-check=yes --error-limit=no --track-origins=yes debug-binary input > output

Here is a snippet of the output:

==25116== 2 errors in context 14 of 14:                                                  
==25116== Invalid read of size 4                                                    
==25116== at 0x4045E8: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x40682F: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x404F0C: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x401FA4: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)                                           
==25116== Address 0x539f188 is 24 bytes inside a block of size 48 free'd                                          
==25116== at 0x4A05D21: free (vg_replace_malloc.c:325)                                              
==25116== by 0x401F6B: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)                                 
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)

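Is this a problem with my binary, or with a system library (libc) that my application depends on?

I also don't know how to interpret the ??? entries. Is there another compilation flag I need so that valgrind provides more information?

Source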

2011-12-19 Alex Reynolds

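Are you running it under GDB? Are you building with debug information ('-ggdb', etc.)? Are you putting any large data structures or arrays on the stack, or do you have deep recursion (basically, could you be overflowing the stack)? Also note that Valgrind does not track buffer overflows of stack-allocated variables. – 2011-12-19 22:21:32

If you are sure everything is freed, then you have a buffer overflow somewhere – James 2011-12-19 22:42:54

... or he has left stale pointers dangling after freeing. – wildplasser 2011-12-19 23:28:16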
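
To illustrate the stale-pointer case: setting a pointer to NULL after free only clears that one variable, not other copies of the same address. A minimal sketch, assuming a hypothetical struct node and hypothetical helpers node_release and alias_survives_release (none of these come from the asker's program):

```c
#include <stdlib.h>

/* Hypothetical record type, used only for illustration. */
struct node {
    int value;
    struct node *next;
};

/* Frees *pp and sets the caller's pointer variable to NULL.
 * Only the one variable passed in is cleared; any other copy of
 * the address elsewhere in the program is left dangling. */
static void node_release(struct node **pp)
{
    if (pp != NULL && *pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}

/* Returns 1 when the owner was cleared while a second copy of the
 * address existed, 0 otherwise, and -1 if allocation failed. */
static int alias_survives_release(void)
{
    struct node *owner = malloc(sizeof *owner);
    if (owner == NULL)
        return -1;
    owner->value = 42;
    owner->next = NULL;

    struct node *alias = owner;           /* second copy of the same address */
    int alias_was_set = (alias != NULL);  /* checked while the block is live */

    node_release(&owner);                 /* frees the block, owner becomes NULL */

    /* node_release never saw `alias`, so that variable was not cleared.
     * Reading alias->value here would be a use-after-free, so we do not. */
    return owner == NULL && alias_was_set;
}
```

Under memcheck, dereferencing alias after the release is the kind of access that gets reported as an invalid read inside a block that was free'd.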

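Valgrind is basically saying that there are no notable heap management problems. The program is segfaulting because of a less complicated programming error.

If it were me, I would

compile it with gcc -g,
enable core dump files (ulimit -c unlimited),
run the program normally,
and let it fault,
then use gdb to inspect the core file and see what it was doing when it faulted: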

gdb (programfile) (corefile)
bt
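
If changing the shell's limits is inconvenient (for example, when the job is started by a scheduler), the core-file limit can also be raised from inside the program. A minimal sketch, assuming a POSIX system; the function name enable_core_dumps is hypothetical and this only raises the soft limit as far as the hard limit allows:

```c
#include <sys/resource.h>

/* Raises the soft core-file size limit to the current hard limit.
 * Returns 0 on success, -1 on failure (errno is set by
 * getrlimit/setrlimit). */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* soft limit may always be raised to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling enable_core_dumps() early in main should have roughly the same effect as ulimit -c for that one process. Where the core file ends up still depends on the system's configuration.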

Source

2011-12-19 22:30:56 wallyk

Thanks for your help. Can you point me to a well-documented way of using gdb to inspect a core file? – 2011-12-19 22:36:56

@AlexReynolds: I added the most typical commands to my answer. – wallyk 2011-12-19 22:40:56

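I don't believe Valgrind is able to find where you overflow into a value on the stack (as distinct from overflowing the stack itself). So you may want to try gcc's -fstack-protector-all option.

You should also try mudflap, with -fmudflap (single-threaded) or -fmudflapth (multi-threaded).

Both mudflap and the stack protector should be much faster than valgrind.

Also, it looks like you don't have debugging symbols, which makes the backtraces hard to read. Add -ggdb. You will probably also want to enable core file generation (try ulimit -c unlimited). That way, you can use gdb program core to debug the process after it crashes.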

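As @wallyk said, your segfault may actually be something fairly easy to find. For example, maybe you are dereferencing NULL, and gdb can point you to the exact line (or, unless you compile with -O0, close to it). That would make sense if, for example, you are simply running out of memory with the larger dataset, malloc returns NULL, and you forgot to check for that somewhere.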
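
One way to rule out the unchecked-malloc case is to route allocations through a small wrapper that reports failures. A minimal sketch, assuming C99; the name checked_alloc and its what parameter are hypothetical and not part of any library:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocates count * size bytes and reports failures on stderr.
 * Returns NULL for zero-sized requests, for requests whose size
 * would overflow size_t, and when malloc itself fails.
 * `what` is a caller-supplied label used only in the message. */
static void *checked_alloc(size_t count, size_t size, const char *what)
{
    if (count == 0 || size == 0 || count > SIZE_MAX / size) {
        fprintf(stderr, "checked_alloc: invalid request for %s (%zu x %zu bytes)\n",
                what, count, size);
        return NULL;
    }

    void *p = malloc(count * size);
    if (p == NULL) {
        fprintf(stderr, "checked_alloc: out of memory for %s (%zu x %zu bytes)\n",
                what, count, size);
    }
    return p;
}
```

The caller still has to test the result, but a failure with the large dataset now leaves a message naming the allocation instead of a silent NULL that is dereferenced later.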

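Finally, if nothing else makes sense, there is always the possibility of a hardware problem. But those would be expected to be fairly random, e.g. different values being corrupted on different runs. If you try a different machine and it happens there too, a hardware problem becomes very unlikely.

Source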

2011-12-19 22:45:01 derobert

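"Conditional jump or move depends on uninitialised value(s)" is a serious error that you need to fix. It means that the behaviour of the program is being affected by the contents of an uninitialised variable (including an uninitialised region of memory returned by malloc()).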
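
As an illustration of how this warning arises, consider a loop that branches on array contents. A minimal sketch with hypothetical names (count_nonzero, make_counts); it is not taken from the asker's code:

```c
#include <stdlib.h>

/* Branches on every element. If a[i] was never written, this `if`
 * is a conditional jump that depends on an uninitialised value. */
static size_t count_nonzero(const int *a, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != 0)
            hits++;
    }
    return hits;
}

/* Returns n zero-initialised ints, or NULL on failure.
 * With malloc(n * sizeof(int)) instead, the contents would be
 * indeterminate until the program wrote to them. */
static int *make_counts(size_t n)
{
    return calloc(n, sizeof(int));
}
```

Passing a buffer from plain malloc to count_nonzero before filling it is the pattern to look for. Using calloc, or writing every element before the first read, removes it.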

To get readable backtraces from valgrind, you need to compile with -g.

Source

2011-12-19 23:24:49 caf
