2011-03-13 98 views
10

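gdb weird backtrace

My program is statically compiled with dietlibc. It is built on Ubuntu x64 (using the -m32 flag to target x86) and runs on CentOS x86.

The compiled binary is only about 100KB. I compile it with -ggdb3 and no optimization flags.

My program uses signal.h to handle SIGSEGV, and the handler then calls abort().

The program runs for days without problems, but occasionally it segfaults. That is when I get this weird backtrace, which I don't understand: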

 
~/Desktop$ gdb -c core.28569 program-name 
GNU gdb (GDB) 7.2 
Copyright (C) 2010 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. Type "show copying" 
and "show warranty" for details. 
This GDB was configured as "--host=x86_64-linux-gnu --target=i386-linux-gnu". 
For bug reporting instructions, please see: 
... 
Reading symbols from program-name...done. 
[New Thread 28569] 
Core was generated by `program-name'. 
Program terminated with signal 6, Aborted. 
#0 0x00914410 in __kernel_vsyscall() 
Setting up the environment for debugging gdb. 
Function "internal_error" not defined. 
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal] 
Function "info_command" not defined. 
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal] 
.gdbinit:8: Error in sourced command file: 
Argument required (one or more breakpoint numbers). 
(gdb) bt 
#0 0x00914410 in __kernel_vsyscall() 
During symbol reading, incomplete CFI data; unspecified registers (e.g., eax) at 0x914411. 
#1 0x0804d7f4 in __unified_syscall() 
#2 0xbf8966c0 in ??() 
#3 <signal handler called> 
#4 0x2054454e in ??() 
#5 0x20524c43 in ??() 
#6 0x2e352e33 in ??() 
#7 0x32373033 in ??() 
#8 0x2e203b39 in ??() 
#9 0x2054454e in ??() 
#10 0x20524c43 in ??() 
#11 0x2e302e33 in ??() 
#12 0x32373033 in ??() 
#13 0x4d203b39 in ??() 
#14 0x61696465 in ??() 
#15 0x6e654320 in ??() 
#16 0x20726574 in ??() 
#17 0x36204350 in ??() 
#18 0x203b302e in ??() 
#19 0x54454e2e in ??() 
#20 0x43302e34 in ??() 
#21 0x00000029 in ??() 
#22 0xbf8989a8 in ??() 
Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
(gdb) bt full 
#0 0x00914410 in __kernel_vsyscall() 
No symbol table info available. 
#1 0x0804d7f4 in __unified_syscall() 
No symbol table info available. 
#2 0xbf8966c0 in ??() 
No symbol table info available. 
#3 <signal handler called> 
No symbol table info available. 
#4 0x2054454e in ??() 
No symbol table info available. 
#5 0x20524c43 in ??() 
No symbol table info available. 
#6 0x2e352e33 in ??() 
No symbol table info available. 
#7 0x32373033 in ??() 
No symbol table info available. 
#8 0x2e203b39 in ??() 
No symbol table info available. 
#9 0x2054454e in ??() 
No symbol table info available. 
#10 0x20524c43 in ??() 
No symbol table info available. 
#11 0x2e302e33 in ??() 
No symbol table info available. 
#12 0x32373033 in ??() 
No symbol table info available. 
#13 0x4d203b39 in ??() 
No symbol table info available. 
#14 0x61696465 in ??() 
No symbol table info available. 
#15 0x6e654320 in ??() 
No symbol table info available. 
#16 0x20726574 in ??() 
No symbol table info available. 
#17 0x36204350 in ??() 
No symbol table info available. 
#18 0x203b302e in ??() 
No symbol table info available. 
#19 0x54454e2e in ??() 
No symbol table info available. 
#20 0x43302e34 in ??() 
No symbol table info available. 
#21 0x00000029 in ??() 
No symbol table info available. 
#22 0xbf8989a8 in ??() 
No symbol table info available. 
Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
(gdb) quit 

Answers

16

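This is a stack buffer overflow: something wrote past the end of a buffer on the stack.

#4 0x2054454e in ??() 

This looks like text: " TEN" or, read in little-endian byte order, "NET ".

#5 0x20524c43 in ??() 

" RLC" or "CLR ".

And so on.

Read the addresses as text and see whether you can work out where that text is overwriting the stack.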
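One way to do this is to treat each bogus frame address as a 32-bit little-endian word (x86 stores the low byte first) and print its bytes as ASCII. A minimal Python sketch, using frame addresses #4 through #21 from the backtrace in the question; the helper name `frames_to_text` is made up for this illustration:

```python
import struct

# Frame addresses #4 through #21, copied from the backtrace in the question.
FRAMES = [
    0x2054454e, 0x20524c43, 0x2e352e33, 0x32373033, 0x2e203b39,
    0x2054454e, 0x20524c43, 0x2e302e33, 0x32373033, 0x4d203b39,
    0x61696465, 0x6e654320, 0x20726574, 0x36204350, 0x203b302e,
    0x54454e2e, 0x43302e34, 0x00000029,
]


def frames_to_text(addresses):
    """Decode 32-bit words as little-endian bytes and return them as text.

    Non-printable bytes are shown as '.' so stray pointers stay visible.
    """
    raw = b"".join(struct.pack("<I", addr) for addr in addresses)
    raw = raw.split(b"\x00", 1)[0]  # stop at the first NUL, as a C string would
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(frames_to_text(FRAMES))
```

For this core the result is `NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)`, which looks like the tail of a browser User-Agent string. That is a strong hint about which buffer was overrun.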

+0

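Actually, Erik is right. It was a strncat on an uninitialized variable. That is why it segfaults sometimes and not at other times. By the way, the text really is "NET" and "CLR". Thanks. – 2011-03-13 18:29:40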

+0

Even in this case, my answer is probably *also* correct, and you would do well to fix dietlibc for the *next* time something calls abort. – 2011-03-13 23:13:14

6

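Your stack trace is actually quite easy to understand:

  • you got a SIGSEGV somewhere,
  • the signal handler did whatever it does, then called abort(),
  • which issued the system call that raises SIGABRT, by calling __unified_syscall().

The reasons GDB cannot produce a sensible stack trace are:

  • __unified_syscall is implemented in assembly, and
  • it does not use a frame pointer, and
  • it has no correct CFI directives describing how to unwind from it.

I would consider this a bug in dietlibc, and an easy one to fix, actually. See whether this (untested) patch fixes it for you: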

--- dietlibc-0.31/i386/unified.S.orig 2011-03-13 10:16:23.000000000 -0700 
+++ dietlibc-0.31/i386/unified.S 2011-03-13 10:21:32.000000000 -0700 
@@ -31,8 +31,14 @@ __unified_syscall: 
    movzbl %al, %eax 
.L1: 
    push %edi 
+  cfi_adjust_cfa_offset (4) 
+  cfi_rel_offset (edi, 0) 
    push %esi 
+  cfi_adjust_cfa_offset (4) 
+  cfi_rel_offset (esi, 0) 
    push %ebx 
+  cfi_adjust_cfa_offset (4) 
+  cfi_rel_offset (ebx, 0) 
    movl %esp,%edi 
    /* we use movl instead of pop because otherwise a signal would 
     destroy the stack frame and crash the program, although it 
@@ -61,8 +67,11 @@ __unified_syscall: 
#endif 
.Lnoerror: 
    pop %ebx 
+  cfi_adjust_cfa_offset (-4) 
    pop %esi 
+  cfi_adjust_cfa_offset (-4) 
    pop %edi 
+  cfi_adjust_cfa_offset (-4) 

/* here we go and "reuse" the return for weak-void functions */ 
#include "dietuglyweaks.h" 

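If you can't rebuild dietlibc, or if the patch turns out to be incorrect, you may still be able to analyze the stack trace better. As far as I can tell, __unified_syscall doesn't touch %ebp. So you may be able to get a reasonable stack trace by doing this: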

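# Walk the saved-%ebp chain starting at $arg0, printing each frame's
# saved %ebp and return address. Stops when memory becomes unreadable.
define xbt 
  set $xbp = (void **)$arg0 
  while 1 
    x/2a $xbp 
    set $xbp = (void **)$xbp[0] 
  end 
end 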

xbt $ebp 
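
What the xbt macro above does can be modelled outside gdb. When frame pointers are in use, each frame stores the caller's saved %ebp at [%ebp] and the return address at [%ebp+4], so following the saved values yields one return address per frame. Here is a Python sketch over a made-up memory image. All addresses are invented for illustration, and `walk_frames` is a hypothetical helper, not a gdb API:

```python
def walk_frames(memory, ebp, max_frames=64):
    """Follow the saved-%ebp chain and return the list of return addresses.

    memory maps 4-byte-aligned addresses to 32-bit words. The walk stops
    when a frame cannot be read, when the chain stops moving toward higher
    addresses, or after max_frames.
    """
    return_addresses = []
    for _ in range(max_frames):
        if ebp not in memory or ebp + 4 not in memory:
            break  # unreadable memory: the gdb macro would stop with an error here
        return_addresses.append(memory[ebp + 4])
        next_ebp = memory[ebp]
        if next_ebp <= ebp:
            break  # the stack grows down, so callers must live at higher addresses
        ebp = next_ebp
    return return_addresses


# Invented stack image: three frames linked through their saved-%ebp slots.
MEMORY = {
    0xbf000100: 0xbf000120, 0xbf000104: 0x08048111,  # innermost frame
    0xbf000120: 0xbf000140, 0xbf000124: 0x08048222,
    0xbf000140: 0x00000000, 0xbf000144: 0x08048333,  # outermost frame
}

if __name__ == "__main__":
    for addr in walk_frames(MEMORY, 0xbf000100):
        print(hex(addr))
```

Unlike the macro, which loops until gdb fails to read memory, the sketch adds explicit stop conditions. A function that never sets up %ebp simply does not appear in the chain, which is why frames can be skipped.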

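Note: if xbt works, it will likely go into the weeds around the SIGSEGV signal frame (that frame does not use a frame pointer either). This may produce complete garbage, or skip a frame or two (the very frames in which the SIGSEGV occurred).

So you really are much better off getting correct unwind descriptors into dietlibc.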