2013-07-16

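Kernel panic - unable to handle kernel NULL pointer dereference at 000002c0

I implemented an echo server (in C++) that mainly forks and execs multiple iperf commands (Linux), and I am getting a kernel panic. I am fairly new to kernel programming and do not know how to read a kernel crash log, so I am hoping someone can help. The echo server crashes at random, and it is hard to determine where it crashes. Most of the time it runs for about 2 hours (roughly 20-30 iperf commands), then the kernel freezes and crashes, and the machine has to be rebooted. I captured one crash, which seems to have happened in the middle of an iperf command.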

This is the Linux version I am running:

Linux 5NetSim08 2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:36:48 UTC 2010 i686 GNU/Linux

Here is the crash log:

[ 1883.081940] BUG: unable to handle kernel NULL pointer dereference at 000002c0 
[ 1883.089141] IP: [<c153d24f>] __udp4_lib_rcv+0x16f/0x680 
[ 1883.089141] *pdpt = 0000000000000000 *pde = f000eef3f000eef3 
[ 1883.089141] Oops: 0000 [#1] SMP 
[ 1883.089141] Modules linked in: coretemp gpio_ich snd_hda_codec_realtek microcode parport_pc rfcomm bnep ppdev bluetooth snd_hda_intel psmouse snd_hda_codec snd_hwdep nfsd nfs binfmt_misc serio_raw lockd fscache auth_rpcgss nfs_acl snd_pcm sunrpc i915 lpc_ich drm_kms_helper snd_seq_midi snd_rawmidi snd_seq_midi_event drm i2c_algo_bit snd_seq snd_timer snd_seq_device mac_hid snd soundcore video snd_page_alloc lp parport r8169 
[ 1883.089141] 
[ 1883.089141] Pid: 0, comm: swapper/0 Not tainted 3.5.0-25-generic #39~precise1-Ubuntu /i945GSEx-QS R2.00 May.24.2010 
[ 1883.089141] EIP: 0060:[<c153d24f>] EFLAGS: 00010282 CPU: 0 
[ 1883.089141] EIP is at __udp4_lib_rcv+0x16f/0x680 
[ 1883.089141] EAX: 00000000 EBX: f68609c0 ECX: f1367d80 EDX: f2ad344e 
[ 1883.089141] ESI: f2ad3462 EDI: 00000011 EBP: f4c0be08 ESP: f4c0bdb4 
[ 1883.089141] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 
[ 1883.089141] CR0: 8005003b CR2: 000002c0 CR3: 019a1000 CR4: 000007e0 
[ 1883.089141] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 
[ 1883.089141] DR6: ffff0ff0 DR7: 00000400 
[ 1883.089141] Process swapper/0 (pid: 0, ti=f4c0a000 task=c1869240 task.ti=c185c000) 
[ 1883.089141] Stack: 
[ 1883.089141] 00004e3c f6860c00 000069c0 00000440 01000000 f68609c0 00004b28 c154f851 
[ 1883.089141] f596ac00 f6860c00 00004e28 0b09a8c0 c18d8c88 f1368b17 000042c3 c18c5b00 
[ 1883.089141] 0b09a8c0 0b0aa8c0 f68609c0 c165c59c 00000011 f4c0be10 c153d777 f4c0be34 
[ 1883.089141] Call Trace: 
[ 1883.089141] [<c154f851>] ? inet_frag_destroy+0xc1/0x100 
[ 1883.089141] [<c153d777>] udp_rcv+0x17/0x20 
[ 1883.089141] [<c15148a7>] ip_local_deliver_finish+0xa7/0x290 
[ 1883.089141] [<c1514bdf>] ip_local_deliver+0x3f/0x80 
[ 1883.089141] [<c1514567>] ip_rcv_finish+0xf7/0x390 
[ 1883.089141] [<c1514e41>] ip_rcv+0x221/0x320 
[ 1883.089141] [<c1143567>] ? kmem_cache_alloc+0x77/0x140 
[ 1883.089141] [<c14e91d7>] __netif_receive_skb+0x437/0x4c0 
[ 1883.089141] [<c14e927f>] netif_receive_skb+0x1f/0x80 
[ 1883.089141] [<c14e965b>] ? dev_gro_receive+0x16b/0x240 
[ 1883.089141] [<c14e941f>] napi_skb_finish+0x4f/0x70 
[ 1883.089141] [<c14ea299>] napi_gro_receive+0xe9/0x110 
[ 1883.089141] [<f8438c6c>] rtl_rx+0x9c/0x300 [r8169] 
[ 1883.089141] [<f843bc56>] rtl8169_poll+0xc6/0xd0 [r8169] 
[ 1883.089141] [<c14e99fd>] net_rx_action+0x10d/0x1e0 
[ 1883.089141] [<c104e190>] ? local_bh_enable_ip+0x90/0x90 
[ 1883.089141] [<c104e211>] __do_softirq+0x81/0x1a0 
[ 1883.089141] [<c104e190>] ? local_bh_enable_ip+0x90/0x90 
[ 1883.089141] <IRQ> 
[ 1883.089141] [<c104e566>] ? irq_exit+0x76/0xa0 
[ 1883.089141] [<c15eb01b>] ? do_IRQ+0x4b/0xc0 
[ 1883.089141] [<c109953a>] ? tick_notify+0x11a/0x1d0 
[ 1883.089141] [<c15eae70>] ? common_interrupt+0x30/0x38 
[ 1883.089141] [<c104007b>] ? post_kmmio_handler+0x4b/0xc0 
[ 1883.089141] [<c132eaf2>] ? intel_idle+0xc2/0x120 
[ 1883.089141] [<c14a8435>] ? cpuidle_enter+0x15/0x20 
[ 1883.089141] [<c14a89cc>] ? cpuidle_idle_call+0x9c/0x260 
[ 1883.089141] [<c10195fa>] ? cpu_idle+0xaa/0xe0 
[ 1883.089141] [<c15afe65>] ? rest_init+0x5d/0x68 
[ 1883.089141] [<c18daa1f>] ? start_kernel+0x375/0x37b 
[ 1883.089141] [<c18da62b>] ? pass_bootoption.constprop.3+0xaf/0xaf 
[ 1883.089141] [<c18da303>] ? i386_start_kernel+0xa6/0xad 
[ 1883.089141] Code: 0f b7 4e 02 0f b7 06 66 89 4d e0 8b 4b 10 85 c9 0f 85 93 04 00 00 8b 4b 48 0f b7 c0 89 45 e4 8b 42 0c 83 e1 fe 89 45 d8 8b 41 0c <8b> 80 c0 02 00 00 89 45 d0 8b 45 dc 89 44 24 0c 8b 41 70 8b 4d 
[ 1883.089141] EIP: [<c153d24f>] __udp4_lib_rcv+0x16f/0x680 SS:ESP 0068:f4c0bdb4 
[ 1883.089141] CR2: 00000000000002c0 
[ 1883.561261] ---[ end trace a7bdb48fae24ea76 ]--- 
[ 1883.565953] Kernel panic - not syncing: Fatal exception in interrupt 
[ 1883.581374] panic occurred, switching back to text console 

Any help diagnosing the problem would be greatly appreciated. Thanks.

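I'm not a kernel expert, so this is only a comment. To me it looks like the network driver crashed because of fragmented UDP packets. It may be worth looking at the kernel source of the function where it crashed (inet_frag_destroy). – ogni42

Thanks for the help. Do you think updating the kernel to a newer version would help? I'm not sure where fragmented UDP packets would come from, because I specify a packet size of 1400 in iperf, and any text sent over the socket is no longer than 100 characters. – MarcusS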

Answer

This is not an exact answer, but I hope it points you in the right direction.

__udp4_lib_rcv+0x16f/0x680 is the troublemaker.

The offending instruction is 0x16f bytes from the start of the function __udp4_lib_rcv, which is 0x680 bytes long in total.
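
To make the `symbol+offset/size` notation concrete, here is a minimal Python sketch that splits the string from the log and combines it with the EIP value `c153d24f` from the same log to work out where the function starts and ends. The helper name `parse_symbol_offset` is made up for this example; it is not part of any kernel tool.

```python
import re

def parse_symbol_offset(text):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    match = re.fullmatch(r"(\S+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", text.strip())
    if match is None:
        raise ValueError(f"not in symbol+offset/size form: {text!r}")
    symbol, offset, size = match.groups()
    return symbol, int(offset, 16), int(size, 16)

# Values copied from the crash log above.
symbol, offset, size = parse_symbol_offset("__udp4_lib_rcv+0x16f/0x680")
eip = 0xC153D24F

func_start = eip - offset      # address of the first byte of the function
func_end = func_start + size   # first address past the end of the function

print(f"{symbol}: start=0x{func_start:08x} end=0x{func_end:08x} "
      f"fault at +0x{offset:x} of 0x{size:x} bytes")
```

With the values from this log, the function would span 0xc153d0e0 to 0xc153d760. That agrees with the `udp_rcv+0x17` entry at `c153d777` in the call trace, since 0xc153d777 - 0x17 = 0xc153d760, so udp_rcv appears to begin right where __udp4_lib_rcv ends.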

I suggest reproducing the problem and investigating along these lines.
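
One possible starting point is the `Code:` line of the log, where the byte in angle brackets marks the start of the faulting instruction. The sketch below decodes just that one instruction by hand, assuming 32-bit x86 encoding (the log is from an i686 kernel). The function `decode_mov_disp32` is a hypothetical helper that handles only this `mov r32, [r32+disp32]` form; it is no substitute for a real disassembler.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_disp32(code):
    """Decode opcode 0x8b with ModRM mod=10 (register base + 32-bit displacement)."""
    if code[0] != 0x8B:
        raise ValueError("only opcode 0x8b (mov r32, r/m32) is handled")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the [reg + disp32] form without SIB is handled")
    disp = int.from_bytes(code[2:6], "little")
    return REGS[reg], REGS[rm], disp

# Bytes starting at the <8b> marker in the Code: line of the log.
faulting = bytes.fromhex("8b80c0020000")
dest, base, disp = decode_mov_disp32(faulting)

# Register values taken from the register dump in the same log.
registers = {"eax": 0x00000000, "ecx": 0xF1367D80}
fault_address = (registers[base] + disp) & 0xFFFFFFFF

print(f"mov {dest}, [{base}+0x{disp:x}] reads address 0x{fault_address:08x}")
```

The computed address is 0x2c0, the same value the log reports in CR2. Since EAX is 0, this suggests the code read a field at offset 0x2c0 through a NULL pointer. Which structure that pointer belongs to would have to be confirmed against the source or a disassembly of the matching kernel build.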