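I'm working on a batch system (Torque). The relevant part is this: when an interactive job runs, the submission tool communicates with the execution host. This fails on one of the machines with a strange "Interrupted system call" error that I haven't been able to debug.
Here is the strace output.
From the submission tool: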
16:18:36.219925 fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
16:18:36.219925 read(4, "610.torque1.ics.muni.cz\0\0\0\0\0\0\0\0\0\0"..., 16385) = 1046
16:18:36.219925 write(4, "TERM=xterm\0\0\0\0\0\0\220\5\377\377\377\177\0\0\214\303u\310\277\177\0\0\26"..., 80) = 80
16:18:36.219925 write(4, "\3\34\177\25\4\32"..., 6) = 6
16:18:36.219925 write(4, "WINSIZE 46,166,0,0\0\0\0\0\0\0\[email protected]\0\0\0\0\0\0\0"..., 80) = 80
16:18:36.219925 write(1, "qsub: job 610.torque1.ics.muni.cz"..., 41qsub: job 610.torque1.ics.muni.cz ready ) = 41
16:18:36.219925 rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
16:18:36.219925 rt_sigaction(SIGTERM, {SIG_IGN}, NULL, 8) = 0
16:18:36.219925 rt_sigaction(SIGALRM, {SIG_IGN}, NULL, 8) = 0
16:18:36.219925 rt_sigaction(SIGTSTP, {SIG_IGN}, NULL, 8) = 0
16:18:36.219925 clone(Process 20724 attached child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbfc9a2d770) = 20724
From the execution host:
[pid 8778] 15:59:16.371145 getsockopt(3, SOL_SOCKET, SO_ERROR, [4294967296], [4]) = 0
[pid 8778] 15:59:16.371145 fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 8778] 15:59:16.371145 fcntl(3, F_SETFL, O_RDWR) = 0
[pid 8778] 15:59:16.371145 write(3, "609.torque1.ics.muni.cz\0\0\0\0\0\0\0\0\0\0"..., 1046) = 1046
[pid 8778] 15:59:16.371145 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
[pid 8778] 15:59:16.375144 read(3, 0x717ae0, 80) = ? ERESTARTSYS (To be restarted)
[pid 8778] 15:59:21.367024 --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid 8778] 15:59:21.367024 rt_sigreturn(0x8) = -1 EINTR (Interrupted system call)
[pid 8778] 15:59:21.367024 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
[pid 8778] 15:59:21.367024 write(2, "pbs_mom: LOG_ERROR::Interrupted s"
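Edit: These traces are actually from separate runs (hence the different job IDs), but the output is always exactly the same.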
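To illustrate the pattern the execution-host trace appears to show (a blocking read on the socket that is cut short by SIGALRM roughly five seconds later), here is a minimal Python sketch. It is not Torque code: `read_with_alarm`, `ReadTimeout`, and `demo` are hypothetical names, the 0.2-second timeout is only there to keep the demo fast, and a local socket pair stands in for the connection between the execution host and the submission tool. Note that Python retries interrupted system calls on its own unless the signal handler raises, so the handler raises here; in C the same situation would surface as `read()` returning -1 with `errno` set to `EINTR`.

```python
import signal
import socket


class ReadTimeout(Exception):
    """Raised when the alarm fires before the peer sends anything."""


def _on_alarm(signum, frame):
    # Raising makes the blocked recv() give up instead of being retried.
    raise ReadTimeout("no data before the alarm fired")


def read_with_alarm(sock, nbytes, timeout_s):
    """Blocking read of up to nbytes, abandoned after timeout_s seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout_s)
    try:
        return sock.recv(nbytes)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def demo():
    # host_side plays the execution host, tool_side the submission tool.
    host_side, tool_side = socket.socketpair()
    try:
        # Case 1: the peer stays silent, so the read is interrupted.
        try:
            read_with_alarm(host_side, 80, 0.2)
            silent = "data"
        except ReadTimeout:
            silent = "timeout"
        # Case 2: the peer has already written, so the read returns at once.
        tool_side.sendall(b"TERM=xterm\0")
        answered = read_with_alarm(host_side, 80, 0.2)
        return silent, answered
    finally:
        host_side.close()
        tool_side.close()


if __name__ == "__main__":
    print(demo())
```

If the real code follows this pattern, the error would mean that nothing arrived on the socket before the alarm fired, rather than that the read itself was faulty. That is only a reading of the trace, not something confirmed against the Torque source.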
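Are you planning to edit the code, or are you doing an administrative investigation? – atk 2010-09-14 15:08:31
@atk I'm planning to edit the code (I could actually post it here, but it's just some reads and writes). I need to fix this bug. – 2010-09-14 15:14:36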