2012-03-30

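Jenkins slave hangs / Jenkins wedged

We have an intermittent problem where slaves hang after the job itself has finished. Somewhere in the post-build processing (?), the console log shows the following line: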

Description set: vap_current_iter-2012_03_29_19_01_03 

And then nothing. Normally it looks like this:

Description set: prod_pull-2012_03_28_19_01_03 
Notifying upstream build armada_Launch_prod_pull #13 of job completion 
Project armada_Launch_prod_pull still waiting for 1 builds to complete 
Notifying upstream projects of job completion 
Notifying upstream of completion: armada_Launch_prod_pull #13 
Finished: SUCCESS 
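One way to spot wedged builds without watching each console by hand is to check whether a console log reached the `Description set:` line but never printed `Finished:`. This is only a heuristic sketch: `build_looks_wedged` is a hypothetical helper, it assumes you have already fetched the build's console text yourself, and a build that is legitimately still in its post-build steps would match too.

```python
def build_looks_wedged(console_text: str) -> bool:
    """Heuristic: the log got as far as 'Description set:' but has no 'Finished:' line.

    A build that is still legitimately running its post-build steps also matches,
    so combine this with a time threshold before acting on it.
    """
    lines = [line.strip() for line in console_text.splitlines()]
    reached_description = any(line.startswith("Description set:") for line in lines)
    finished = any(line.startswith("Finished:") for line in lines)
    return reached_description and not finished


if __name__ == "__main__":
    hung_log = "Description set: vap_current_iter-2012_03_29_19_01_03 \n"
    print(build_looks_wedged(hung_log))  # True
```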

I set up a logger for hudson.model.Run, and it currently shows this:

at java.lang.Thread.run(Thread.java:619) 

Mar 30, 2012 12:44:00 PM hudson.model.Run run 
INFO: galleon_allUnit #1134 main build action completed: SUCCESS 
Mar 30, 2012 12:44:00 PM hudson.model.Run setResult 
FINE: galleon_allUnit #1134 : result is set to SUCCESS 
java.lang.Exception 
    at hudson.model.Run.setResult(Run.java:352) 
    at hudson.model.Run.run(Run.java:1410) 
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) 
    at hudson.model.ResourceController.execute(ResourceController.java:88) 
    at hudson.model.Executor.run(Executor.java:238) 

This repeats for every hung slave.
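To line the logger output up against the hung consoles, the `main build action completed` lines can be pulled out mechanically. This sketch assumes the log format shown above (`INFO: <job> #<number> main build action completed: <RESULT>`), and `completed_builds` is a hypothetical helper name. Any build it returns whose console never shows `Finished:` is a candidate for the hang described here.

```python
import re
from typing import List, Tuple

# Matches logger lines such as:
#   INFO: galleon_allUnit #1134 main build action completed: SUCCESS
COMPLETED = re.compile(
    r"^INFO: (?P<job>\S+) #(?P<number>\d+) main build action completed: (?P<result>\w+)"
)


def completed_builds(log_text: str) -> List[Tuple[str, int, str]]:
    """Return (job, build number, result) for every 'main build action completed' line."""
    found = []
    for line in log_text.splitlines():
        match = COMPLETED.match(line.strip())
        if match:
            found.append((match["job"], int(match["number"]), match["result"]))
    return found
```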

The main Hudson log has no additional information.

Disconnecting the slave doesn't help.

Attempting an orderly shutdown of Jenkins has no effect (Jenkins itself actually seems to hang during shutdown).

The only way we have found to recover is to kill the Tomcat process.
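Since the orderly shutdown never completes, one option is to automate the escalation that ends up being done by hand: request a normal stop, wait a bounded time, then force-kill. This is a sketch, not a Tomcat-specific tool: `stop_with_escalation` is a hypothetical helper, and it assumes the watchdog started Tomcat itself and holds a `subprocess.Popen` handle. A Tomcat started by an init script would need a PID-file based variant.

```python
import subprocess


def stop_with_escalation(proc: subprocess.Popen, grace_seconds: float = 60.0) -> str:
    """Ask the process to exit, and force-kill it if it is still alive after grace_seconds.

    On POSIX, terminate() sends SIGTERM and kill() sends SIGKILL.
    Returns "terminated" or "killed" so the caller can log which path was taken.
    """
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```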

The thread dump from one of the slaves (they are all identical) is:

Thread Dump 
Channel reader thread: channel 

"Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native) 
    at java.io.FileInputStream.readBytes(Native Method) 
    at java.io.FileInputStream.read(FileInputStream.java:199) 
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) 
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237) 
    - locked java.io.BufferedInputStream@…
    at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) 
    at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) 
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) 
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030) 


main 

"main" Id=1 Group=main WAITING on hudson.remoting.Channel@…
    at java.lang.Object.wait(Native Method)
    - waiting on hudson.remoting.Channel@…
    at java.lang.Object.wait(Object.java:485) 
    at hudson.remoting.Channel.join(Channel.java:766) 
    at hudson.remoting.Launcher.main(Launcher.java:420) 
    at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366) 
    at hudson.remoting.Launcher.run(Launcher.java:206) 
    at hudson.remoting.Launcher.main(Launcher.java:168) 


Ping thread for channel hudson.remoting.Channel@…:channel 

"Ping thread for channel hudson.remoting.Channel@…:channel" Id=10 Group=main TIMED_WAITING 
    at java.lang.Thread.sleep(Native Method) 
    at hudson.remoting.PingThread.run(PingThread.java:86) 


Pipe writer thread: channel 

"Pipe writer thread: channel" Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed 
    at sun.misc.Unsafe.park(Native Method) 
    - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed 
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) 
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) 
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) 
    at java.lang.Thread.run(Thread.java:619) 


pool-1-thread-267 

"pool-1-thread-267" Id=285 Group=main RUNNABLE 
    at sun.management.ThreadImpl.dumpThreads0(Native Method) 
    at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374) 
    at hudson.Functions.getThreadInfos(Functions.java:872) 
    at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93) 
    at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89) 
    at hudson.remoting.UserRequest.perform(UserRequest.java:118) 
    at hudson.remoting.UserRequest.perform(UserRequest.java:48) 
    at hudson.remoting.Request$2.run(Request.java:287) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
    at java.lang.Thread.run(Thread.java:619) 

    Number of locked synchronizers = 1 
    - java.util.concurrent.locks.ReentrantLock$NonfairSync@…


Finalizer 

"Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@…
    at java.lang.Object.wait(Native Method) 
    - waiting on java.lang.ref.ReferenceQueue$Lock@…
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) 
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) 
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) 


Reference Handler 

"Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@…
    at java.lang.Object.wait(Native Method) 
    - waiting on java.lang.ref.Reference$Lock@…
    at java.lang.Object.wait(Object.java:485) 
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) 


Signal Dispatcher 

"Signal Dispatcher" Id=4 Group=system RUNNABLE 

Any ideas on how to better recover from or prevent this would be appreciated.
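When comparing dumps from several slaves (or from the same slave over time), it can help to reduce each dump to thread names, states and frames. In the slave thread dump in the question, apart from the thread serving the dump request itself, every non-JDK frame is in `hudson.remoting`, and `main` is waiting in `hudson.remoting.Channel.join`; that suggests the slave is idle and waiting on the master rather than busy in job code. The parser below is a hypothetical sketch that assumes the dump text format shown there.

```python
import re
from typing import Dict, List

# Header lines look like:
#   "main" Id=1 Group=main WAITING on ...
HEADER = re.compile(r'^"(?P<name>[^"]+)" Id=(?P<id>\d+) Group=(?P<group>\S+) (?P<state>[A-Z_]+)')


def parse_thread_dump(dump: str) -> Dict[str, dict]:
    """Map each thread name to its state and its list of 'at ...' frames."""
    threads: Dict[str, dict] = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header["name"]
            threads[current] = {"state": header["state"], "frames": []}
        elif current is not None and line.startswith("at "):
            threads[current]["frames"].append(line[len("at "):])
    return threads


def threads_in(threads: Dict[str, dict], frame_prefix: str) -> List[str]:
    """Names of threads with at least one frame starting with frame_prefix."""
    return [
        name
        for name, info in threads.items()
        if any(frame.startswith(frame_prefix) for frame in info["frames"])
    ]
```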

Nasty. What OS is this? – 2012-03-30 19:44:03

Looks like a bug. [Report it](https://wiki.jenkins-ci.org/display/JENKINS/Issue+Tracking). – 2012-03-31 20:52:15

We're running Linux (RHEL 5) on all boxes. – 2012-04-03 13:56:08

Answer

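Honestly, we just wrote a script that restarts Jenkins every day at 4 PM. We found that our breakage happened around 3 AM and took about half an hour or so. Since we started restarting the server at that time, we haven't seen any further hangs. It's a way to prevent the problem, though it obviously doesn't "fix" it!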
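The answer does not show its restart script, and a cron entry is the usual way to run something daily. If the restart is instead driven by a long-running watchdog, the only fiddly part is working out the next run time. `next_daily_run` is a hypothetical helper sketching that calculation; the restart command itself is left out because it depends on how Jenkins and Tomcat are installed.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Next time strictly after `now` at which the clock reads hour:minute.

    Uses naive local datetimes, so daylight-saving transitions are not handled.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2012, 4, 3, 13, 56)
    run_at = next_daily_run(now, hour=16)  # 4 PM, as in the answer
    print(run_at)                          # 2012-04-03 16:00:00
    print((run_at - now).total_seconds())  # 7440.0 seconds to sleep before restarting
```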

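We tried that, with no luck. Short of stopping Tomcat and waiting 10 to 15 minutes before restarting it, nothing fixes it. And since the goal here is 24-hour operation, restarting daily is not an option. – 2012-04-03 13:56:47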