2015-11-22 52 views

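sem_wait not unblocking with EINTR

I am new to semaphores and want to add multithreading to my program, but I cannot get around the following problem: sem_wait() should be able to receive EINTR and unblock, as long as I didn't set the SA_RESTART flag. I am sending a SIGUSR1 to a worker thread that is blocked in sem_wait(). It does receive the signal and gets interrupted, but then it goes back to blocking, so it never gives me a -1 return code with errno = EINTR. However, if I do a sem_post from the main thread, it unblocks and gives me an errno of EINTR, but the RC is 0. I'm quite puzzled by this behavior. Is this some weird NetBSD implementation, or am I doing something wrong here? According to the man page, sem_wait conforms to POSIX.1 (ISO/IEC 9945-1:1996). Some simple code: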

#include <stdio.h> 
#include <stdlib.h> 
#include <errno.h> 
#include <signal.h> 
#include <pthread.h> 
#include <semaphore.h>
#include <unistd.h>

typedef struct workQueue_s 
{ 
    int full; 
    int empty; 
    sem_t work; 
    int sock_c[10]; 
} workQueue_t; 

void signal_handler(int sig) 
{ 
    switch(sig) 
    { 
     case SIGUSR1: 
     printf("Signal: I am pthread %p\n", pthread_self()); 
     break; 
    } 
} 

extern int errno; 
workQueue_t queue; 
pthread_t workerbees[8]; 

void *BeeWork(void *t) 
{ 
    int RC; 
    pthread_t tid; 
    struct sigaction sa; 
    sa.sa_handler = signal_handler; 
    sigaction(SIGUSR1, &sa, NULL); 

    printf("Bee: I am pthread %p\n", pthread_self()); 
    RC = sem_wait(&queue.work); 
    printf("Bee: got RC = %d and errno = %d\n", RC, errno); 

    RC = sem_wait(&queue.work); 
    printf("Bee: got RC = %d and errno = %d\n", RC, errno); 
    pthread_exit((void *) t); 
} 

int main() 
{ 
    int RC; 
    long tid = 0; 
    pthread_attr_t attr; 
    pthread_attr_init(&attr); 
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); 

    queue.full = 0; 
    queue.empty = 0; 
    sem_init(&queue.work, 0, 0); 

    printf("I am pthread %p\n", pthread_self()); 
    pthread_create(&workerbees[tid], &attr, BeeWork, (void *) tid); 
    pthread_attr_destroy(&attr); 

    sleep(2); 
    sem_post(&queue.work); 
    sleep(2); 
    pthread_kill(workerbees[tid], SIGUSR1); 
    sleep(2); 

    // Remove this and sem_wait will stay blocked 
    sem_post(&queue.work); 
    sleep(2); 
    return(0); 
} 

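I know printf is not async-signal-safe in a signal handler, but I left it in just for the heck of it; if I remove it, I get the same results.

These are the results without the final sem_post: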

I am pthread 0x7f7fffc00000 
Bee: I am pthread 0x7f7ff6c00000 
Bee: got RC = 0 and errno = 0 
Signal: I am pthread 0x7f7ff6c00000 

And with sem_post:

I am pthread 0x7f7fffc00000 
Bee: I am pthread 0x7f7ff6c00000 
Bee: got RC = 0 and errno = 0 
Signal: I am pthread 0x7f7ff6c00000 
Bee: got RC = 0 and errno = 4 

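I know I don't really need to unblock and could simply do an exit from main, but I want to see it working anyway. The reason I'm using sem_wait is that I want to keep the worker threads alive and, as soon as there is a new client connection from Postfix, wake the longest-waiting worker thread from the main thread with sem_post. I don't want to call pthread_create all the time, because I will receive several calls per second and I don't want to lose speed and make Postfix unresponsive to new smtpd clients. It's a policy daemon for Postfix, and the server is quite busy.

Am I missing something here? Is NetBSD just messed up?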
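The design described above (long-lived workers sleeping on a semaphore, main thread handing over sockets with sem_post) could be sketched roughly as follows. This is only an illustration under assumptions: the names work_queue, queue_init, queue_push and queue_pop are hypothetical and not part of the posted program, and POSIX does not by itself promise that the longest-waiting thread is the one woken.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>

#define QUEUE_CAP 10   /* same size as sock_c[10] in the posted struct */

/* Hypothetical FIFO of client sockets (illustration only). */
typedef struct {
    int sock_c[QUEUE_CAP];
    int head;               /* index of the oldest queued socket */
    int count;              /* number of queued sockets */
    pthread_mutex_t lock;   /* protects sock_c, head and count */
    sem_t work;             /* counts queued sockets; workers block on it */
} work_queue;

/* Returns 0 on success, -1 on failure. */
int queue_init(work_queue *q)
{
    q->head = 0;
    q->count = 0;
    if (pthread_mutex_init(&q->lock, NULL) != 0)
        return -1;
    return sem_init(&q->work, 0, 0);
}

/* Called by the main thread for each accepted connection.
 * Returns 0 on success, -1 if the queue is full. */
int queue_push(work_queue *q, int sock)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->sock_c[(q->head + q->count) % QUEUE_CAP] = sock;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);

    if (rc == 0)
        sem_post(&q->work);   /* wakes one worker blocked in sem_wait */
    return rc;
}

/* Called by a worker thread; blocks until a socket is queued.
 * Returns the socket, or -1 if sem_wait failed (errno is then set,
 * for example to EINTR). */
int queue_pop(work_queue *q)
{
    int sock;

    if (sem_wait(&q->work) == -1)
        return -1;

    pthread_mutex_lock(&q->lock);
    sock = q->sock_c[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return sock;
}
```

Each worker would loop on queue_pop() and handle the returned socket, so no pthread_create is needed per connection.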


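Does this happen if you use sigaction correctly? Right now you are passing a lot of garbage to sigaction(), and perhaps you end up with the SA_RESTART flag set. You absolutely need to initialize your 'struct sigaction sa;', either with 'struct sigaction sa = {0};' or with 'memset(&sa, 0, sizeof sa);' – nos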
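A minimal sketch of the initialization suggested in this comment, so that sa_flags cannot contain SA_RESTART by accident. The names on_sigusr1 and install_sigusr1_handler are made up for the example; the handler is deliberately empty because its only purpose is to interrupt a blocking call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Empty handler (hypothetical name): it only needs to exist so that
 * SIGUSR1 interrupts a blocking call instead of killing the process. */
static void on_sigusr1(int sig)
{
    (void)sig;
}

/* Install the handler with sa_flags == 0, i.e. without SA_RESTART.
 * Returns 0 on success, -1 on failure (errno set by sigaction). */
int install_sigusr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);   /* no stack garbage in any field */
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);    /* block no extra signals in the handler */
    sa.sa_flags = 0;             /* explicitly no SA_RESTART */

    return sigaction(SIGUSR1, &sa, NULL);
}
```

Signal dispositions are shared by all threads of a process, so calling this once from main before creating the workers should be enough.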


Thanks for the tip, but I get the same results... – Saskia


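At least on NetBSD 7.0 amd64 this works as it should, and I get 'Bee: got RC = -1 and errno = 4' (note that you should remove 'extern int errno'; declaring errno like that is wrong in a multithreaded program) – nos

Answers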
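Related to the errno remark: errno is only meaningful after a call has reported failure, so a value printed after sem_wait returned 0 may simply be stale. A small sketch of checking the return value first, with errno coming from <errno.h> alone; the enum and the function name wait_once are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>       /* provides the per-thread errno; no extern needed */
#include <semaphore.h>

enum wait_result { WAIT_ACQUIRED, WAIT_INTERRUPTED, WAIT_FAILED };

/* Wait once on the semaphore and classify the outcome. */
enum wait_result wait_once(sem_t *sem)
{
    if (sem_wait(sem) == 0)
        return WAIT_ACQUIRED;      /* success: errno is not examined */
    if (errno == EINTR)
        return WAIT_INTERRUPTED;   /* a signal handler interrupted the wait */
    return WAIT_FAILED;            /* any other error, e.g. EINVAL */
}
```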

0

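My post is about the behavior on Linux, but I think you may be seeing something similar, or at least I thought it could help. If not, let me know and I'll remove this useless 'noise'.

I tried to reproduce your setup and was very surprised to see exactly what you describe. Digging deeper showed me that something more subtle is going on; if you look with strace, you will see something like: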

[pid 6984] futex(0x6020e8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> 
[pid 6983] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 
[pid 6983] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 
[pid 6983] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 
[pid 6983] nanosleep({2, 0}, 0x7fffe5794a70) = 0 
[pid 6983] tgkill(6983, 6984, SIGUSR1 <unfinished ...> 
[pid 6984] <... futex resumed>)  = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 
[pid 6983] <... tgkill resumed>)  = 0 
[pid 6984] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=6983, si_uid=500} --- 
[pid 6983] rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...> 
[pid 6984] rt_sigreturn(<unfinished ...> 
[pid 6983] <... rt_sigprocmask resumed> [], 8) = 0 
[pid 6984] <... rt_sigreturn resumed>) = -1 EINTR (Interrupted system call) 

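Look at the lines with ERESTARTSYS and EINTR: the call reported as interrupted is actually the resumed rt_sigreturn, not futex (the system call underlying sem_wait) as you would expect. I must say I was quite puzzled, but reading the man page gave some interesting clues (man 7 signal):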

    If a blocked call to one of the following interfaces is interrupted by
    a signal handler, then the call will be automatically restarted after
    the signal handler returns if the SA_RESTART flag was used; otherwise
    the call will fail with the error EINTR:
    [...]

    * futex(2) FUTEX_WAIT (since Linux 2.6.22; beforehand, always
      failed with EINTR).

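So I guess you have a kernel with similar behavior (see the NetBSD docs?), where the system call is restarted automatically without giving you any chance to see it.

That said, I completely removed the sem_post() calls from your program and only sent the signal to 'break' the sem_wait(). Looking at the strace (filtered on the bee thread), I see: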
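When the restart is not done automatically, the caller sees EINTR and decides what to do next. A sketch of one way to handle that: retry the wait unless a flag set by the signal handler asks the worker to stop. The flag stop_requested and the function wait_for_work_or_stop are hypothetical names for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>

/* Hypothetical shutdown flag, meant to be set from a signal handler. */
volatile sig_atomic_t stop_requested = 0;

/* Returns 0 when the semaphore was acquired, 1 when a stop was requested,
 * and -1 on any sem_wait error other than EINTR. */
int wait_for_work_or_stop(sem_t *sem)
{
    for (;;) {
        if (stop_requested)
            return 1;
        if (sem_wait(sem) == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* EINTR: a handler ran; loop to re-check the flag, then wait again. */
    }
}
```

Note that a signal arriving between the flag check and the sem_wait call is only noticed at the next wake-up, so this pattern alone does not guarantee a prompt shutdown.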

[pid 8309] futex(0x7fffc0470990, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...> 
[pid 8309] <... futex resumed>)  = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 
[pid 8309] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=8308, si_uid=500} --- 
[pid 8309] rt_sigreturn()    = -1 EINTR (Interrupted system call) 
[pid 8309] madvise(0x7fd5f6019000, 8368128, MADV_DONTNEED) = 0 
[pid 8309] _exit(0) 

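I must say I have not mastered the details, but the kernel seems to work out what I am trying to do and makes the whole thing behave correctly: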

Bee: got RC = -1 and errno = Interrupted system call 

Thanks for your help. I only have ktrace; when I remove the last sem_post and make the last sleep a bit longer, I get: 10631 1 a.out RET __nanosleep50 0 10631 1 a.out CALL exit(0) 10631 2 a.out RET ___lwp_park50 -1 errno 4 Interrupted system call – Saskia

0

Thanks for your answer, OznOg. If I remove the last sem_post and make the last sleep a bit longer, I get this with ktrace:

PSIG SIGUSR1 caught handler=0x40035c mask=(): code=SI_LWP sent by pid=10631, uid=0) 
CALL write(1,0x7f7ff7e04000,0x24) 
GIO fd 1 wrote 36 bytes "Signal: I am pthread 0x7f7ff7800000\n" 
RET write 36/0x24 
CALL setcontext(0x7f7ff7bff970) 
RET setcontext JUSTRETURN 
CALL ___lwp_park50(0,0,0x7f7ff7e01100,0x7f7ff7e01100) 
RET __nanosleep50 0 
CALL exit(0) 
RET ___lwp_park50 -1 errno 4 Interrupted system call 

It seems that sem_wait can only be made to return by an exit or a sem_post...