2014-11-17 148 views

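Linux: fork & execv, wait for child process hangs

Inspired by this answer, I wrote a helper function that starts a process using fork() and execv(). It is used, for example, to start mysqldump for a database backup. The code works without problems in several different places in different programs.

Now I have hit a combination of circumstances where it fails: a call to systemctl to stop a unit. Running systemctl works, and the unit is stopped. But in the intermediate process, wait() hangs until the timeout process has finished. If I check with kill() whether the worker process has finished, I can tell that it has.

Important: apart from wait() not reporting the end of the worker process, the program shows no errors or faults. Is there anything in my code (see below) that is incorrect and could trigger this behavior? I have read Threads and fork(): think twice before mixing them, but I could not find anything there that relates to my problem.

What is strange: deep inside the program, JSON-RPC is used. If I deactivate the code that uses JSON-RPC, everything works!?

Environment: the program that uses the function is a multithreaded application. Signals are blocked for all threads. The main thread handles signals via sigtimedwait().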
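For readers unfamiliar with this setup, the following is a minimal sketch of what "all signals blocked, main thread uses sigtimedwait()" can look like. The function names blockAllSignals and waitForSignal are hypothetical and not taken from the production code. Note that blocking a signal only changes the signal mask; it does not change the signal's disposition (default, ignored, or handled), which is a separate per-process setting.

```cpp
#include <signal.h>
#include <time.h>

// Hypothetical helper: block every signal in the calling thread.
// Threads created afterwards inherit this mask.
bool blockAllSignals() {
    sigset_t all;
    sigfillset(&all);
    return pthread_sigmask(SIG_BLOCK, &all, nullptr) == 0;
}

// Hypothetical helper: wait up to timeoutInSeconds for any pending signal.
// Returns the signal number, or -1 on timeout or error (see errno).
int waitForSignal(const unsigned int timeoutInSeconds) {
    sigset_t all;
    sigfillset(&all);
    timespec timeout{};
    timeout.tv_sec = timeoutInSeconds;
    return sigtimedwait(&all, nullptr, &timeout);
}
```

In such a design the main thread would call blockAllSignals() before starting any other thread and then call waitForSignal() in its main loop.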

The code (production code, with the logging replaced by output via std::cout) together with a sample main function:

#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <iostream>

#include <unistd.h>
#include <sys/wait.h>

namespace { 

bool checkStatus(const int status) { 
    return(WIFEXITED(status) && (WEXITSTATUS(status) == 0)); 
} 

} 

bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) { 
    auto result = true; 

    const pid_t intermediatePid = fork(); 
    if(intermediatePid == 0) { 
     // intermediate process 
     std::cout << "Intermediate process: Started (" << getpid() << ")." << std::endl; 
     const pid_t workerPid = fork(); 
     if(workerPid == 0) { 
      // worker process 
      if(fileDescriptor) { 
       std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl; 
       const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO); 
       if(-1 == dupResult) { 
        std::cout << "Worker process: Duplication of file descriptor failed." << std::endl; 
        _exit(EXIT_FAILURE); 
       } 
      } 
      execv(path, const_cast<char**>(argv)); 

      std::cout << "Intermediate process: Worker failed!" << std::endl; 
      _exit(EXIT_FAILURE); 
     } else if(-1 == workerPid) { 
      std::cout << "Intermediate process: Starting worker failed!" << std::endl; 
      _exit(EXIT_FAILURE); 
     } 

     const pid_t timeoutPid = fork(); 
     if(timeoutPid == 0) { 
      // timeout process 
      std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl; 
      sleep(timeoutInSeconds); 
      std::cout << "Timeout process: Finished." << std::endl; 
      _exit(EXIT_SUCCESS); 
     } else if(-1 == timeoutPid) { 
      std::cout << "Intermediate process: Starting timeout process failed." << std::endl; 
      kill(workerPid, SIGKILL); 
      std::cout << "Intermediate process: Finished." << std::endl; 
      _exit(EXIT_FAILURE); 
     } 

     // --------------------------------------- 
     // This code is only used for double checking if the worker is still running. 
     // The if condition never evaluated to true in my tests. 
     const auto killResult = kill(workerPid, 0); 
     if((-1 == killResult) && (ESRCH == errno)) { 
      std::cout << "Intermediate process: Worker is not running." << std::endl; 
     } 
     // --------------------------------------- 

     std::cout << "Intermediate process: Waiting for child processes." << std::endl; 
     int status = -1; 
     const pid_t exitedPid = wait(&status); 

     // --------------------------------------- 
     // This code is only used for double checking if the worker is still running. 
     // The if condition evaluates to true in the case of an error. 
     const auto killResult2 = kill(workerPid, 0); 
     if((-1 == killResult2) && (ESRCH == errno)) { 
      std::cout << "Intermediate process: Worker is not running." << std::endl; 
     } 
     // --------------------------------------- 

     std::cout << "Intermediate process: Child process finished. Status: " << status << "." << std::endl; 
     if(exitedPid == workerPid) { 
      std::cout << "Intermediate process: Killing timeout process." << std::endl; 
      kill(timeoutPid, SIGKILL); 
     } else { 
      std::cout << "Intermediate process: Killing worker process." << std::endl; 
      kill(workerPid, SIGKILL); 
      std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl; 
      wait(nullptr); 
      std::cout << "Intermediate process: Finished." << std::endl; 
      _exit(EXIT_FAILURE); 
     } 
     std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl; 
     wait(nullptr); 
     std::cout << "Intermediate process: Finished." << std::endl; 
     _exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE); 

    } else if(-1 == intermediatePid) { 
     // error 
     std::cout << "Parent process: Error starting intermediate process!" << std::endl; 
     result = false; 
    } else { 
     // parent process 
     std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl; 
     processId = intermediatePid; 
    } 

    return(result); 
} 

bool waitForProcess(const pid_t processId) { 
    int status = 0; 
    const auto waitResult = waitpid(processId, &status, 0); 
    auto result = false; 
    if(waitResult == processId) { 
     result = checkStatus(status); 
    } 
    return(result); 
} 

int main() { 
    pid_t pid = 0; 
    const char* const path = "/bin/ls"; 
    const char* argv[] = { "/bin/ls", "--help", nullptr }; 
    const unsigned int timeoutInS = 5; 
    const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr); 
    if(startResult) { 
     const auto waitResult = waitForProcess(pid); 
     std::cout << "waitForProcess returned " << waitResult << "." << std::endl; 
    } else { 
     std::cout << "startProcess failed!" << std::endl; 
    } 
} 

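Edit

The expected output should contain

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: 0.
  • Intermediate process: Killing timeout process.

In the error case the output looks like this

  • Intermediate process: Waiting for child processes.
  • Intermediate process: Child process finished. Status: -1.
  • Intermediate process: Killing worker process.

When you run the sample code, you will most likely see the expected output. I cannot reproduce the faulty result in a simple example.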

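While the parent is blocked in `wait`, does the child show up as a zombie? – Useless

I compiled and ran the code and did not see any strange behavior. Can you provide the `expected result` and the `current result`? –

@Useless: There are no zombie processes. `ps -l` shows, for example, process ID 617 for the intermediate process and process ID 619 for the timeout process, but no 618. My test with kill() in the code gives the same result. – DrP3pp3r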

Answer


I found the problem:

In the mongoose source (the JSON-RPC code uses mongoose), in the function mg_start, I found the following code:

#if !defined(_WIN32) && !defined(__SYMBIAN32__) 
    // Ignore SIGPIPE signal, so if browser cancels the request, it 
    // won't kill the whole process. 
    (void) signal(SIGPIPE, SIG_IGN); 
    // Also ignoring SIGCHLD to let the OS to reap zombies properly. 
    (void) signal(SIGCHLD, SIG_IGN); 
#endif // !_WIN32 

(void) signal(SIGCHLD, SIG_IGN);

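causes the following:

"If the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD."

This is mentioned here, in section 5.5 "Voodoo: wait and SIGCHLD".

It is also described in the man page for wait(2):

ERRORS [...]

ECHILD [...] (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)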
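The behavior quoted above can be reproduced in isolation. The following is a minimal sketch for Linux; the function name waitErrnoWithIgnoredSigchld is hypothetical. It ignores SIGCHLD, forks a child that exits immediately, and reports what wait() leaves in errno.

```cpp
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: returns the errno set by wait() after a child has exited
// while SIGCHLD is ignored, or 0 if wait() unexpectedly reaped the child.
// If setting up the demo fails, the errno of the failing call is returned.
int waitErrnoWithIgnoredSigchld() {
    struct sigaction ignore {};
    struct sigaction previous {};
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    // Ignore SIGCHLD and remember the previous disposition.
    if (sigaction(SIGCHLD, &ignore, &previous) == -1) {
        return errno;
    }

    int result = 0;
    const pid_t child = fork();
    if (child == 0) {
        _exit(EXIT_SUCCESS);  // child: terminate immediately
    } else if (child == -1) {
        result = errno;
    } else {
        int status = -1;
        // With SIGCHLD ignored, the child is never turned into a zombie,
        // so wait() blocks until no children are left and then fails.
        const pid_t exited = wait(&status);
        result = (exited == -1) ? errno : 0;
    }

    // Restore the previous disposition so the demo has no lasting effect.
    sigaction(SIGCHLD, &previous, nullptr);
    return result;
}
```

On Linux this function is expected to return ECHILD, and status keeps its initial value of -1. That matches the "Status: -1" line in the faulty output of the question.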

It was stupid of me not to check the return value properly. Before trying

if(exitedPid == workerPid) { 

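I should have checked that exitedPid != -1.

If I do that, errno gives me ECHILD. Had I known that in the first place, I would have read the man page and probably found the problem sooner...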
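A check along these lines could look like the following sketch. The enum WaitOutcome and the function waitForAnyChild are hypothetical names, not part of the original code. The idea is to look at the return value of wait() first and only compare PIDs when a child was actually reaped.

```cpp
#include <cerrno>
#include <sys/wait.h>

// Hypothetical result type for the helper below.
enum class WaitOutcome { ChildExited, NoChildren, Error };

// Hypothetical helper: waits for any child process.
// Retries when interrupted by a signal and reports ECHILD separately,
// so the caller never compares a PID against a failed wait() result.
WaitOutcome waitForAnyChild(pid_t& exitedPid, int& status) {
    for (;;) {
        const pid_t pid = wait(&status);
        if (pid != -1) {
            exitedPid = pid;
            return WaitOutcome::ChildExited;
        }
        if (errno == EINTR) {
            continue;  // interrupted by a signal handler, try again
        }
        return (errno == ECHILD) ? WaitOutcome::NoChildren : WaitOutcome::Error;
    }
}
```

In the intermediate process, a result of WaitOutcome::NoChildren while children were started would point directly at an ignored SIGCHLD instead of being mistaken for an expired timeout.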

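It is naughty of mongoose to mess with the signal handling regardless of what the application wants to do with it. In addition, mongoose does not revert its change to the signal handling when it is stopped with mg_stop.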
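One possible way to contain such a side effect is to save the SIGCHLD disposition before initializing the library and to restore it afterwards. The following is only a sketch: the class name SigchldDispositionGuard is hypothetical, and whether the library still behaves correctly once its SIG_IGN setting has been undone is an assumption that would need to be verified.

```cpp
#include <csignal>

// Hypothetical RAII guard: remembers the SIGCHLD disposition on construction
// and restores it on destruction.
class SigchldDispositionGuard {
public:
    SigchldDispositionGuard() {
        // Passing nullptr as the new action only queries the current one.
        valid_ = (sigaction(SIGCHLD, nullptr, &saved_) == 0);
    }
    ~SigchldDispositionGuard() {
        if (valid_) {
            sigaction(SIGCHLD, &saved_, nullptr);
        }
    }
    SigchldDispositionGuard(const SigchldDispositionGuard&) = delete;
    SigchldDispositionGuard& operator=(const SigchldDispositionGuard&) = delete;

private:
    struct sigaction saved_ {};
    bool valid_ = false;
};
```

The guard would be placed in a scope around the library initialization, so that the saved disposition is restored as soon as the scope ends. An alternative with a smaller footprint is to set SIGCHLD back to SIG_DFL in the intermediate process right after fork(), since an ignored disposition is inherited by child processes.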

Additional info: the code that causes this problem was changed in mongoose in September 2013 with this commit.