pthread程序中例程的額外執行時間是多少？

我寫了四個不同的程序來計算兩個文件中的總詞數。這四個版本看起來大致相同。前三個版本使用兩個線程進行計數，而只有三個語句的順序不同。最後一個版本使用一個線程來計數。我將首先列出每個版本的不同部分和公共部分，然後列出每個版本的輸出和我的問題。pthread程序中例程的額外執行時間是多少？

不同的部分：

// version 1 
count_words(&file1); 
pthread_create(&new_thread, NULL, count_words, &file2); 
pthread_join(new_thread, NULL); 

// version 2 
pthread_create(&new_thread, NULL, count_words, &file2); 
count_words(&file1); 
pthread_join(new_thread, NULL); 

// version 3 
pthread_create(&new_thread, NULL, count_words, &file2); 
pthread_join(new_thread, NULL); 
count_words(&file1); 

// version 4 
count_words(&file1); 
count_words(&file2);

公共部分：（插入不同的部分爲這個共同的部分，使完整版）

#include <stdio.h> 
#include <pthread.h> 
#include <ctype.h> 
#include <stdlib.h> 
#include <time.h> 

#define N 2000 

typedef struct file_t { 
    char *name; 
    int words; 
} file_t; 

double time_diff(struct timespec *, struct timespec *); 
void *count_words(void *); 

// Usage: progname file1 file2 
int main(int argc, char *argv[]) { 
    pthread_t new_thread; 
    file_t file1, file2; 

    file1.name = argv[1]; 
    file1.words = 0; 
    file2.name= argv[2]; 
    file2.words = 0; 

    // Insert different part here 

    printf("Total words: %d\n", file1.words+file2.words); 
    return 0; 
} 

void *count_words(void *arg) { 
    FILE *fp; 
    file_t *file = (file_t *)arg; 
    int i, c, prevc = '\0'; 
    struct timespec process_beg, process_end; 
    struct timespec thread_beg, thread_end; 
    double process_diff, thread_diff; 

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_beg); 
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_beg); 

    fp = fopen(file->name, "r"); 
    for (i = 0; i < N; i++) { 
     while ((c = getc(fp)) != EOF) { 
      if (!isalnum(c) && isalnum(prevc)) 
       file->words++; 
      prevc = c; 
     } 
     fseek(fp, 0, SEEK_SET); 
    } 
    fclose(fp); 

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_end); 
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_end); 

    process_diff = time_diff(&process_beg, &process_end); 
    thread_diff = time_diff(&thread_beg, &thread_end); 
    printf("count_words() in %s takes %.3fs process time and" 
      "%.3fs thread time\n", file->name, process_diff, thread_diff); 
    return NULL; 
} 

double time_diff(struct timespec *beg, struct timespec *end) { 
    return ((double)end->tv_sec + (double)end->tv_nsec*1.0e-9) 
     - ((double)beg->tv_sec + (double)beg->tv_nsec*1.0e-9); 
}

注意

文件1中一個包含10000字「單詞」的文件。 file2是由cp命令創建的file1的副本。
爲了使執行時間足夠長，程序會重複計數單詞。 N是循環的數量。所以結果不是準確的總字數，而是乘以N.
請不要過分強調計數算法。我只關心這個例子中的執行時間。
重要信息：該機器是Intel®Celeron®CPU 420 @ 1.60GHz。一個核心。操作系統是Linux 3.2.0。也許一個核心是像其他人所說的這種奇怪現象的原因。但我仍然想弄明白。

該程序對字進行計數並使用clock_gettime（）來計算進程cpu時間和例程count_words（）的線程cpu時間，然後輸出時間和字數。以下是輸出和我對問題的評論。如果有人能夠解釋什麼是額外的時間，我將非常感激。

// version 1 
count_words() in file1 takes 2.563s process time and 2.563s thread time 
count_words() in file2 takes 8.374s process time and 8.374s thread time 
Total words: 40000000

評論：原始線程完成count_words（）並等待新線程死亡。當在新線程中運行count_words（）時，不會發生上下文切換（因爲進程時間==線程時間）。 爲什麼需要這麼多時間？ count_words（）在新線程中會發生什麼？

// version 2 
count_words() in file1 takes 16.755s process time and 8.377s thread time 
count_words() in file2 takes 16.753s process time and 8.380s thread time 
Total words: 40000000

點評：這裏有兩個線程並行運行。發生上下文切換，所以進程時間>線程時間。

// version 3 
count_words() in file2 takes 8.374s process time and 8.374s thread time 
count_words() in file1 takes 8.365s process time and 8.365s thread time 
Total words: 40000000

評論：新線程首先計數，原始線程等待它。新線程加入後，原始線程開始計數。 它們都沒有上下文切換，爲什麼要花費很多時間，特別是新線程加入後的計數？

// version 4 
count_words() in file1 takes 2.555s process time and 2.555s thread time 
count_words() in file2 takes 2.556s process time and 2.556s thread time 
Total words: 40000000

點評：最快的版本。沒有創建新的線程。兩個count_words（）都在一個線程中運行。

來源

2013-04-08 Yang Wenhao

引用版本1 + 2：虛假共享？也許這有助於：http://stackoverflow.com/q/8331255/694576 – alk 2013-04-08 11:55:40

@alk：我的機器有一個核心。正如你引用的鏈接中接受的答案所說，虛假分享是多核心的結果。單核可能發生嗎？ – 2013-04-09 03:29:06

這可能是因爲創建任何線程強制libc在getc中使用同步。這使這個功能顯着變慢。下面的例子是我作爲3版本慢：

void *skip(void *p){ return NULL; }; 
pthread_create(&new_thread, NULL, skip, NULL); 
count_words(&file1); 
count_words(&file2);

要解決這個問題，你可以使用一個緩衝：

for (i = 0; i < N; i++) { 
    char buffer[BUFSIZ]; 
    int read; 
    do { 
     read = fread(buffer, 1, BUFSIZ, fp); 
     int j; 
     for(j = 0; j < read; j++) { 
      if (!isalnum(buffer[j]) && isalnum(prevc)) 
       file->words++; 
      prevc = buffer[j]; 
     } 
    } while(read == BUFSIZ); 
    fseek(fp, 0, SEEK_SET); 
}

在此方案中，IO函數被調用很少足以使同步開銷微不足道。這不僅解決了奇怪定時的問題，而且使其速度提高了幾倍。對我而言，它是從0.54s（無線程）或0.85s（帶線程）減至0.15s（在這兩種情況下）。

來源

2013-04-08 12:06:59 zch

我將您的方法應用於每個版本，並且所有線程cpu和進程cpu時間都減少了。現在，count_words（）在每個條件下的線程CPU時間在1.20s和1.30s之間。使用getc（）中使用的同步是否意味着在每個I/O庫函數中使用鎖定以確保線程安全？我嘗試用getc_unlocked（）替換getc（），這將時間大大減少到大約1.50s。但仍然比你的方法慢一點，而且線程不安全。我還發現，在我的例子中，上下文切換不是一項耗時的工作。謝謝。 – 2013-04-09 06:39:50

我還想更深入。每次getc（）運行時都會發生文件鎖定。但是爲什麼在雙線程環境下比在單線程環境下需要更多時間（8.xxx vs 2.xxx）？我們還應該注意到，在版本1,2,3中很容易要求鎖。在版本3中，count_words（）在新線程連接後運行，只剩下一個線程。無法getc（）找到這個，並做快速鎖定（或別的東西）就像它在版本4呢？ – 2013-04-09 08:50:17

@ WenhaoYang，是的，在這種情況下，同步是鎖定的，在'man flockfile'中有簡短的描述。 '_unlocked'函數在這裏應該是安全的，因爲每個'FILE'結構都被一個線程使用。究竟是什麼導致它在創建任何線程之前運行得更快，我不知道。也許'如果（稱爲pthread_create）really_lock（）'在某種程度上？ – zch 2013-04-09 12:55:54

pthread程序中例程的額外執行時間是多少？

回答

相關問題