如何使用谷歌perf工具

我剛開始使用谷歌性能工具（google-perftools和libgoogle-perftools4包在Ubuntu中），我發誓我谷歌搜索了一天左右，我沒有找到答案！問題是，我沒有得到所有我的函數與CPU分析的結果。這是我的代碼：如何使用谷歌perf工具

#include "gperftools/profiler.h" 
#include <iostream> 
#include <math.h> 
using namespace std; 
void bar() 
{ 
     int a,b,c,d,j,k; 
     a=0; 
     int z=0; 
     b = 1000; 
     while(z < b) 
     { 
       while (a < b) 
       { 
         d = sin(a); 
         c = cos(a); 
         j = tan(a); 
         k = tan(a); 
         k = d * c + j *k; 
         a++; 
       } 
       a = 0; 
       z++; 
     } 
} 
void foo() 
{ 
     cout << "hey " << endl; 
} 

int main() 
{ 
     ProfilerStart("/home/mohammad/gperf/dump.txt"); 

     int a = 1000; 
     while(a--){foo();} 
     bar(); 
     ProfilerFlush(); 
     ProfilerStop(); 
}

編譯爲g++ test.cc -lprofiler -o a.out

我這是怎麼運行的代碼：

CPUPROFILE=dump.txt ./a.out

我也試過這樣：

CPUPROFILE_FREQUENCY=10000 LD_PRELOAD=/usr/local/lib/libprofiler.so.0.3.0 CPUPROFILE=dump.txt ./a.out

這是我從google-pprof --text a.out dump.txt得到的：

Using local file ./a.out. 
Using local file ./dump.txt. 
Total: 22 samples 
8 36.4% 36.4%  8 36.4% 00d8cb04 
6 27.3% 63.6%  6 27.3% bar 
3 13.6% 77.3%  3 13.6% __cos (inline) 
2 9.1% 86.4%  2 9.1% 00d8cab4 
1 4.5% 90.9%  1 4.5% 00d8cab6 
1 4.5% 95.5%  1 4.5% 00d8cb06 
1 4.5% 100.0%  1 4.5% __write_nocancel 
0 0.0% 100.0%  3 13.6% __cos

但是沒有關於foo功能的信息！

我的系統信息：的Ubuntu 12.04 G ++ 4.6.3

這就是所有！

來源

2012-06-03 Mohammad

你說你嘗試過「CPUPROFILE_FREQUENC = 10000」，但是缺少Y，所以如果實際（並且被忽略），那麼你可能沒有足夠的樣本來顯示這個微小的foo。 – mgkrebbs

您正在查看CPU時間，而不是掛鐘時間。 'foo'使用的CPU時間量幾乎沒有。它所做的只是調用庫函數來完成I/O，並且CPU樣本不會在I/O期間發生。 –

TL; DR：foo是快速和小得到分析事件，再運行100次。頻率設置是輸入錯誤，並且pprof不會比CONFIG_HZ更頻繁（通常爲250）。最好切換到更現代的Linux perf profiler（tutorial from its authors,wikipedia）。

龍版本：

你foo功能太短暫而簡單 - 只需要調用兩個功能。編譯g++ test.cc -lprofiler -o test.s -S -g測試，與test.s過濾與c++filt方案，使C++名稱可讀：

foo(): 
.LFB972: 
     .loc 1 27 0 
     pushq %rbp 
     movq %rsp, %rbp 
     .loc 1 28 0 
     movl $.LC0, %esi 
     movl std::cout, %edi 
     call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) 
     movl std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&), %esi 
     movq %rax, %rdi 
     call std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&)) 
     .loc 1 29 0 
     popq %rbp 
     ret 
.LFE972: 
     .size foo(), .-foo()

所以，看到它的輪廓，你應該更多的時間運行foo，在主變int a = 1000;的東西大得多，比如10000或更好100000（就像我測試的那樣）。

另外，您也可以修正不正確的「CPUPROFILE_FREQUENC=10000」以更正CPUPROFILE_FREQUENCY（注意Y）。我應該說10000設置CPUPROFILE_FREQUENCY的設置過高，因爲它通常每秒只能生成1000或250個事件，具體取決於內核配置CONFIG_HZ（大多數3.x內核有250個，請檢查grep CONFIG_HZ= /boot/config*）。並且pprof中CPUPROFILE_FREQUENCY的默認設置爲100.

我在Ubuntu 14上用bash腳本測試了CPUPROFILE_FREQUENCY的不同值：100000,10000,1000,250。04

for a in 100000 100000 10000 10000 1000 1000 300 300 250 250; do 
    echo -n "$a "; 
    CPUPROFILE_FREQUENCY=$a CPUPROFILE=dump$a.txt ./test >/dev/null; 
done

而且結果都是一樣的120-140的事件和大約0.5秒每./test運行，所以cpuprofiler從谷歌，perftools不能做到每秒單線程多個事件，比CONFIG_HZ集在內核（我有250）。

100000 PROFILE: interrupts/evictions/bytes = 124/1/6584 
100000 PROFILE: interrupts/evictions/bytes = 134/0/7864 
10000 PROFILE: interrupts/evictions/bytes = 125/0/7488 
10000 PROFILE: interrupts/evictions/bytes = 123/0/6960 
1000 PROFILE: interrupts/evictions/bytes = 134/0/6264 
1000 PROFILE: interrupts/evictions/bytes = 125/2/7272 
300 PROFILE: interrupts/evictions/bytes = 137/2/7984 
300 PROFILE: interrupts/evictions/bytes = 126/0/7216 
250 PROFILE: interrupts/evictions/bytes = 123/3/6680 
250 PROFILE: interrupts/evictions/bytes = 137/2/7352

採用原裝A = 1000個foo和cout的功能運行太快，他們得到任何分析事件（即使是在250個事件/秒），在每次運行，所以你沒有foo，也沒有任何輸入/輸出功能。在少量運行中，__write_nocancel可能會得到採樣事件，然後foo和I/O函數形式將以零自身事件計數報告（某處不在最上方，因此使用--text選項pprof或google-pprof），並且非零子事件計數：

.... 
    1 0.9% 99.1%  1 0.9% __write_nocancel 
.... 
    0 0.0% 100.0%  1 0.9% _IO_new_file_overflow 
    0 0.0% 100.0%  1 0.9% _IO_new_file_write 
    0 0.0% 100.0%  1 0.9% __GI__IO_putc 
    0 0.0% 100.0%  1 0.9% foo 
    0 0.0% 100.0%  1 0.9% new_do_write 
    0 0.0% 100.0%  1 0.9% std::endl 
    0 0.0% 100.0%  1 0.9% std::ostream::put

隨着a=100000，foo是仍然過短，快速獲得自己的事件，但I/O功能得到了幾個。這是清單，我從長--text輸出grepped：

34 24.6% 24.6%  34 24.6% __write_nocancel 

    1 0.7% 95.7%  35 25.4% __GI__IO_fflush 
    1 0.7% 96.4%  1 0.7% __GI__IO_putc 
    1 0.7% 97.8%  2 1.4% std::operator<< 
    1 0.7% 98.6%  36 26.1% std::ostream::flush 
    1 0.7% 99.3%  2 1.4% std::ostream::put 
    0 0.0% 100.0%  34 24.6% _IO_new_file_sync 
    0 0.0% 100.0%  34 24.6% _IO_new_file_write 
    0 0.0% 100.0%  40 29.0% foo 

    0 0.0% 100.0%  34 24.6% new_do_write 

    0 0.0% 100.0%  2 1.4% std::endl

功能與零個自己的專櫃看到的只是感謝閱讀調用鏈（它知道誰調用它得到樣品的功能，如果不省略幀信息）的pprof能力。

我也可以推薦更現代，更有能力（包括軟件和硬件事件，高達5 kHz的頻率或更多;無論用戶空間和內核分析）和更好的支持探查，在Linux perf分析器（tutorial from its authors，wikipedia）。

有結果從perf與a=10000：

$ perf record ./test3 >/dev/null 
... skip some perf's spam about inaccessibility of kernel symbols 
... note the 3 kHz frequency here VVVV 
Lowering default frequency rate to 3250. 
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate. 
[ perf record: Woken up 1 times to write data ] 
[ perf record: Captured and wrote 0.078 MB perf.data (~3386 samples) ]

要查看perf.data輸出文件，我會（在默認情況下啓動，因爲perf report交互式輪廓瀏覽器）使用較少的文本報告：

$ perf report |less 
... skip some extra info about the machine, kernel, and perf starting command 
# Samples: 1K of event 'cycles' 
# Event count (approx.): 1155264208 
# Overhead Command Shared Object   Symbol 
    41.94% test3 libm-2.19.so   [.] __tan_sse2                                          
    16.95% test3 libm-2.19.so   [.] __sin_sse2  
    13.40% test3 libm-2.19.so   [.] __cos_sse2                                          
    4.93% test3 test3    [.] bar()                                           
    2.90% test3 libc-2.19.so   [.] __GI___libc_write  
.... 
    0.20% test3 test3    [.] foo()

或perf report -n | less查看原始事件（樣本）計數器：

# Overhead  Samples Command  Shared Object 
    41.94%   663 test3 libm-2.19.so   [.] __tan_sse2                                          
    16.95%   268 test3 libm-2.19.so   [.] __sin_sse2 
    13.40%   212 test3 libm-2.19.so   [.] __cos_sse2                                          
    4.93%   78 test3 test3    [.] bar()                                           
    2.90%   62 test3 libc-2.19.so   [.] __GI___libc_write  
.... 
    0.20%    4 test3 test3    [.] foo()

來源

2014-06-01 18:28:28 osgx

嘗試設置LD_PRELOAD只是

LD_PRELOAD =的/ usr /本地/ LIB/libprofiler.so

它looks like there are problems傳遞中不後綴結束。所以一個共享庫時。

來源

2013-05-14 17:31:55 volpato

如何使用谷歌perf工具

回答

相關問題