c - Is GNU gprof buggy? -
I have a C program that calls the function pi_calcpiitem() 600000000 times through the function pi_calcpiblock. To analyze the time spent in the functions I used GNU gprof. The result seems to be erroneous, since all calls are attributed to main() instead. Furthermore, the call graph does not make any sense:
    Each sample counts as 0.01 seconds.
      %   cumulative   self               self     total
     time   seconds   seconds     calls  Ts/call  Ts/call  name
     61.29      9.28     9.28                              pi_calcpiitem
     15.85     11.68     2.40                              pi_calcpiblock
     11.96     13.49     1.81                              _mcount_private
      9.45     14.92     1.43                              __fentry__
      1.45     15.14     0.22                              pow
      0.00     15.14     0.00 600000000     0.00     0.00  main

    Call graph

    granularity: each sample hit covers 4 byte(s) for 0.07% of 15.14 seconds

    index % time    self  children    called               name
                                                           <spontaneous>
    [1]     61.3    9.28    0.00                           pi_calcpiitem [1]
    -----------------------------------------------
                                                           <spontaneous>
    [2]     15.9    2.40    0.00                           pi_calcpiblock [2]
                    0.00    0.00 600000000/600000000           main [6]
    -----------------------------------------------
                                                           <spontaneous>
    [3]     12.0    1.81    0.00                           _mcount_private [3]
    -----------------------------------------------
                                                           <spontaneous>
    [4]      9.4    1.43    0.00                           __fentry__ [4]
    -----------------------------------------------
                                                           <spontaneous>
    [5]      1.5    0.22    0.00                           pow [5]
    -----------------------------------------------
                                         6                 main [6]
                    0.00    0.00 600000000/600000000           pi_calcpiblock [2]
    [6]      0.0    0.00    0.00 600000000+6               main [6]
                                         6                 main [6]
    -----------------------------------------------
Is this a bug, or do I have to configure the program somehow? And what does <spontaneous> mean?
EDIT (more insight for you)
The code is all about the calculation of pi:
    #define pi_blocksize (100000000)
    #define pi_blockcount (6)
    #define pi_threshold (pi_blocksize * pi_blockcount)

    int32_t main(int32_t argc, char* argv[])
    {
        double result;
        for ( int32_t i = 0; i < pi_threshold; i += pi_blocksize )
        {
            pi_calcpiblock(&result, i, i + pi_blocksize);
        }
        printf("pi = %f\n",result);
        return 0;
    }

    static void pi_calcpiblock(double* result, int32_t start, int32_t end)
    {
        double piitem;
        for ( int32_t i = start; i < end; ++i )
        {
            pi_calcpiitem(&piitem, i);
            *result += piitem;
        }
    }

    static void pi_calcpiitem(double* piitem, int32_t index)
    {
        *piitem = 4.0 * (pow(-1.0,index) / (2.0 * index + 1.0));
    }
And this is how I got the results (executed on Windows with the help of Cygwin):
    > gcc -std=c99 -o pi *.c -pg -fno-inline-small-functions
    > ./pi.exe
    > gprof.exe pi.exe
Try:
Using the noinline, noclone function attributes instead of -fno-inline-small-functions. By disassembling main I could see that -fno-inline-small-functions doesn't stop the inlining.
Linking your program statically (-static).
You should also initialize result to 0.0 in main.
This worked for me on Linux, x86-64:
    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    #define pi_blocksize (100000000)
    #define pi_blockcount (6)
    #define pi_threshold (pi_blocksize * pi_blockcount)

    static void pi_calcpiitem(double* piitem, int32_t index);
    static void pi_calcpiblock(double* result, int32_t start, int32_t end);

    int32_t main(int32_t argc, char* argv[])
    {
        double result;
        result = 0.0;
        for ( int32_t i = 0; i < pi_threshold; i += pi_blocksize )
        {
            pi_calcpiblock(&result, i, i + pi_blocksize);
        }
        printf("pi = %f\n",result);
        return 0;
    }

    __attribute__((noinline, noclone))
    static void pi_calcpiblock(double* result, int32_t start, int32_t end)
    {
        double piitem;
        for ( int32_t i = start; i < end; ++i )
        {
            pi_calcpiitem(&piitem, i);
            *result += piitem;
        }
    }

    __attribute__((noinline, noclone))
    static void pi_calcpiitem(double* piitem, int32_t index)
    {
        *piitem = 4.0 * (pow(-1.0,index) / (2.0 * index + 1.0));
    }
Building the code
    $ cc pi.c -o pi -Os -Wall -g3 -I. -std=c99 -pg -static -lm
Output
    $ ./pi && gprof ./pi
    pi = 3.141593
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self               self     total
     time   seconds   seconds     calls  ns/call  ns/call  name
     85.61     22.55    22.55                              __ieee754_pow_sse2
      4.75     23.80     1.25                              pow
      4.14     24.89     1.09 600000000     1.82     1.82  pi_calcpiitem
      2.54     25.56     0.67                              __exp1
      0.91     25.80     0.24                              pi_calcpiblock
      0.53     25.94     0.14                              matherr
      0.47     26.07     0.13                              __lseek_nocancel
      0.38     26.17     0.10                              frame_dummy
      0.34     26.26     0.09                              __ieee754_exp_sse2
      0.32     26.34     0.09                              __profile_frequency
      0.00     26.34     0.00         1     0.00     0.00  main

    Call graph (explanation follows)

    granularity: each sample hit covers 2 byte(s) for 0.04% of 26.34 seconds

    index % time    self  children    called               name
                                                           <spontaneous>
    [1]     85.6   22.55    0.00                           __ieee754_pow_sse2 [1]
    -----------------------------------------------
                                                           <spontaneous>
    [2]      5.0    0.24    1.09                           pi_calcpiblock [2]
                    1.09    0.00 600000000/600000000           pi_calcpiitem [4]
    -----------------------------------------------
                                                           <spontaneous>
    [3]      4.7    1.25    0.00                           pow [3]
    -----------------------------------------------
                    1.09    0.00 600000000/600000000           pi_calcpiblock [2]
    [4]      4.1    1.09    0.00 600000000                 pi_calcpiitem [4]
    -----------------------------------------------
                                                           <spontaneous>
    [5]      2.5    0.67    0.00                           __exp1 [5]
    -----------------------------------------------
                                                           <spontaneous>
    [6]      0.5    0.14    0.00                           matherr [6]
    -----------------------------------------------
                                                           <spontaneous>
    [7]      0.5    0.13    0.00                           __lseek_nocancel [7]
    -----------------------------------------------
                                                           <spontaneous>
    [8]      0.4    0.10    0.00                           frame_dummy [8]
    -----------------------------------------------
                                                           <spontaneous>
    [9]      0.3    0.09    0.00                           __ieee754_exp_sse2 [9]
    -----------------------------------------------
                                                           <spontaneous>
    [10]     0.3    0.09    0.00                           __profile_frequency [10]
    -----------------------------------------------
                    0.00    0.00         1/1               __libc_start_main [827]
    [11]     0.0    0.00    0.00         1                 main [11]
    -----------------------------------------------
Comments
As expected, pow() is the bottleneck. While pi is running, perf top (a sampling-based system profiler) also shows __ieee754_pow_sse2 taking 60%+ of the CPU. Changing pow(-1.0,index) to ((i & 1) ? -1.0 : 1.0), as @Mike Dunlavey suggested, makes the code 4 times faster.
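For illustration, the parity-based replacement for pow(-1.0,index) could look roughly like the sketch below. The function names pi_item_parity and pi_sum_parity are hypothetical and not part of the original program, and the functions return their values instead of writing through pointers. This is only a minimal sketch of the idea, not the exact code that was timed.

```c
#include <stdint.h>

/* Hypothetical helper: one term of the series, 4 * (-1)^index / (2*index + 1).
 * The sign is taken from the parity of index, so pow() is not called. */
double pi_item_parity(int32_t index)
{
    double sign = (index & 1) ? -1.0 : 1.0;
    return 4.0 * (sign / (2.0 * index + 1.0));
}

/* Hypothetical helper: sums the terms with indices in [start, end). */
double pi_sum_parity(int32_t start, int32_t end)
{
    double sum = 0.0;
    for (int32_t i = start; i < end; ++i) {
        sum += pi_item_parity(i);
    }
    return sum;
}
```

Calling pi_sum_parity(0, pi_threshold) from main would cover the same index range as the block loop in the program above. Since the math library is no longer needed for the sign, pow and its internal helpers should no longer show up in the profile.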
c profiling profiler gprof