Valgrind
Updated on 2022-05-07
Citations
Valgrind Part 1 - Introduction by Paul Floyd
Valgrind runs the application under test (AUT) in a virtual machine, VEX. VEX translates the AUT's machine code on the fly into an Intermediate Representation (IR) and intercepts the system calls and memory accesses required for the analysis.
cachegrind simulates caches and counts cache misses/hits. callgrind counts CPU instructions executed.
Run callgrind
valgrind --tool=callgrind --collect-atstart=no program-to-run program-arguments
Run the annotator
callgrind_annotate --show=DLmr --sort=DLmr --auto=yes callgrind.out.pid
--auto=yes breaks down the results per source line (instead of per function).
--show=DLmr shows figures only for DLmr (last-level cache data read misses).
Enable cache simulation
valgrind --tool=callgrind --cache-sim=yes program-to-run program-arguments
How to limit the range of collected events
Limiting the range of collected events
The collection state at program start can be switched off with --collect-atstart=no (--instr-atstart=no instead switches off instrumentation at start). During execution, it can be controlled programmatically with the macro CALLGRIND_TOGGLE_COLLECT, defined in valgrind/callgrind.h. Further, you can limit event collection to a specific function with --toggle-collect=function.
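A minimal sketch of the programmatic route, assuming the Valgrind development headers may or may not be installed. The function names `fill`, `sum` and `profiled_sum` are hypothetical, made up for illustration; only `CALLGRIND_TOGGLE_COLLECT` and the header `valgrind/callgrind.h` come from Valgrind itself. The no-op fallback is an assumption so that the file still builds without the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use the real Callgrind client-request header when it is available;
   otherwise fall back to a no-op so the program still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)
#endif

/* Hypothetical setup work that should stay out of the profile. */
static void fill(uint32_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)i;
}

/* Hypothetical region of interest. */
static uint64_t sum(const uint32_t *buf, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* Runs the setup uncollected, then toggles collection around sum(). */
uint64_t profiled_sum(uint32_t *buf, size_t n) {
    assert(buf != NULL || n == 0);
    fill(buf, n);
    CALLGRIND_TOGGLE_COLLECT;   /* off -> on when started with --collect-atstart=no */
    uint64_t total = sum(buf, n);
    CALLGRIND_TOGGLE_COLLECT;   /* on -> off again */
    return total;
}
```

Called from a program that is run with `valgrind --tool=callgrind --collect-atstart=no`, this should record events only for the region between the two toggles. Outside Valgrind the macro does nothing, so the program can also run normally.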
I – instruction
Ir – instruction reads (number of instructions × execution frequency, i.e. instructions executed)
I1 – L1 instruction cache
LLi – last-level cache, instructions
D – data
D1 – L1 data cache
LLd – last-level cache, data
Bc – conditional branches executed
Bi – indirect branches executed
By default, the counts are exclusive: the counts for a function include only the cost incurred in that function itself, not in the functions that it calls.
--inclusive=yes makes the counts inclusive.
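The relationship between the two kinds of count can be sketched on a toy call tree. This is an illustration only, not how callgrind_annotate is implemented: the `node` type and `inclusive_cost` function are hypothetical, and the sketch assumes a call tree without recursion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call-tree node: one function with its own (exclusive) cost. */
typedef struct node {
    const char *name;
    uint64_t self_cost;              /* exclusive count, e.g. Ir */
    const struct node *callees[4];   /* functions called from this one */
    size_t n_callees;
} node;

/* Inclusive cost = exclusive cost + inclusive cost of every callee. */
uint64_t inclusive_cost(const node *f) {
    assert(f != NULL);
    uint64_t total = f->self_cost;
    for (size_t i = 0; i < f->n_callees; i++)
        total += inclusive_cost(f->callees[i]);
    return total;
}
```

For a function with an exclusive cost of 20 that calls a leaf with a cost of 30, the exclusive figure stays 20 while the inclusive figure is 50.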
An L1 miss will typically cost around 5-10 cycles; an L2 miss can cost as much as 100-200 cycles.
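The miss costs quoted above can be turned into a rough cycle estimate from the simulated counts. The sketch below assumes one cycle per executed instruction plus a fixed penalty per miss, which is a simplification; the `cache_counts` struct and `estimate_cycles` function are hypothetical names, and the penalties are passed in so that both ends of the quoted ranges can be tried.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for totals taken from a cache-simulation run. */
typedef struct {
    uint64_t ir;        /* instructions executed (Ir) */
    uint64_t l1_miss;   /* first-level misses, instruction + data */
    uint64_t ll_miss;   /* last-level (the text's L2) misses */
} cache_counts;

/* Rough estimate: one cycle per instruction plus a fixed penalty per miss. */
uint64_t estimate_cycles(cache_counts c, uint64_t l1_penalty, uint64_t ll_penalty) {
    /* Sanity check: a last-level miss should not be cheaper than an L1 miss. */
    assert(l1_penalty <= ll_penalty);
    return c.ir + c.l1_miss * l1_penalty + c.ll_miss * ll_penalty;
}
```

With 1,000,000 instructions, 2,000 L1 misses and 100 last-level misses, the low-end penalties (5 and 100) give 1,020,000 cycles and the high-end penalties (10 and 200) give 1,040,000, so here the misses add roughly 2-4% on top of the instruction count.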