Show HN: Single-Header Profiler for C++17
Morning HN.
I often find myself wondering "how much of the total runtime does this code segment take", and it's often quite annoying to figure out with optimizations enabled, especially when working on something new or testing someone else's implementation without the proper tooling set up. Wanted to have a single-include lib that would allow us to write something like:
```
PROFILE("Loop 1")
for (...)
    // some work
```
and have the next expression automatically record time & dump results to a table. Wrote a few macros to do exactly that a few months back, but they were primitive and basically unusable for recursive code.
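For illustration, here is a minimal sketch of how a macro can time the statement that follows it, using a C++17 `if` with an initializer. This is not the library's implementation; `SKETCH_PROFILE`, `SketchTimer` and `sketch_results` are hypothetical names made up for the example, and results go into a plain map rather than a formatted table.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical result storage: label -> accumulated seconds.
inline std::map<std::string, double>& sketch_results() {
    static std::map<std::string, double> results;
    return results;
}

// RAII timer: starts on construction, records elapsed time on destruction.
struct SketchTimer {
    const char* label;
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    explicit SketchTimer(const char* l) : label(l) { assert(label != nullptr); }

    ~SketchTimer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        sketch_results()[label] += elapsed.count();
    }
};

// The timer lives in the if-initializer, so it is destroyed only after the
// statement following the macro has finished. The `false ... else` shape keeps
// a trailing `else` in user code bound to the user's own `if`.
#define SKETCH_PROFILE(label) \
    if (SketchTimer sketch_timer_{label}; false) {} else
```

Usage would look like `SKETCH_PROFILE("Loop 1") for (...) { ... }`, after which `sketch_results()["Loop 1"]` holds the accumulated seconds. A flat map like this has no notion of nesting, which is the limitation with recursive code.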
Tried to come up with a more generic solution that would build a call graph for nested profiler macros, handle threads, etc., but doing so in a naive way would be super slow, since we'd need some kind of recursive map of nodes with callsites as keys.
Recently had a revelation that it is possible to use macro-generated thread_local variables to associate callsites with integer IDs on the fly, and that with some effort the call graph can be neatly encoded in a few contiguous arrays, with all graph building & traversal logic reduced to simple checks and array lookups. Realized threading can be supported quite easily too, in an almost lock-free fashion.
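A rough sketch of that idea, under my own simplifying assumptions rather than the library's actual code: every name here (`SketchGraph`, `SketchScope`, `SKETCH_SCOPE`, `kMaxCallsites`) is hypothetical, the callsite capacity is fixed so the child table stays one flat array, and nodes count hits instead of accumulating time to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SketchGraph {
    static constexpr int kMaxCallsites = 64; // assumed fixed capacity

    std::vector<const char*> callsite_label; // callsite id -> label
    std::vector<int>  node_parent;           // node id -> parent node (-1 for root)
    std::vector<int>  node_callsite;         // node id -> callsite id (-1 for root)
    std::vector<long> node_hits;             // node id -> times entered
    std::vector<int>  child;                 // [node * kMaxCallsites + callsite] -> child node, -1 if none
    int current = 0;                         // node we are currently inside

    SketchGraph() { add_node(-1, -1); }      // node 0 is the root

    int add_node(int parent, int callsite) {
        node_parent.push_back(parent);
        node_callsite.push_back(callsite);
        node_hits.push_back(0);
        child.resize(child.size() + kMaxCallsites, -1); // one table row per node
        return static_cast<int>(node_parent.size()) - 1;
    }

    int new_callsite(const char* label) {
        assert(callsite_label.size() < static_cast<std::size_t>(kMaxCallsites));
        callsite_label.push_back(label);
        return static_cast<int>(callsite_label.size()) - 1;
    }

    // Entering a scope is one array lookup, plus a node insertion on first visit.
    void enter(int callsite) {
        const std::size_t slot =
            static_cast<std::size_t>(current) * kMaxCallsites + static_cast<std::size_t>(callsite);
        int next = child[slot];
        if (next < 0) {
            next = add_node(current, callsite);
            child[slot] = next;
        }
        ++node_hits[next];
        current = next;
    }

    void leave() { current = node_parent[current]; }
};

// One graph per thread, so entering and leaving scopes needs no locking.
inline SketchGraph& sketch_graph() {
    thread_local SketchGraph graph;
    return graph;
}

struct SketchScope {
    explicit SketchScope(int callsite) { sketch_graph().enter(callsite); }
    ~SketchScope() { sketch_graph().leave(); }
};

// The thread_local is initialized once per thread on first pass through the
// callsite, which is what hands each callsite its integer ID on the fly.
#define SKETCH_SCOPE(label)                                                          \
    thread_local const int sketch_callsite_id_ = sketch_graph().new_callsite(label); \
    SketchScope sketch_scope_guard_{sketch_callsite_id_}

// Recursive example: each recursion depth becomes its own node in the graph.
inline void sketch_recurse(int depth) {
    SKETCH_SCOPE("recurse");
    if (depth > 0) sketch_recurse(depth - 1);
}
```

In this sketch the macro can only appear once per scope, since it declares fixed variable names. Per-thread graphs would still have to be merged when results are collected, which is presumably where the remaining locking in an "almost lock-free" design would sit.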
After a few days of effort ended up building what I believe is a very much usable single-header profiling lib. Couldn't find anything quite like it, so I'd like to present it here and hear some opinions on the product:
https://github.com/DmitriBogdanov/UTL/blob/master/docs/modul...
As a gamedev, I almost never need the total time spent in a function, rather I need to visualize the total time spent in a function for that frame. And then I scan the output for long frames and examine those hotspots one frame at a time. Would be nice to be able to use that workflow in this, but visualizing it would be much different.
Nice, I like the colored output tables. Started tinkering with a small profiling lib as well a while ago.
https://github.com/gurki/glimmer
It focuses on creating flamegraphs to view on e.g. https://www.speedscope.app/. I wanted to use std::stacktrace, but they are very costly to evaluate, even just lazily at exit. Eventually, I just tracked thread and call layer manually.
If I understand correctly, you're tracking your call stack manually as well using some graph structure on linear ids? Mind elaborating a bit on its functionality and performance? Also proper platform-independent function names were a pita. Any comments on how you addressed that?
Speedscope is awesome.
I've been thinking about using speedscope as a reference to make a native viewer like that.
Sampling profilers (like perf) are just so much easier to use than source markup ones. Just feel like the tooling around perf is bad and that speedscope is part of the solution.
How does this compare to Microprofile?
https://github.com/jonasmr/microprofile
Btw, I recently worked with a library that had their own profiler which generated a Chrome trace file, so you could load it up in the Chrome dev tools to explore the call graph and timings in a fancy UI.
It seems like such a good idea and I wish more profiling frameworks tried to do that instead of building their own UI.
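For anyone curious, a minimal sketch of what emitting such a file can look like, assuming the Trace Event Format's "complete" events (`"ph":"X"`, with `ts` and `dur` in microseconds). `SketchEvent` and `sketch_to_chrome_trace` are hypothetical names, not taken from any particular library, and event names are written without JSON escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one timed scope.
struct SketchEvent {
    std::string name;        // assumed to contain no characters needing JSON escaping
    long long   start_us;    // start timestamp in microseconds
    long long   duration_us; // duration in microseconds
    int         thread_id;
};

// Serializes events as a JSON object with a "traceEvents" array of complete events.
inline std::string sketch_to_chrome_trace(const std::vector<SketchEvent>& events) {
    std::ostringstream out;
    out << "{\"traceEvents\":[";
    for (std::size_t i = 0; i < events.size(); ++i) {
        const SketchEvent& e = events[i];
        assert(e.duration_us >= 0);
        if (i > 0) out << ",";
        out << "{\"name\":\"" << e.name << "\""
            << ",\"ph\":\"X\""
            << ",\"ts\":" << e.start_us
            << ",\"dur\":" << e.duration_us
            << ",\"pid\":1"
            << ",\"tid\":" << e.thread_id << "}";
    }
    out << "]}";
    return out.str();
}
```

Writing the returned string to a `.json` file should give something that trace viewers accepting this format can load.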
Haven't worked with it, but based on an initial look it's quite a different thing that stands closer to a frame-based profiler like Tracy (https://github.com/wolfpld/tracy).
As far as differences go:
Microprofile:
utl::profiler:
This looks great! I've been needing something like this for a while, for a project which is quite compute-heavy and uses lots of threads and recursion. I've been using valgrind to profile small test examples, but that's definitely the nuclear option since it slows down the execution so much. I'm going to try this out right away.
Also discussed on /r/cpp: https://www.reddit.com/r/cpp/comments/1jy6ver/utlprofiler_si...
Do you also have some tools or scripts to help annotate code?
One inconvenience with this library's approach is having to modify the code to add/remove instrumentation, compared to something like GNU gprof which has compiler support and doesn't require modifying the code.
I've thought about this but have yet to come up with a simple approach. Perhaps something like a Python script hooked to GCC-XML could do the trick; will look into that in the future.
Great work! The colored, structured output is clean! Folks may also be interested in nanobench, which is also a single header C++ lib. It focuses on benchmarking blocks of code, though.