A few performance facts about bgpgrep
If you are a performance junkie like me, the first question that probably pops into your mind after a major code rewrite is something like:
Is it faster than before?
Let’s now satisfy this curiosity (of mine) with some benchmarking.
Benchmark environment
- Processor: Intel© Core™ i7-8565U at 1.80GHz (4 cores physical, 8 cores with hyperthreading)
- Cache layout:
  - L1 data cache: 128 KiB (4 instances)
  - L1 instruction cache: 128 KiB (4 instances)
  - L2 cache: 1 MiB (4 instances)
  - L3 cache: 8 MiB (1 instance)
- Memory: 16 GB RAM DDR4, in two 8GB banks
- Hard disk: SAMSUNG MZALQ512HALU-000L1
- Kernel: Linux 5.10.62-1-lts SMP x86_64 GNU/Linux
To avoid adulterating the results, we also:
- disable cron and any other background file indexing service;
- force performance CPU profile, disabling powersave mode;
- disable Linux address space layout randomization for the duration of our tests;
- increase kernel performance events sample rate;
- drop filesystem caches and clean any temporary file;
- run benchmarks in console mode, outside any desktop environment.
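The post does not list the exact commands used for these steps, so the following is only a minimal sketch of how they could be scripted on a systemd-based Linux system. The cron unit name (cronie.service), the sample rate value, and the helper names run and prepare_bench_env are assumptions made for illustration, not the author's actual setup.

```shell
#!/bin/sh
# Hypothetical preparation script; needs root when actually executed.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_bench_env() {
    # Stop cron (assumption: the unit is named cronie.service, as on Arch Linux).
    run systemctl stop cronie.service
    # Force the performance CPU frequency governor.
    run cpupower frequency-set -g performance
    # Disable address space layout randomization until reboot.
    run sysctl -w kernel.randomize_va_space=0
    # Raise the perf events sample rate (the value is an arbitrary example).
    run sysctl -w kernel.perf_event_max_sample_rate=200000
    # Flush dirty pages, then drop page cache, dentries and inodes.
    run sync
    run sysctl -w vm.drop_caches=3
}
```

Calling prepare_bench_env with DRY_RUN=1 set first prints the six commands, which is a convenient way to review them before running anything as root.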
Both bgpscanner and bgpgrep have been compiled in release mode with full optimizations, as documented in their official build instructions, using clang version 12.0.1.
For reference, we also do a benchmark run with bgpdump, version 1.6.2, as available from the Arch User Repository (AUR).
Results are calculated by averaging five runs of each command, immediately after one warmup round. MRT data is decompressed upfront, to avoid accounting for decompression overhead, and the output is sent directly to /dev/null, to avoid any disk write overhead.
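The measurement procedure described above (one discarded warmup round, then a fixed number of timed rounds with output sent to /dev/null) can be sketched roughly as follows. This is not the harness used for the published numbers: the function names bench and summarize are hypothetical, timing relies on GNU date for sub-second resolution, and peak memory is not measured here.

```shell
#!/bin/sh
# summarize: read one elapsed time (in seconds) per line on stdin,
# print "average best worst" with two decimals.
summarize() {
    awk 'NR == 1 { min = $1; max = $1 }
         { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR > 0) printf "%.2f %.2f %.2f\n", sum / NR, min, max }'
}

# bench RUNS CMD...: run CMD once as warmup, then RUNS timed rounds.
bench() {
    runs=$1; shift
    "$@" >/dev/null                # warmup round, timing discarded
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)       # GNU date: seconds.nanoseconds
        "$@" >/dev/null
        end=$(date +%s.%N)
        # Print the elapsed wall-clock time of this round.
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
        i=$((i + 1))
    done | summarize
}
```

With these definitions, something like bench 5 bgpgrep sydney/2020-12/uncompressed.mrt would print the average, best and worst wall-clock time of five runs.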
Let the fun begin!
We take the data for the first benchmark from RouteViews’ Sydney Route Collector, and pull the very first RIB of December 2020, along with any subsequent updates from the same month. This gives us 47.1GB of uncompressed MRT data to work with.
We then run our benchmarks with the following commands:
bgpgrep sydney/2020-12/uncompressed.mrt >/dev/null
bgpscanner sydney/2020-12/uncompressed.mrt >/dev/null
bgpdump -mv sydney/2020-12/uncompressed.mrt >/dev/null
Tool | Average (sec) | Best (sec) | Worst (sec) | Memory (KiB)
---|---|---|---|---
bgpgrep | 404.45 | 401.62 | 411.38 | 2076
bgpscanner | 453.59 | 451.93 | 455.13 | 2448
bgpdump | 2053.73 | 2037.19 | 2082.22 | 2316
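bgpgrep takes about 11% less time than bgpscanner on average (404.45 s versus 453.59 s), which is good. bgpdump, for comparison, takes about five times as long as bgpgrep.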
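A relative figure like this can be read in two ways, and the two give slightly different numbers. The small sketch below computes both from the average times in the table; the function names are made up for this example.

```shell
#!/bin/sh
# pct_less_time SLOW FAST: share of running time saved by the faster tool.
pct_less_time() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", (slow - fast) / slow * 100 }'
}

# pct_longer SLOW FAST: how much longer the slower tool takes.
pct_longer() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", (slow / fast - 1) * 100 }'
}

pct_less_time 453.59 404.45   # prints 10.8
pct_longer    453.59 404.45   # prints 12.1
```

In other words, bgpgrep needs 10.8% less time than bgpscanner on this dataset, while bgpscanner takes 12.1% longer than bgpgrep.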
Since this benchmark operates mostly on MRT update dumps, let’s try the same
on a different dataset, mostly made of RIBs.
We pull nine RIBs from the RIPE NCC RIS RRC00 Route Collector, and obtain 25.7GB worth of uncompressed MRT data.
This time the benchmark is limited to bgpgrep and bgpscanner.
Executed commands and results:
bgpgrep rrc00/2019-12/rib-uncompressed.mrt >/dev/null
bgpscanner rrc00/2019-12/rib-uncompressed.mrt >/dev/null
Tool | Average (sec) | Best (sec) | Worst (sec) | Memory (KiB)
---|---|---|---|---
bgpgrep | 295.84 | 292.20 | 298.14 | 2112
bgpscanner | 333.35 | 321.73 | 339.56 | 3016
The same trend is confirmed: bgpgrep takes about 11% less time on average (295.84 s versus 333.35 s), suggesting that the advantage is not data dependent.
Running our benchmarks under average system load, though, leads to an interesting surprise:
bgpgrep isolario/2021-07/rib-uncompressed.mrt >/dev/null
bgpscanner isolario/2021-07/rib-uncompressed.mrt >/dev/null
Tool | Average (sec) | Best (sec) | Worst (sec) | Memory (KiB)
---|---|---|---|---
bgpgrep | 344.90 | 342.88 | 347.03 | 2260
bgpscanner | 411.39 | 405.13 | 412.70 | 2436
These runs were performed under a regular GNOME desktop session, with other applications running. We used 60.8GB worth of MRT data from the Isolario project Dagobah Collector, from July 2021 (mostly RIBs). Strikingly, the gap widens: bgpgrep takes about 16% less time than bgpscanner on average (344.90 s versus 411.39 s), or, seen the other way around, bgpscanner takes almost 20% longer.
The reason might be a smarter use of memory, and a reduced chance of page faults.
You might have noticed from our results that bgpgrep's memory requirements are moderate compared to bgpscanner's. What’s less evident is that bgpgrep also keeps its data structures compact and avoids moving them around much. This lessens the page pressure on the system (and makes the CPU cache happier).
The net effect of this isn’t evident in the benchmarking environment, since bgpgrep and bgpscanner, in turn, are the only resource-intensive tasks on the system. The initial warmup round also contributes to their ideal performance.
When more tasks are concurrently fighting over memory, and processes might get migrated to different cores for various reasons, invalidating their cache, the value of bgpgrep's approach becomes more prominent.
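One way to check this hypothesis, which the benchmarks above do not do, would be to count page faults, CPU migrations and cache misses for each tool with Linux perf. The wrapper below is only a sketch under that assumption; count_faults is a hypothetical name, and the event list is just one reasonable choice.

```shell
#!/bin/sh
# count_faults CMD...: run CMD under perf stat and count page faults,
# CPU migrations and cache misses. perf prints its counters on stderr,
# while the output of CMD itself is discarded.
count_faults() {
    perf stat -e page-faults,cpu-migrations,cache-misses -- "$@" >/dev/null
}
```

Running count_faults bgpgrep isolario/2021-07/rib-uncompressed.mrt and the same for bgpscanner, once in the clean environment and once inside the desktop session, would show whether the counters actually diverge under load.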
Conclusion
bgpgrep seems to be a nice improvement over bgpscanner, and I am quite satisfied with the performance gains, especially since they come with a more solid codebase.
In the next few weeks I intend to improve the filtering engine. In general I’d like to stop for a bit to polish the codebase to make it more mature, before moving on to implement more features.
If you haven’t already, be sure to check out the Micro BGP Suite at our official Git Repository.
As always, happy hacking to you all!
Lorenzo Cogotti