Host
Intel Core Ultra 9 285K · 24 cores
Platform
linux/amd64
Go
go1.26.0
CPython
python:3.13-slim
PyPy
pypy:3.10-slim
Runs / combo
10 + 2 warmup

Invert binary tree

65k-node balanced binary tree: allocator pressure, recursive descent, and pointer mutation in one inner loop. The only workload in the suite that stresses all three at once.

Runtime · median per inner-loop window

median of 10 runs

Native Gocompiled
6.11 ms0.05×
Piko interpbytecode VM
116 msbaseline
CPython 3.13bytecode VM
91.2 ms0.79×
PyPy 7.3tracing JIT
39.8 ms0.34×
Ttengobytecode VM
189 ms1.62×
Sscriggobytecode VM
n/a
Mmvmbytecode VM
182 ms1.57×
YyaegiAST walker
744 ms6.41×

Full statistics

RunnerNCompileRuntimeP95StddevRSSvs pikoStatus
Native Gocompiled10181 ms6.11 ms6.21 ms52.5 µs68 MiB0.05×OK
Piko interpbytecode VM101.12 ms116 ms118 ms927 µs194 MiB1.00×OK
CPython 3.13bytecode VM10312 µs91.2 ms95.2 ms1.92 msn/a0.79×OK
PyPy 7.3tracing JIT10265 µs39.8 ms40.4 ms416 µsn/a0.34×OK
tengobytecode VM10244 µs189 ms195 ms4.08 ms1.06 GiB1.62×OK
scriggobytecode VM0n/an/an/an/an/an/aunsupported
mvmbytecode VM10358 µs182 ms204 ms9.79 ms67 MiB1.57×OK
yaegiAST walker10485 µs744 ms792 ms20.3 ms75 MiB6.41×OK
Workload & symmetry rules

Workload

  1. Recursively construct a balanced binary tree of depth 15 (2¹⁶ − 1 = 65535 nodes). Each node's value is derived deterministically from its parent (value*2 for left, value*2+1 for right, 32-bit masked).
  2. Count nodes via recursive traversal (sanity check).
  3. Compute an in-order hash fold: state = state*31 + value mod 2³².
  4. Invert the tree: at every node, swap left and right pointers recursively.
  5. Re-count and re-hash. Emit the four observables pre_count,pre_fold,post_count,post_fold.

Symmetry rules

  • Real recursive *Node struct (Go) / Node class with __slots__ (Python).
  • No conversion to arrays for the inversion step, just a pointer swap at every node.

What makes this workload unique

Every other benchmark in the suite isolates one performance dimension. Bench 14 stresses pure float arithmetic in a tight loop. Bench 15 stresses interpreter throughput with no allocation. Bench 13 stresses interface dispatch with no mutation. This one combines three pressures into a single inner loop: 65k heap allocations during construction, four full recursive walks across the tree, and in-place pointer-field mutation during the inversion. The runner that wins this benchmark is the runner whose allocator, call-frame management, and pointer-write path are all simultaneously fast. The runner that loses it has a bottleneck in at least one of the three.

Source code