Invert binary tree

65k-node balanced binary tree: allocator pressure, recursive descent, and pointer mutation in one inner loop. The only workload in the suite that stresses all three at once.

Runtime Compile time

Runtime · median per inner-loop window

median of 10 runs

Native Gocompiled

6.11 ms0.05×

Piko interpbytecode VM

116 msbaseline

CPython 3.13bytecode VM

91.2 ms0.79×

PyPy 7.3tracing JIT

39.8 ms0.34×

Ttengobytecode VM

189 ms1.62×

Sscriggobytecode VM

unsupported in scriggo

n/a

Mmvmbytecode VM

182 ms1.57×

YyaegiAST walker

744 ms6.41×

Full statistics

Runner	N	Compile	Runtime	P95	Stddev	RSS	vs piko	Status
Native Gocompiled	10	181 ms	6.11 ms	6.21 ms	52.5 µs	68 MiB	0.05×	OK
Piko interpbytecode VM	10	1.12 ms	116 ms	118 ms	927 µs	194 MiB	1.00×	OK
CPython 3.13bytecode VM	10	312 µs	91.2 ms	95.2 ms	1.92 ms	n/a	0.79×	OK
PyPy 7.3tracing JIT	10	265 µs	39.8 ms	40.4 ms	416 µs	n/a	0.34×	OK
tengobytecode VM	10	244 µs	189 ms	195 ms	4.08 ms	1.06 GiB	1.62×	OK
scriggobytecode VM	0	n/a	n/a	n/a	n/a	n/a	n/a	unsupported
mvmbytecode VM	10	358 µs	182 ms	204 ms	9.79 ms	67 MiB	1.57×	OK
yaegiAST walker	10	485 µs	744 ms	792 ms	20.3 ms	75 MiB	6.41×	OK

Workload & symmetry rules

Workload

Recursively construct a balanced binary tree of depth 15 (2¹⁶ − 1 = 65535 nodes). Each node's value is derived deterministically from its parent (value*2 for left, value*2+1 for right, 32-bit masked).
Count nodes via recursive traversal (sanity check).
Compute an in-order hash fold: state = state*31 + value mod 2³².
Invert the tree: at every node, swap left and right pointers recursively.
Re-count and re-hash. Emit the four observables pre_count,pre_fold,post_count,post_fold.

Symmetry rules

Real recursive *Node struct (Go) / Node class with __slots__ (Python).
No conversion to arrays for the inversion step, just a pointer swap at every node.

What makes this workload unique

Every other benchmark in the suite isolates one performance dimension. Bench 14 stresses pure float arithmetic in a tight loop. Bench 15 stresses interpreter throughput with no allocation. Bench 13 stresses interface dispatch with no mutation. This one combines three pressures into a single inner loop: 65k heap allocations during construction, four full recursive walks across the tree, and in-place pointer-field mutation during the inversion. The runner that wins this benchmark is the runner whose allocator, call-frame management, and pointer-write path are all simultaneously fast. The runner that loses it has a bottleneck in at least one of the three.

Source code

piko / Go
piko_source.go
native Go
native_main.go
CPython / PyPy
cpython.py
tengo
script.tengo