Invert binary tree
65k-node balanced binary tree: allocator pressure, recursive descent, and pointer mutation in one inner loop. The only workload in the suite that stresses all three at once.
Runtime · median per inner-loop window
Full statistics
| Runner | N | Compile | Runtime | P95 | Stddev | RSS | vs piko | Status |
|---|---|---|---|---|---|---|---|---|
| Native Gocompiled | 10 | 181 ms | 6.11 ms | 6.21 ms | 52.5 µs | 68 MiB | 0.05× | OK |
| Piko interpbytecode VM | 10 | 1.12 ms | 116 ms | 118 ms | 927 µs | 194 MiB | 1.00× | OK |
| CPython 3.13bytecode VM | 10 | 312 µs | 91.2 ms | 95.2 ms | 1.92 ms | n/a | 0.79× | OK |
| PyPy 7.3tracing JIT | 10 | 265 µs | 39.8 ms | 40.4 ms | 416 µs | n/a | 0.34× | OK |
| tengobytecode VM | 10 | 244 µs | 189 ms | 195 ms | 4.08 ms | 1.06 GiB | 1.62× | OK |
| scriggobytecode VM | 0 | n/a | n/a | n/a | n/a | n/a | n/a | unsupported |
| mvmbytecode VM | 10 | 358 µs | 182 ms | 204 ms | 9.79 ms | 67 MiB | 1.57× | OK |
| yaegiAST walker | 10 | 485 µs | 744 ms | 792 ms | 20.3 ms | 75 MiB | 6.41× | OK |
Workload & symmetry rules
Workload
- Recursively construct a balanced binary tree of depth 15 (2¹⁶ − 1 = 65535 nodes). Each node's value is derived deterministically from its parent (
value*2for left,value*2+1for right, 32-bit masked). - Count nodes via recursive traversal (sanity check).
- Compute an in-order hash fold:
state = state*31 + valuemod 2³². - Invert the tree: at every node, swap
leftandrightpointers recursively. - Re-count and re-hash. Emit the four observables
pre_count,pre_fold,post_count,post_fold.
Symmetry rules
- Real recursive
*Nodestruct (Go) /Nodeclass with__slots__(Python). - No conversion to arrays for the inversion step, just a pointer swap at every node.
What makes this workload unique
Every other benchmark in the suite isolates one performance dimension. Bench 14 stresses pure float arithmetic in a tight loop. Bench 15 stresses interpreter throughput with no allocation. Bench 13 stresses interface dispatch with no mutation. This one combines three pressures into a single inner loop: 65k heap allocations during construction, four full recursive walks across the tree, and in-place pointer-field mutation during the inversion. The runner that wins this benchmark is the runner whose allocator, call-frame management, and pointer-write path are all simultaneously fast. The runner that loses it has a bottleneck in at least one of the three.
Source code
piko / Go
piko_source.gonative Go
native_main.goCPython / PyPy
cpython.pytengo
script.tengo