Host
Intel Core Ultra 9 285K · 24 cores
Platform
linux/amd64
Go
go1.26.0
CPython
python:3.13-slim
PyPy
pypy:3.10-slim
Runs / combo
10 + 2 warmup

Per-line SHA-256 (FFI overhead)

Compute a 32-byte content digest of Frankenstein (Project Gutenberg #84, ~7,700 lines) by hashing each line with the runner's bundled SHA-256 primitive and XOR-folding into a running accumulator. Measures interpreter throughput + per-call native FFI overhead in equal measure.

Runtime · median per inner-loop window

median of 10 runs

Native Gocompiled
2.08 ms0.05×
Piko interpbytecode VM
39.3 msbaseline
CPython 3.13bytecode VM
57.3 ms1.46×
PyPy 7.3tracing JIT
31.4 ms0.80×
Ttengobytecode VM
87.7 ms2.23×
Sscriggobytecode VM
82.6 ms2.10×
Mmvmbytecode VM
124 ms3.16×
YyaegiAST walker
55.9 ms1.42×

Full statistics

RunnerNCompileRuntimeP95StddevRSSvs pikoStatus
Native Gocompiled10181 ms2.08 ms2.09 ms4.21 µs69 MiB0.05×OK
Piko interpbytecode VM101.71 ms39.3 ms39.8 ms261 µs117 MiB1.00×OK
CPython 3.13bytecode VM10276 µs57.3 ms58.2 ms1.07 msn/a1.46×OK
PyPy 7.3tracing JIT10233 µs31.4 ms32.7 ms666 µsn/a0.80×OK
tengobytecode VM10200 µs87.7 ms108 ms10.3 ms428 MiB2.23×OK
scriggobytecode VM10288 µs82.6 ms106 ms7.31 ms407 MiB2.10×OK
mvmbytecode VM10251 µs124 ms141 ms5.19 ms62 MiB3.16×OK
yaegiAST walker10286 µs55.9 ms74.1 ms8.26 ms66 MiB1.42×OK
Workload & symmetry rules

Workload

Walk Frankenstein's text (~441 KiB, ~7,700 lines) byte-by-byte to find newline boundaries. For each line, call the runner's bundled SHA-256 primitive once. XOR-fold every 32-byte digest into a running accumulator. Emit the accumulator as 64 lowercase hex characters.

Symmetry rules

  • One SHA-256 call per line (~7,700 native FFI dispatches per inner iteration). Whole-buffer hashing (sha256.Sum256(corpus) / hashlib.sha256(corpus).hexdigest()) is banned because it would do the work in a single native call, hiding the per-call signal.
  • Line scan and XOR fold run in interpreted code on every runner. No strings.Split, no str.splitlines.
  • Hand-rolled hex emission (no encoding/hex, no binascii.hexlify) so the trailing format step also runs in the interpreter.
  • Tengo's stdlib does not bundle SHA-256, so the harness registers a sha256_sum builtin that wraps Go's crypto/sha256.Sum256. Same shape of "register a host function as a native callout" that every other runner uses, just via Tengo's UserFunction surface.

Why this benchmark exists

This is the only benchmark that exercises the script-orchestrates-compiled-primitive pattern, which is how most production embedded interpreters are actually used (NumPy from Python, engine bindings from Lua, host functions from Tengo). Per-call FFI dispatch overhead is what discriminates runners: with ~7,700 calls per inner iteration, a 1µs-per-call difference adds up to ~8 ms of wall time, visible against a ~100 ms baseline. Every runner uses the same SHA-256 algorithm routed through the same compression primitive (OpenSSL on Python, Go's crypto/sha256 with ASM where available on Go family), so the per-block hash cost is comparable across the board.

Source code