Per-line SHA-256 (FFI overhead)

Compute a 32-byte content digest of Frankenstein (Project Gutenberg #84, ~7,700 lines) by hashing each line with the runner's bundled SHA-256 primitive and XOR-folding into a running accumulator. Measures interpreter throughput + per-call native FFI overhead in equal measure.

Runtime Compile time

Runtime · median per inner-loop window

median of 10 runs

Native Gocompiled

2.08 ms0.05×

Piko interpbytecode VM

39.3 msbaseline

CPython 3.13bytecode VM

57.3 ms1.46×

PyPy 7.3tracing JIT

31.4 ms0.80×

Ttengobytecode VM

87.7 ms2.23×

Sscriggobytecode VM

82.6 ms2.10×

Mmvmbytecode VM

124 ms3.16×

YyaegiAST walker

55.9 ms1.42×

Full statistics

Runner	N	Compile	Runtime	P95	Stddev	RSS	vs piko	Status
Native Gocompiled	10	181 ms	2.08 ms	2.09 ms	4.21 µs	69 MiB	0.05×	OK
Piko interpbytecode VM	10	1.71 ms	39.3 ms	39.8 ms	261 µs	117 MiB	1.00×	OK
CPython 3.13bytecode VM	10	276 µs	57.3 ms	58.2 ms	1.07 ms	n/a	1.46×	OK
PyPy 7.3tracing JIT	10	233 µs	31.4 ms	32.7 ms	666 µs	n/a	0.80×	OK
tengobytecode VM	10	200 µs	87.7 ms	108 ms	10.3 ms	428 MiB	2.23×	OK
scriggobytecode VM	10	288 µs	82.6 ms	106 ms	7.31 ms	407 MiB	2.10×	OK
mvmbytecode VM	10	251 µs	124 ms	141 ms	5.19 ms	62 MiB	3.16×	OK
yaegiAST walker	10	286 µs	55.9 ms	74.1 ms	8.26 ms	66 MiB	1.42×	OK

Workload & symmetry rules

Workload

Walk Frankenstein's text (~441 KiB, ~7,700 lines) byte-by-byte to find newline boundaries. For each line, call the runner's bundled SHA-256 primitive once. XOR-fold every 32-byte digest into a running accumulator. Emit the accumulator as 64 lowercase hex characters.

Symmetry rules

One SHA-256 call per line (~7,700 native FFI dispatches per inner iteration). Whole-buffer hashing (sha256.Sum256(corpus) / hashlib.sha256(corpus).hexdigest()) is banned because it would do the work in a single native call, hiding the per-call signal.
Line scan and XOR fold run in interpreted code on every runner. No strings.Split, no str.splitlines.
Hand-rolled hex emission (no encoding/hex, no binascii.hexlify) so the trailing format step also runs in the interpreter.
Tengo's stdlib does not bundle SHA-256, so the harness registers a sha256_sum builtin that wraps Go's crypto/sha256.Sum256. Same shape of "register a host function as a native callout" that every other runner uses, just via Tengo's UserFunction surface.

Why this benchmark exists

This is the only benchmark that exercises the script-orchestrates-compiled-primitive pattern, which is how most production embedded interpreters are actually used (NumPy from Python, engine bindings from Lua, host functions from Tengo). Per-call FFI dispatch overhead is what discriminates runners: with ~7,700 calls per inner iteration, a 1µs-per-call difference adds up to ~8 ms of wall time, visible against a ~100 ms baseline. Every runner uses the same SHA-256 algorithm routed through the same compression primitive (OpenSSL on Python, Go's crypto/sha256 with ASM where available on Go family), so the per-block hash cost is comparable across the board.

Source code

piko / Go
piko_source.go
native Go
native_main.go
CPython / PyPy
cpython.py
tengo
script.tengo