Dense neural-network layer
Forward pass of a 256x256 dense (fully connected) layer with ReLU activation. The inner dot-product loop triggers piko's SIMD recogniser; every other runner executes it as scalar code.
Runtime · median per inner-loop window
Full statistics
| Runner | N | Compile | Runtime | P95 | Stddev | RSS | vs piko | Status |
|---|---|---|---|---|---|---|---|---|
| Native Go (compiled) | 10 | 182 ms | 146 µs | 149 µs | 1.31 µs | 68 MiB | 0.13× | OK |
| Piko interp (bytecode VM) | 10 | 1.05 ms | 1.10 ms | 1.13 ms | 8.50 µs | 101 MiB | 1.00× | OK |
| CPython 3.13 (bytecode VM) | 10 | 356 µs | 22.6 ms | 23.6 ms | 493 µs | n/a | 20.5× | OK |
| PyPy 7.3 (tracing JIT) | 10 | 301 µs | 5.72 ms | 5.99 ms | 135 µs | n/a | 5.19× | OK |
| tengo (bytecode VM) | 10 | 291 µs | 34.4 ms | 38.3 ms | 3.73 ms | 361 MiB | 31.2× | OK |
| scriggo (bytecode VM) | 10 | 325 µs | 13.2 ms | 14.3 ms | 400 µs | 82 MiB | 12.0× | OK |
| mvm (bytecode VM) | 10 | 319 µs | 24.2 ms | 36.4 ms | 5.06 ms | 66 MiB | 22.0× | OK |
| yaegi (AST walker) | 10 | 424 µs | 16.8 ms | 17.0 ms | 90.0 µs | 62 MiB | 15.3× | OK |
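
The "vs piko" column appears to be each runner's median runtime divided by piko's median runtime, so values below 1 are faster than piko and values above 1 are slower. This reading is inferred from the figures rather than stated in the source. Checking two rows against the rounded values in the table:

$$
\text{vs piko} = \frac{t_{\text{runner}}}{t_{\text{piko}}}, \qquad
\frac{146\ \mu\text{s}}{1.10\ \text{ms}} \approx 0.13, \qquad
\frac{22.6\ \text{ms}}{1.10\ \text{ms}} \approx 20.5
$$

Other rows can differ in the last digit when recomputed this way, presumably because the published ratios were derived from unrounded timings.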
Workload & symmetry rules
Workload
For each of 256 output neurons: compute output[i] = relu(sum_j(W[i][j] * input[j]) + bias[i]) over a deterministically-seeded 256x256 weight matrix and 256-element input vector. Sum the activations, multiply by 1000, emit as a single integer so canonical hashing is FP-safe.
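The workload above can be sketched in Go roughly as follows. This is a minimal illustration, not the benchmark's actual source: the `lcg` generator, its constants and seed, the value range of the weights, and the choice to truncate (rather than round) the scaled sum are all assumptions, since the source only says the data is "deterministically-seeded" and the result is emitted "as a single integer".

```go
package main

import "fmt"

const n = 256 // layer width, from the workload description

// lcg is a hypothetical deterministic generator used only for this sketch;
// the benchmark's real seeding scheme is not shown in the source.
type lcg struct{ state uint64 }

// next returns a pseudo-random value in [-1, 1).
func (g *lcg) next() float64 {
	g.state = g.state*6364136223846793005 + 1442695040888963407
	return float64(g.state>>11)/float64(1<<53)*2 - 1
}

// relu clamps negative values to zero.
func relu(x float64) float64 {
	if x > 0 {
		return x
	}
	return 0
}

// forward computes output[i] = relu(sum_j(W[i][j]*input[j]) + bias[i])
// with a hand-rolled scalar dot-product loop per row.
func forward(w [][]float64, input, bias []float64) []float64 {
	out := make([]float64, len(w))
	for i, row := range w {
		sum := 0.0
		for j := 0; j < len(row); j++ {
			sum += row[j] * input[j]
		}
		out[i] = relu(sum + bias[i])
	}
	return out
}

// checksum sums the activations, scales by 1000 and converts to an integer.
// Truncation toward zero is an assumption of this sketch.
func checksum(out []float64) int64 {
	total := 0.0
	for _, v := range out {
		total += v
	}
	return int64(total * 1000)
}

func main() {
	g := &lcg{state: 42} // assumed seed
	w := make([][]float64, n)
	for i := range w {
		w[i] = make([]float64, n)
		for j := range w[i] {
			w[i][j] = g.next()
		}
	}
	input := make([]float64, n)
	bias := make([]float64, n)
	for i := 0; i < n; i++ {
		input[i] = g.next()
		bias[i] = g.next()
	}
	fmt.Println(checksum(forward(w, input, bias)))
}
```

Emitting a scaled integer instead of the raw float means small differences in how each runner formats floating-point output do not change the value that gets hashed.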
Symmetry rules
- Hand-rolled scalar dot-product loop in every runner. No matrix libraries, no numpy, no SIMD intrinsics in the source.
- The Go source is structured so piko's pattern recogniser sees the inner loop as a dot-product and emits one `subOpSimdDotProductFloat64` per row. The same source runs as scalar code on every other Go-family interpreter.
- The Python version uses a plain `for` loop over scalar floats; no `numpy.dot`.
Why this benchmark exists
It is the headline number for piko's SIMD recogniser. Bench 14 (Mandelbrot) shows pure FP throughput; bench 22 (n-body) shows struct-in-FP-loop; this one specifically shows what the SIMD path is worth in a workload that maps to it perfectly. If the recogniser fires, piko closes much of the gap to native Go.
Source code
- piko / Go: piko_source.go
- native Go: native_main.go
- CPython / PyPy: cpython.py
- tengo: script.tengo