Gzip storage transformer

Stream transformer that gzips byte streams on the way to the storage provider and decompresses them on the way back, with constant memory usage.

Overview

Gzip is the most widely supported compression format: virtually every language, tool, browser, and storage system can read it. The transformer compresses uploads at a configurable level through klauspost/compress/gzip, a faster drop-in replacement for the standard library package, and decompresses downloads. Both directions stream with constant memory: the source reader feeds the compressor and the consumer pulls the compressed result, so the whole object is never buffered.
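The streaming shape described above can be sketched with an io.Pipe. This is a minimal illustration under stated assumptions, not the package's implementation: it uses the standard library compress/gzip rather than klauspost/compress/gzip so that it runs without extra dependencies, and compressStream and roundTrip are hypothetical helper names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// compressStream (hypothetical helper) returns a reader that yields the
// gzip-compressed form of src. A goroutine copies src into the gzip writer
// in chunks, so memory use is bounded by the copy buffer and the compressor
// state, not by the size of the object.
func compressStream(src io.Reader, level int) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		zw, err := gzip.NewWriterLevel(pw, level)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		if _, err := io.Copy(zw, src); err != nil {
			pw.CloseWithError(err)
			return
		}
		// Close writes the gzip trailer. A nil error here makes the
		// reader side see a normal io.EOF.
		pw.CloseWithError(zw.Close())
	}()
	return pr
}

// roundTrip compresses input through the pipe and decompresses it again.
func roundTrip(input string) (string, error) {
	compressed := compressStream(strings.NewReader(input), gzip.DefaultCompression)
	defer compressed.Close()

	zr, err := gzip.NewReader(compressed)
	if err != nil {
		return "", err
	}
	defer zr.Close()

	var out bytes.Buffer
	if _, err := io.Copy(&out, zr); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	got, err := roundTrip("streamed through gzip")
	fmt.Println(got, err)
}
```

The consumer drives the pipeline: the goroutine blocks on each pipe write until the reader pulls more data, which is what keeps the memory footprint flat.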

The transformer implements StreamTransformerPort directly. One RegisterTransformer call plugs it into the storage provider chain, with no glue code. It composes with other transformers through an integer priority, so chaining compress-then-encrypt is declarative ordering, not a hand-written pipeline.

Levels follow the standard gzip semantics: gzip.NoCompression (0), gzip.BestSpeed (1) through gzip.BestCompression (9), and gzip.DefaultCompression (-1, the default). The Reverse path caps decompressed output to guard against decompression bombs on untrusted downloads. See Decompression cap.

Configuration

import (
    "github.com/klauspost/compress/gzip"
    "piko.sh/piko/wdk/storage/storage_transformer_gzip"
)

transformer, err := storage_transformer_gzip.NewGzipTransformer(storage_transformer_gzip.Config{
    Name:                 "gzip",                  // optional, default "gzip"
    Priority:             100,                     // optional, default 100
    Level:                gzip.DefaultCompression, // optional, default DefaultCompression
    MaxDecompressedBytes: 256 * 1024 * 1024,       // optional, default 256 MiB
})
if err != nil {
    return err
}

NewGzipTransformer returns an error when Level is outside the valid range. storage_transformer_gzip.DefaultConfig() returns a Config populated with the same defaults, which is a convenient starting point when you only need to override one field.
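For a sense of what such a level check looks like, the sketch below delegates to the standard library writer constructor, which rejects out-of-range levels. It is an assumption-laden illustration: validateLevel is a hypothetical helper, and the standard library accepts -2 through 9, which may not match the exact range NewGzipTransformer enforces.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
)

// validateLevel (hypothetical helper) reports whether the standard library
// gzip writer accepts the given level. The writer is bound to io.Discard
// because only the constructor's error matters here.
func validateLevel(level int) error {
	if _, err := gzip.NewWriterLevel(io.Discard, level); err != nil {
		return fmt.Errorf("invalid gzip level %d: %w", level, err)
	}
	return nil
}

func main() {
	levels := []int{gzip.DefaultCompression, gzip.BestSpeed, gzip.BestCompression, 42}
	for _, level := range levels {
		fmt.Printf("level %d: %v\n", level, validateLevel(level))
	}
}
```

Validating at construction time means a bad level fails once at startup instead of on the first upload.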

Config fields:

  • Name. The transformer identifier in the registry. Defaults to gzip.
  • Priority. The position in the transform chain. Lower values run first on writes. Defaults to 100.
  • Level. The gzip compression level. A zero value maps to gzip.DefaultCompression.
  • MaxDecompressedBytes. The cap on bytes produced by Reverse. A zero value uses the 256 MiB default. A negative value disables the cap.
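The zero-value rules in the list above can be summarised in a small sketch. The config type and the withDefaults and capEnabled helpers are hypothetical stand-ins, not the package's code, and treating a zero Priority as "unset" is an assumption drawn from the "optional, default 100" comment.

```go
package main

import "fmt"

// defaultMaxDecompressedBytes is the documented 256 MiB default cap.
const defaultMaxDecompressedBytes int64 = 256 * 1024 * 1024

// config is a hypothetical mirror of the documented Config fields.
type config struct {
	Name                 string
	Priority             int
	Level                int
	MaxDecompressedBytes int64
}

// withDefaults fills unset (zero) fields with the documented defaults.
func withDefaults(c config) config {
	if c.Name == "" {
		c.Name = "gzip"
	}
	if c.Priority == 0 { // assumption: zero means "not set"
		c.Priority = 100
	}
	if c.Level == 0 {
		c.Level = -1 // gzip.DefaultCompression
	}
	if c.MaxDecompressedBytes == 0 {
		c.MaxDecompressedBytes = defaultMaxDecompressedBytes
	}
	return c
}

// capEnabled reports whether Reverse should enforce a cap. A negative
// value disables it; withDefaults has already replaced zero.
func capEnabled(c config) bool {
	return c.MaxDecompressedBytes > 0
}

func main() {
	fmt.Printf("%+v\n", withDefaults(config{}))
	fmt.Println(capEnabled(withDefaults(config{MaxDecompressedBytes: -1})))
}
```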

WithMaxDecompressedBytes(maxBytes int64) sets the same cap as a construction option:

transformer, err := storage_transformer_gzip.NewGzipTransformer(
    storage_transformer_gzip.DefaultConfig(),
    storage_transformer_gzip.WithMaxDecompressedBytes(64*1024*1024), // 64 MiB cap
)
if err != nil {
    return err
}

Bootstrap

Gzip registers against the storage service transformer registry, not through piko.With*. RegisterTransformer is the only supported wiring. There is no NewService option for transformers. Register after building the service:

if err := service.RegisterTransformer(ctx, transformer); err != nil {
    return err
}

Use a transformer on writes and reads

PutObject takes a provider name and a *storage.PutParams and returns only an error. Set TransformConfig.EnabledTransformers to the names of the transformers that apply to the call. The TransformerOptions map is keyed by transformer name; the level entry under gzip overrides the compression level for that one operation:

params := &storage.PutParams{
    Key:    "report.json",
    Reader: r,
    TransformConfig: &storage.TransformConfig{
        EnabledTransformers: []string{"gzip"},
        TransformerOptions: map[string]any{
            "gzip": map[string]any{"level": 9},
        },
    },
}
if err := service.PutObject(ctx, providerName, params); err != nil {
    return err
}
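How a per-operation override might be resolved from that nested map can be sketched as follows. This is an illustration only: levelOverride is a hypothetical helper, and the source does not say how the transformer parses TransformerOptions or which value types it accepts. The float64 case is included on the assumption that options may arrive from decoded JSON.

```go
package main

import "fmt"

// levelOverride (hypothetical helper) looks up options[name]["level"] and
// returns it as an int, or fallback when the entry is missing or has an
// unexpected type.
func levelOverride(options map[string]any, name string, fallback int) int {
	perTransformer, ok := options[name].(map[string]any)
	if !ok {
		return fallback
	}
	switch v := perTransformer["level"].(type) {
	case int:
		return v
	case float64: // numbers decoded from JSON arrive as float64
		return int(v)
	default:
		return fallback
	}
}

func main() {
	options := map[string]any{
		"gzip": map[string]any{"level": 9},
	}
	fmt.Println(levelOverride(options, "gzip", -1))
	fmt.Println(levelOverride(nil, "gzip", -1))
}
```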

GetObject mirrors the write path. Pass the same EnabledTransformers so the chain reverses the gzip step on the way back:

reader, err := service.GetObject(ctx, providerName, storage.GetParams{
    Key: "report.json",
    TransformConfig: &storage.TransformConfig{
        EnabledTransformers: []string{"gzip"},
    },
})
if err != nil {
    return err
}
defer func() { _ = reader.Close() }()

A custom ProviderPort exposes the same step through Put(ctx, *storage.PutParams), which also returns only an error.

Ordering with encryption

The transform chain sorts by Priority(). It runs ascending on writes and reverses for reads, so a lower priority compresses first and decompresses last. To chain compress-then-encrypt, gzip must run before the crypto transformer on writes.

Two different priority numbers are in play for the crypto transformer. The gzip default is 100, and the crypto package recommends 250 for encryption, which would place encryption after compression. The framework bootstrap, however, auto-registers the crypto transformer at priority 100, the same number as the gzip default. The chain sorts with slices.SortFunc, which is not a stable sort, so a tie between gzip at 100 and crypto at 100 leaves their relative order undefined. The default deployment therefore does not guarantee compress-then-encrypt.

Set gzip to a priority below 100 so it always compresses before encryption and decompresses after decryption:

transformer, err := storage_transformer_gzip.NewGzipTransformer(storage_transformer_gzip.Config{
    Priority: 50, // below the crypto transformer's 100, so gzip runs first on writes
})
if err != nil {
    return err
}
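The ordering rule can be checked with a small sketch of the sort itself. The step type and the writeOrder and readOrder helpers are hypothetical; only the use of slices.SortFunc and the ascending-on-write, reversed-on-read behaviour come from the text above.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// step is a hypothetical stand-in for a registered transformer.
type step struct {
	Name     string
	Priority int
}

// writeOrder returns transformer names in ascending priority order, the
// order applied on writes. slices.SortFunc is not stable, so steps with
// equal priorities may come out in either order.
func writeOrder(steps []step) []string {
	sorted := slices.Clone(steps)
	slices.SortFunc(sorted, func(a, b step) int {
		return cmp.Compare(a.Priority, b.Priority)
	})
	names := make([]string, 0, len(sorted))
	for _, s := range sorted {
		names = append(names, s.Name)
	}
	return names
}

// readOrder is the write order reversed, the order applied on reads.
func readOrder(steps []step) []string {
	names := writeOrder(steps)
	slices.Reverse(names)
	return names
}

func main() {
	steps := []step{
		{Name: "crypto", Priority: 100},
		{Name: "gzip", Priority: 50},
	}
	fmt.Println("write:", writeOrder(steps))
	fmt.Println("read: ", readOrder(steps))
}
```

With distinct priorities the result is deterministic: gzip then crypto on writes, crypto then gzip on reads. Giving both steps priority 100 removes that guarantee.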

Decompression cap

Reverse wraps the gzip reader so reads beyond MaxDecompressedBytes surface ErrDecompressedTooLarge instead of inflating without bound. The default cap is 256 MiB. This protects callers that buffer a download against a small upload that expands to gigabytes. Use errors.Is(err, storage_transformer_gzip.ErrDecompressedTooLarge) to distinguish the cap from a normal end of stream. Lower the cap for stricter limits on untrusted objects, or set a negative value to disable it for fully trusted input.
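A capped reader of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: it uses the standard library compress/gzip, errDecompressedTooLarge stands in for the package's ErrDecompressedTooLarge, and cappedReader, inflate, deflate, and exceedsCap are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// errDecompressedTooLarge stands in for the package's sentinel error.
var errDecompressedTooLarge = errors.New("decompressed output exceeds cap")

// cappedReader passes through at most `remaining` bytes, then fails if the
// underlying reader still has data.
type cappedReader struct {
	r         io.Reader
	remaining int64
}

func (c *cappedReader) Read(p []byte) (int, error) {
	if c.remaining <= 0 {
		// Probe one byte to tell "ended exactly at the cap" from "over it".
		var probe [1]byte
		n, err := c.r.Read(probe[:])
		if n > 0 {
			return 0, errDecompressedTooLarge
		}
		return 0, err
	}
	if int64(len(p)) > c.remaining {
		p = p[:c.remaining]
	}
	n, err := c.r.Read(p)
	c.remaining -= int64(n)
	return n, err
}

// inflate decompresses data, enforcing maxBytes when it is positive.
func inflate(data []byte, maxBytes int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var src io.Reader = zr
	if maxBytes > 0 {
		src = &cappedReader{r: zr, remaining: maxBytes}
	}
	return io.ReadAll(src)
}

// deflate gzips data in memory so the example has something to inflate.
func deflate(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// exceedsCap shows the errors.Is check callers would use.
func exceedsCap(err error) bool {
	return errors.Is(err, errDecompressedTooLarge)
}

func main() {
	payload, err := deflate(make([]byte, 1<<20)) // 1 MiB of zeros
	if err != nil {
		panic(err)
	}
	_, err = inflate(payload, 64*1024)
	fmt.Println(len(payload), exceedsCap(err))
}
```

Highly compressible input such as a run of zeros is the typical bomb shape: the stored object is tiny, so only a check on the decompressed side catches it.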

Observability

The package emits OpenTelemetry metrics under storage.transformer.gzip.*: operation duration, an operations counter, an errors counter, bytes processed, and a compression ratio histogram. They surface throughput and effectiveness without extra instrumentation.

Tradeoffs

Gzip is generally slower and produces larger output than zstd at comparable settings. The reasons to pick gzip over zstd are interoperability, where third parties read your objects directly, and ecosystem maturity, where every audit tool already speaks gzip. Reach for the Zstandard transformer when you control both ends of the pipe and want a better ratio at similar speed. Reach for the crypto transformer when at-rest encryption matters more than size.
