Cache linguistics bridge

Adapter that turns a Piko linguistics analyser configuration into a cache.TextAnalyseFunc, so cache search indexes can stem, normalise, and filter stop words from text using language-specific rules.

Overview

The cache search subsystem indexes text fields by tokenising input, normalising case and diacritics, optionally stemming, and filtering stop words. Without a text analyser the provider falls back to its default tokenisation, so a search for "running" does not find a value indexed as "run". This bridge plugs the Piko linguistics module into that pipeline, so the same Snowball stemmers and stop word lists used elsewhere in the framework also drive cache search analysis.
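The pipeline can be illustrated with a minimal, self-contained sketch. Every name in it (the local TextAnalyseFunc type, newAnalyser, toyStem, matches) is hypothetical, and toyStem is a crude suffix stripper standing in for a real Snowball stemmer. It shows why a stemmed analyser lets a query for "running" match an indexed "run" while a plain analyser does not.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// TextAnalyseFunc is a local stand-in with the same shape as
// cache.TextAnalyseFunc: text in, tokens out. It is not Piko's type.
type TextAnalyseFunc func(text string) []string

// stopWords is a tiny illustrative list, not Piko's real English list.
var stopWords = map[string]bool{"the": true, "a": true, "is": true, "and": true}

// foldDiacritics maps a few accented letters to ASCII. A real analyser
// would use full Unicode normalisation rather than this small table.
var foldDiacritics = strings.NewReplacer("é", "e", "è", "e", "à", "a", "ü", "u", "ö", "o")

// toyStem strips a trailing "ing" and undoubles a final consonant pair
// ("running" -> "runn" -> "run"). Hypothetical; NOT a Snowball stemmer.
func toyStem(token string) string {
	if strings.HasSuffix(token, "ing") && len(token) > 5 {
		token = strings.TrimSuffix(token, "ing")
		n := len(token)
		if n >= 2 && token[n-1] == token[n-2] && !strings.ContainsRune("aeioulsz", rune(token[n-1])) {
			token = token[:n-1]
		}
	}
	return token
}

// newAnalyser builds the pipeline: tokenise, lower-case, fold diacritics,
// drop stop words, then optionally stem.
func newAnalyser(stem bool) TextAnalyseFunc {
	return func(text string) []string {
		fields := strings.FieldsFunc(text, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		tokens := make([]string, 0, len(fields))
		for _, f := range fields {
			t := foldDiacritics.Replace(strings.ToLower(f))
			if stopWords[t] {
				continue
			}
			if stem {
				t = toyStem(t)
			}
			tokens = append(tokens, t)
		}
		return tokens
	}
}

// matches reports whether every query token appears among the indexed
// tokens. Both sides go through the same analyser.
func matches(analyse TextAnalyseFunc, indexed, query string) bool {
	index := make(map[string]bool)
	for _, t := range analyse(indexed) {
		index[t] = true
	}
	q := analyse(query)
	if len(q) == 0 {
		return false
	}
	for _, t := range q {
		if !index[t] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(matches(newAnalyser(false), "I run daily", "running")) // false
	fmt.Println(matches(newAnalyser(true), "I run daily", "running"))  // true
	fmt.Println(newAnalyser(true)("The Café is running"))              // [cafe run]
}
```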

Reach for this bridge whenever a cache Searchable schema includes free-text fields and you want matches across related forms (plurals, verb tenses, accented and unaccented spellings). The bridge itself is language-agnostic: it works with whichever language packs you import, and those packs supply the stemmers and stop word lists.

The exposed functions return a cache.TextAnalyseFunc (a func(text string) []string) that the cache search layer calls both when indexing and when querying. Indexed values and queries must pass through the same function, so wire a single analyser into the schema and let the provider use it for both. The returned function is safe for concurrent use across cache goroutines: it draws from an internal pool of analysers sized to the CPU count, so concurrent callers do not share analyser state or contend on a single lock.
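One way such a pool can work is sketched below, assuming each analyser holds scratch state that must not be shared between goroutines. The analyser type, newPooledAnalyseFunc, and analyseConcurrently are hypothetical, and Piko's internal pool may be implemented differently. In this sketch, a caller waits only when every pooled analyser is busy at the same moment.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// analyser is a hypothetical stateful worker: buf is reused between calls,
// so one analyser must never be used by two goroutines at once.
type analyser struct {
	buf []string
}

func (a *analyser) analyse(text string) []string {
	// Reuse the scratch buffer, then copy out so callers never see it.
	a.buf = append(a.buf[:0], strings.Fields(strings.ToLower(text))...)
	out := make([]string, len(a.buf))
	copy(out, a.buf)
	return out
}

// newPooledAnalyseFunc pre-allocates one analyser per CPU in a buffered
// channel. Each call borrows an analyser and returns it when done.
func newPooledAnalyseFunc() func(text string) []string {
	size := runtime.NumCPU()
	pool := make(chan *analyser, size)
	for i := 0; i < size; i++ {
		pool <- &analyser{}
	}
	return func(text string) []string {
		a := <-pool
		defer func() { pool <- a }()
		return a.analyse(text)
	}
}

// analyseConcurrently shares one analyse function across n goroutines
// and collects every result.
func analyseConcurrently(analyse func(string) []string, n int, text string) [][]string {
	results := make([][]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = analyse(text)
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	results := analyseConcurrently(newPooledAnalyseFunc(), 100, "Hello Cache World")
	fmt.Println(results[0], results[99]) // [hello cache world] [hello cache world]
}
```

A buffered channel keeps the pool bounded; sync.Pool would be an alternative when an unbounded, GC-managed pool is acceptable.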

Requirements

The language packs are pure Go. They need no build tag, no CGO, and no system libraries, so they compile into a static binary and run in interpreted dev mode.

If you omit the language pack import, the analyser still builds: the registry falls back to no-op stemming and no stop word filtering instead of returning an error. A forgotten side-effect import therefore yields a working analyser that silently skips stemming. If stemming does not take effect, confirm that you import the matching language pack.
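The fallback behaviour follows from the common Go pattern of registering implementations from init() in side-effect imports. The sketch below is hypothetical (RegisterStemmer, StemmerFor, and the toy English stemmer are not Piko's API); the init function stands in for what a language pack would do when imported with a blank identifier.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Stemmer reduces a token to its root form.
type Stemmer func(token string) string

// A hypothetical shared registry that language packs populate from init().
var (
	mu       sync.RWMutex
	stemmers = map[string]Stemmer{}
)

// RegisterStemmer is what a language pack's init() would call.
func RegisterStemmer(lang string, s Stemmer) {
	mu.Lock()
	defer mu.Unlock()
	stemmers[lang] = s
}

// StemmerFor returns the registered stemmer, or a no-op stemmer when no
// pack registered the language. It never errors, which is why a missing
// import silently disables stemming instead of failing.
func StemmerFor(lang string) Stemmer {
	mu.RLock()
	defer mu.RUnlock()
	if s, ok := stemmers[lang]; ok {
		return s
	}
	return func(token string) string { return token }
}

// init stands in for an imported English language pack. Remove it and
// StemmerFor("english") falls back to the no-op.
func init() {
	RegisterStemmer("english", func(token string) string {
		return strings.TrimSuffix(token, "s") // toy rule, not Snowball
	})
}

func main() {
	fmt.Println(StemmerFor("english")("caches")) // cache
	fmt.Println(StemmerFor("french")("caches"))  // caches (no pack registered)
}
```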

Configuration

import (
    _ "piko.sh/piko/wdk/linguistics/linguistics_language_english"

    "piko.sh/piko/wdk/cache/cache_linguistics"
)

// Convenience constructor for English in Smart mode.
analyse := cache_linguistics.NewEnglishTextAnalyser()

// Or pick a language by name; its language pack must also be imported:
analyse = cache_linguistics.NewTextAnalyserForLanguage("french")

Both constructors use AnalysisModeSmart, which stems each token to its root form ("running" becomes "run"). For full control over the analyser, call NewTextAnalyser with an explicit linguistics.AnalyserConfig from the public wdk/linguistics package:

import (
    "piko.sh/piko/wdk/linguistics"

    "piko.sh/piko/wdk/cache/cache_linguistics"
)

config := linguistics.DefaultConfigForLanguage(linguistics.LanguageEnglish)
config.Mode = linguistics.AnalysisModeFast // normalise and filter stop words, skip stemming (the default)

analyse := cache_linguistics.NewTextAnalyser(
    config,
    linguistics.WithLanguage(linguistics.LanguageEnglish),
)

AnalysisModeSmart stems tokens; AnalysisModeFast and AnalysisModeBasic return normalised tokens without stemming. Because DefaultConfigForLanguage returns AnalysisModeFast, set config.Mode to AnalysisModeSmart when you want stemming.
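The mode rule can be modelled with a small sketch. The AnalysisMode constants, Config, defaultConfig, analyse, and trimPlural here are hypothetical local stand-ins; the constant values are arbitrary, and any other differences between Basic and Fast are deliberately not modelled. Only the documented rule is shown: stemming happens in Smart mode only, and the default is Fast.

```go
package main

import (
	"fmt"
	"strings"
)

// AnalysisMode is a local stand-in for linguistics.AnalysisMode.
type AnalysisMode int

const (
	AnalysisModeBasic AnalysisMode = iota
	AnalysisModeFast
	AnalysisModeSmart
)

// Config is a hypothetical, trimmed-down analyser config.
type Config struct {
	Mode AnalysisMode
}

// defaultConfig mirrors the documented default: Fast, so no stemming.
func defaultConfig() Config { return Config{Mode: AnalysisModeFast} }

// trimPlural is a toy stemmer used only for this illustration.
func trimPlural(token string) string { return strings.TrimSuffix(token, "s") }

// analyse normalises every token but stems only in Smart mode.
func analyse(cfg Config, stem func(string) string, text string) []string {
	tokens := strings.Fields(strings.ToLower(text))
	if cfg.Mode == AnalysisModeSmart {
		for i, t := range tokens {
			tokens[i] = stem(t)
		}
	}
	return tokens
}

func main() {
	cfg := defaultConfig()
	fmt.Println(analyse(cfg, trimPlural, "Cached Values")) // [cached values]
	cfg.Mode = AnalysisModeSmart
	fmt.Println(analyse(cfg, trimPlural, "Cached Values")) // [cached value]
}
```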

WithLanguage installs the language's stemmer and stop word provider from the shared registries. That provider replaces any StopWords set on the config, so when you also pass WithLanguage, supply a custom stop word set through a provider rather than the config field.
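The precedence rule matches the usual functional-options pattern, where options run after the config is applied. This sketch assumes that ordering; Config, Analyser, Option, WithLanguage, NewAnalyser, and the stop word registry are hypothetical stand-ins, not Piko's types.

```go
package main

import "fmt"

// StopWordProvider reports whether a token is a stop word.
type StopWordProvider func(token string) bool

// Config, Analyser, and Option are hypothetical stand-ins that model only
// the precedence rule between config stop words and WithLanguage.
type Config struct {
	StopWords []string
}

type Analyser struct {
	isStop StopWordProvider
}

type Option func(*Analyser)

// languageStopWords plays the role of the shared stop word registry.
var languageStopWords = map[string][]string{"english": {"the", "and"}}

// providerFromList turns a word list into a StopWordProvider.
func providerFromList(words []string) StopWordProvider {
	set := make(map[string]bool, len(words))
	for _, w := range words {
		set[w] = true
	}
	return func(token string) bool { return set[token] }
}

// WithLanguage installs the language's provider, replacing whatever was
// built from cfg.StopWords.
func WithLanguage(lang string) Option {
	return func(a *Analyser) {
		a.isStop = providerFromList(languageStopWords[lang])
	}
}

// NewAnalyser applies the config first, then each option in order.
func NewAnalyser(cfg Config, opts ...Option) *Analyser {
	a := &Analyser{isStop: providerFromList(cfg.StopWords)}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

func main() {
	a := NewAnalyser(Config{StopWords: []string{"cache"}}, WithLanguage("english"))
	fmt.Println(a.isStop("cache")) // false: the config list was replaced
	fmt.Println(a.isStop("the"))   // true: the language list is in force
}
```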

Bootstrap

There is no piko.With* option for this bridge. Instead, wire the analyser into a cache namespace's search schema when you build the namespace. cache.NewSearchSchemaWithAnalyser attaches the analyse function and the text fields in one call:

import (
    "piko.sh/piko/wdk/cache"
    "piko.sh/piko/wdk/cache/cache_linguistics"
)

analyse := cache_linguistics.NewEnglishTextAnalyser()

schema := cache.NewSearchSchemaWithAnalyser(
    analyse,
    cache.TextField("Name"),
    cache.TextField("Description"),
)

builder, err := cache.NewCacheBuilder[string, User](service)
if err != nil {
    return err
}

userCache, err := builder.
    Provider("redis").
    Namespace("users").
    Searchable(schema).
    Build(ctx)
if err != nil {
    return err
}

NewCacheBuilder returns the builder and an error, so check the error before chaining the fluent methods. Because the bridge returns exactly the cache.TextAnalyseFunc type that SearchSchema.TextAnalyser holds, the analyse function drops straight into the schema with no adapter glue.
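Why matching the field's type removes adapter glue can be shown with simplified local stand-ins. TextAnalyseFunc and SearchSchema below only mimic the shapes named above, and newEnglishLikeAnalyser and errorAnalyser are hypothetical. A function of the field's exact type is stored directly; a differently shaped analyser, such as one that also returns an error, needs a wrapping closure.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins for cache.TextAnalyseFunc and cache.SearchSchema.
type TextAnalyseFunc func(text string) []string

type SearchSchema struct {
	TextFields   []string
	TextAnalyser TextAnalyseFunc
}

// newEnglishLikeAnalyser returns the field's exact type.
func newEnglishLikeAnalyser() TextAnalyseFunc {
	return func(text string) []string { return strings.Fields(strings.ToLower(text)) }
}

// errorAnalyser has a different signature, so it needs an adapter.
func errorAnalyser(text string) ([]string, error) {
	return strings.Fields(text), nil
}

func main() {
	// No glue: the value already has the field's type.
	direct := SearchSchema{TextFields: []string{"Name"}, TextAnalyser: newEnglishLikeAnalyser()}

	// Glue: wrap the mismatched signature in a closure.
	adapted := SearchSchema{
		TextFields: []string{"Name"},
		TextAnalyser: func(text string) []string {
			tokens, err := errorAnalyser(text)
			if err != nil {
				return nil
			}
			return tokens
		},
	}

	fmt.Println(direct.TextAnalyser("Ada Lovelace"))  // [ada lovelace]
	fmt.Println(adapted.TextAnalyser("Ada Lovelace")) // [Ada Lovelace]
}
```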
