Russian language pack

Russian bundle for the Piko linguistics service. It supplies a Snowball stemmer, a Cyrillic phonetic encoder, and a stop-word list, all through a single blank import with no glue code.

Overview

The Russian pack registers three adapters under the language code russian. The stemmer is kljensen/snowball configured for Russian; it reduces the inflected forms of a word to a common stem. The phonetic encoder gives sounds-like matching for Cyrillic text. It works directly on Cyrillic input with no transliteration step, applying final-consonant devoicing and iotated-vowel handling. By default the encoder caps its output at six characters, so a code is never longer than six characters and is often shorter. The stop-word provider holds 52 entries covering pronouns, prepositions, conjunctions, the "to be" verb forms, demonstratives, and particles.
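To make two of the encoder's ideas concrete, here is a minimal sketch of final-consonant devoicing and a rune-based length cap. It is not the pack's encoder and does not reproduce its output codes; the names devoiceFinal and capRunes are hypothetical, the mapping covers only lowercase letters, and iotated-vowel handling is left out.

```go
package main

import "fmt"

// devoicePairs maps voiced Russian consonants to their voiceless
// counterparts. Illustrative only; lowercase letters only.
var devoicePairs = map[rune]rune{
	'б': 'п', 'в': 'ф', 'г': 'к', 'д': 'т', 'ж': 'ш', 'з': 'с',
}

// devoiceFinal replaces a word-final voiced consonant with its
// voiceless pair and leaves every other word unchanged.
func devoiceFinal(word string) string {
	runes := []rune(word)
	if len(runes) == 0 {
		return word
	}
	last := len(runes) - 1
	if voiceless, ok := devoicePairs[runes[last]]; ok {
		runes[last] = voiceless
	}
	return string(runes)
}

// capRunes truncates s to at most limit characters. It counts runes,
// not bytes, because each Cyrillic letter takes two bytes in UTF-8.
func capRunes(s string, limit int) string {
	runes := []rune(s)
	if len(runes) <= limit {
		return s
	}
	return string(runes[:limit])
}

func main() {
	fmt.Println(devoiceFinal("хлеб"))       // хлеп
	fmt.Println(devoiceFinal("друг"))       // друк
	fmt.Println(capRunes("переводчик", 6)) // перево
}
```

Counting runes rather than bytes matters for any length cap on Cyrillic text: a byte-based cut at six would keep only three letters and could split a character in half.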

Each adapter implements one port interface: the stemmer satisfies StemmerPort, the encoder satisfies PhoneticEncoderPort, and the provider satisfies StopWordsProviderPort. A compile-time assertion guards each one. All three resolve through a single WithLanguage("russian") lookup, so Russian slots into the same registries any consumer queries by name. The encoder's maximum code length is configurable through NewWithMaxLength; the auto-registered factory uses the default of six.
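For readers unfamiliar with the idiom, a compile-time interface assertion in Go usually looks like the sketch below. The StemmerPort shown here is a stand-in with an assumed Stem method, since this README does not list the real interface's methods, and identityStemmer is a hypothetical adapter.

```go
package main

import "fmt"

// StemmerPort is a stand-in for the real port; the Stem method
// is an assumption made for this example.
type StemmerPort interface {
	Stem(word string) string
}

// identityStemmer is a hypothetical adapter that returns its input.
type identityStemmer struct{}

func (identityStemmer) Stem(word string) string { return word }

// Compile-time assertion: the build fails on this line if
// identityStemmer ever stops satisfying StemmerPort.
var _ StemmerPort = identityStemmer{}

func main() {
	var s StemmerPort = identityStemmer{}
	fmt.Println(s.Stem("книги")) // книги
}
```

The blank identifier discards the value, so the assertion costs nothing at run time; it exists only to make the compiler check the method set.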

The package itself contains no exported types or constructors. It exists to register the three adapters through init() side effects in the per-feature sub-packages (linguistics_phonetic_russian, linguistics_stemmer_russian, linguistics_stopwords_russian). It is pure Go, so it needs no build tag or CGO and runs in interpreted dev-i mode.

Bootstrap

A blank import is enough. Each sub-package's init() registers itself with the linguistics domain registry. The import must be present in the final binary's import graph for any Russian component to resolve.

import (
    _ "piko.sh/piko/wdk/linguistics/linguistics_language_russian"
)
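The registration pattern behind the blank import can be sketched in a single file. This is an illustration of init()-based self-registration in general, not Piko's registry: Register, Lookup, Factory, and ErrUnknownLanguage are hypothetical names, and the real registries live in the linguistics domain package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Stemmer and Factory are stand-ins for a port and its constructor.
type Stemmer interface{ Stem(word string) string }
type Factory func() Stemmer

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// ErrUnknownLanguage is returned when nothing registered the language.
var ErrUnknownLanguage = errors.New("no stemmer registered for language")

// Register stores a factory under a language code.
func Register(lang string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[lang] = f
}

// Lookup builds the stemmer for lang, or reports that it is missing.
func Lookup(lang string) (Stemmer, error) {
	mu.RLock()
	defer mu.RUnlock()
	f, ok := factories[lang]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrUnknownLanguage, lang)
	}
	return f(), nil
}

type identityStemmer struct{}

func (identityStemmer) Stem(word string) string { return word }

// In a real language pack this init would live in the sub-package
// and run only because some file imports that package.
func init() {
	Register("russian", func() Stemmer { return identityStemmer{} })
}

func main() {
	_, err := Lookup("russian")
	fmt.Println(err == nil) // true

	_, err = Lookup("french")
	fmt.Println(errors.Is(err, ErrUnknownLanguage)) // true
}
```

The failed lookup for french shows why the import has to be in the binary's import graph: a package that is never imported never runs its init(), so its language is simply absent from the registry.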

After import, select the language on an analyser through WithLanguage. This one call looks up the stemmer, phonetic encoder, and stop words from their registries.

import "piko.sh/piko/wdk/linguistics"

config := linguistics.DefaultConfigForLanguage("russian")
analyser := linguistics.NewAnalyser(config, linguistics.WithLanguage("russian"))

The cache_linguistics bridge wraps this in NewTextAnalyserForLanguage, which takes any language code and performs the same lookup. The pattern is identical across every language pack, so swapping russian for french means changing only the blank import and the language code.
