Norwegian language pack
Norwegian bundle for Piko's linguistics service: Snowball stemmer, Norwegian phonetic encoder, and a stop-word list.
Overview
The Norwegian pack registers three adapters under the language code norwegian.
The stemmer wraps github.com/kljensen/snowball configured for Norwegian.
The phonetic encoder applies Norwegian-specific rules, including the palatal KJ and SJ sounds, the retroflex RS, and the standard vowel patterns. Each rule maps to a sound group, so KJ, SJ, and RS all encode to the X group. Output is capped at six characters by default, and NewWithMaxLength sets a different maximum. The encoder truncates but never pads, so short words yield shorter codes.
The stop-word list holds 107 entries: articles (en, ei, et, den, det), pronouns and possessives, prepositions, conjunctions, auxiliary and modal verbs with their inflections (er, var, har, kan, skal, vil, må), demonstratives, negations, and quantifiers. It covers both written standards, Bokmål and Nynorsk.
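The encoder's behaviour is easier to see in code. Here is a minimal, stdlib-only sketch. It is not the pack's implementation. Only three details come from the description above: the KJ/SJ/RS-to-X mapping, the six-character default cap, and the no-padding rule. The rest is assumed for illustration: dropping non-initial vowels, collapsing repeated codes, and the names digraphs, encoder, and newEncoder.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// digraphs maps two-letter spellings to a sound group. Only the KJ, SJ and
// RS entries come from the pack's documentation; they all encode to X.
var digraphs = map[string]rune{
	"KJ": 'X', // palatal, as in "kjøre"
	"SJ": 'X', // palatal, as in "sjø"
	"RS": 'X', // retroflex, as in "norsk"
}

// vowels is an assumption for this sketch: non-initial vowels are dropped.
const vowels = "AEIOUYÆØÅ"

// encoder is hypothetical; maxLen mirrors the documented length cap.
type encoder struct{ maxLen int }

// newEncoder is a hypothetical stand-in for NewWithMaxLength; it does not
// claim that function's real signature.
func newEncoder(maxLen int) encoder { return encoder{maxLen: maxLen} }

// Encode stops at maxLen codes but never pads, so short words give short codes.
func (e encoder) Encode(word string) string {
	letters := []rune(strings.ToUpper(word))
	var out []rune
	for i := 0; i < len(letters) && len(out) < e.maxLen; {
		var code rune // zero means "emit nothing for this position"
		step := 1
		// Digraph rules take priority over single letters.
		if i+1 < len(letters) {
			if g, ok := digraphs[string(letters[i:i+2])]; ok {
				code, step = g, 2
			}
		}
		if step == 1 {
			c := letters[i]
			if unicode.IsLetter(c) && (i == 0 || !strings.ContainsRune(vowels, c)) {
				code = c
			}
		}
		// Skip empty codes and any code that repeats the previous emitted one.
		if code != 0 && (len(out) == 0 || out[len(out)-1] != code) {
			out = append(out, code)
		}
		i += step
	}
	return string(out)
}

func main() {
	enc := newEncoder(6) // six matches the documented default maximum
	for _, w := range []string{"kjøre", "sjø", "norsk", "bestemmelsene", "ku"} {
		fmt.Printf("%s -> %s\n", w, enc.Encode(w))
	}
}
```

With the six-character cap, bestemmelsene truncates to BSTMLS. The short word ku yields the one-character code K, and norsk encodes to NXK because RS falls into the X group.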
The package exports no types or constructors. Instead, it blank-imports the three feature sub-packages (linguistics_phonetic_norwegian, linguistics_stemmer_norwegian, linguistics_stopwords_norwegian), whose init() functions perform the registration. Each adapter implements a linguistics port (StemmerPort, PhoneticEncoderPort, StopWordsProviderPort), and that conformance is checked at compile time. Norwegian therefore slots into the same analyser pipeline as every other language with no special casing.
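The registration pattern can be sketched with the standard library alone. It combines a package-level init() that registers an adapter with a compile-time interface assertion. Everything here is a hypothetical stand-in: the one-method StemmerPort, RegisterStemmer, the stemmers map, and the toy suffix-stripping stemmer. The real pack delegates stemming to kljensen/snowball.

```go
package main

import (
	"fmt"
	"strings"
)

// StemmerPort is a hypothetical one-method stand-in for the real port.
type StemmerPort interface {
	Stem(word string) string
}

// stemmers is a hypothetical registry keyed by language code. Package-level
// variables are initialised before any init() function runs.
var stemmers = map[string]func() StemmerPort{}

// RegisterStemmer is hypothetical; init() functions call it during package
// initialisation, before main runs.
func RegisterStemmer(lang string, factory func() StemmerPort) {
	stemmers[lang] = factory
}

// norwegianStemmer is a toy adapter that strips the plural definite suffix
// "-ene"; the real pack wraps kljensen/snowball instead.
type norwegianStemmer struct{}

func (norwegianStemmer) Stem(word string) string {
	return strings.TrimSuffix(word, "ene")
}

// Compile-time check: the build breaks if the adapter stops satisfying the port.
var _ StemmerPort = norwegianStemmer{}

// In the real layout this init() lives in the adapter sub-package and runs
// when that package is imported, including through a blank import.
func init() {
	RegisterStemmer("norwegian", func() StemmerPort { return norwegianStemmer{} })
}

func main() {
	factory, ok := stemmers["norwegian"]
	if !ok {
		fmt.Println("norwegian stemmer not registered")
		return
	}
	fmt.Println(factory().Stem("bilene")) // bil
}
```

The `var _ StemmerPort = norwegianStemmer{}` line costs nothing at run time. If the adapter's method set drifts away from the port, the build fails instead of the program misbehaving later.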
The pack is pure Go: no CGO, no build tags, and no system libraries. Its only external dependency is the kljensen/snowball module. It behaves the same in compiled mode and in the interpreted dev-i mode.
Bootstrap
The blank import runs the init() functions that register the three adapters. Pair it with WithLanguage("norwegian") to apply the Norwegian stemmer, phonetic encoder, and stop words to an analyser.
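```go
package main

import (
	"piko.sh/piko/wdk/linguistics"
	_ "piko.sh/piko/wdk/linguistics/linguistics_language_norwegian" // init() registers the three adapters
)

func main() {
	config := linguistics.DefaultConfigForLanguage("norwegian")
	analyser := linguistics.NewAnalyser(config, linguistics.WithLanguage("norwegian"))
	_ = analyser // hand the analyser to whatever indexes or searches Norwegian text
}
```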
WithLanguage looks up all three components in their registries by language code. The blank import alone registers the adapters but wires nothing into an analyser. If a component is not registered, the analyser falls back to a no-op for that component. A forgotten blank import therefore does not return an error. The analyser simply runs with no stemming, no phonetic encoding, and no stop-word filtering.
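The no-op fallback can be illustrated with a small, self-contained sketch. StopWordsProvider, registry, lookup, and filter are hypothetical names, not the linguistics service's API. What matters is the shape of the lookup: a missing registration returns a provider that filters nothing, rather than an error.

```go
package main

import "fmt"

// StopWordsProvider is a hypothetical stand-in for StopWordsProviderPort.
type StopWordsProvider interface {
	IsStopWord(word string) bool
}

// setProvider answers from a fixed set of stop words.
type setProvider map[string]struct{}

func (s setProvider) IsStopWord(w string) bool {
	_, ok := s[w]
	return ok
}

// noopProvider treats nothing as a stop word: the fallback for a missing registration.
type noopProvider struct{}

func (noopProvider) IsStopWord(string) bool { return false }

// registry is a hypothetical language-code-to-provider map.
var registry = map[string]StopWordsProvider{}

// lookup returns the registered provider or a no-op, never an error.
func lookup(lang string) StopWordsProvider {
	if p, ok := registry[lang]; ok {
		return p
	}
	return noopProvider{}
}

// filter keeps every word the provider does not flag as a stop word.
func filter(p StopWordsProvider, words []string) []string {
	var kept []string
	for _, w := range words {
		if !p.IsStopWord(w) {
			kept = append(kept, w)
		}
	}
	return kept
}

func main() {
	words := []string{"det", "er", "en", "bil"}
	// Nothing registered yet: the fallback silently keeps every word.
	fmt.Println(filter(lookup("norwegian"), words)) // [det er en bil]
	// Registration, normally performed by the blank import's init().
	registry["norwegian"] = setProvider{"det": {}, "er": {}, "en": {}}
	fmt.Println(filter(lookup("norwegian"), words)) // [bil]
}
```

The first call shows why the failure is easy to miss: output still appears, just unfiltered.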
See also
Other language packs:
Consumers:
- Cache linguistics bridge, wires a registered language into the cache search analyser.
Framework docs:
- Linguistics API reference, every type and function on the linguistics service.