English language pack
English bundle for the Piko linguistics service. It supplies a Snowball stemmer, a Double Metaphone phonetic encoder, and a stop-word list, all through a single blank import with no glue code.
Overview
The English pack registers three adapters under the language code english. The stemmer is the Porter2 algorithm from kljensen/snowball (for example, running stems to run). The phonetic encoder is Double Metaphone, used for fuzzy sound-based matching. It caps the primary code at four characters, so a code is never longer than four characters and is often shorter (for example, Smith and Smyth both encode to SM0). The stop-word provider holds 176 entries covering articles, conjunctions, prepositions, pronouns, auxiliaries, quantifiers, and common adverbs.
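Stop-word removal comes down to set membership. The sketch below does not use the provider's real interface, which this page does not document. `stopWords` is a hypothetical eight-word sample, not the pack's 176-entry list, and `filterStopWords` is an illustrative helper.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a hypothetical sample; the real provider holds 176 entries.
var stopWords = map[string]struct{}{
	"a": {}, "an": {}, "the": {}, "and": {},
	"of": {}, "to": {}, "is": {}, "it": {},
}

// filterStopWords lowercases each whitespace-separated token and drops
// any token found in the stop-word set, preserving the original order.
func filterStopWords(text string) []string {
	var kept []string
	for _, tok := range strings.Fields(text) {
		lower := strings.ToLower(tok)
		if _, stop := stopWords[lower]; stop {
			continue
		}
		kept = append(kept, lower)
	}
	return kept
}

func main() {
	fmt.Println(filterStopWords("The cat and the hat")) // [cat hat]
}
```

A map keyed by the lowercased word keeps each lookup constant-time, which matters more as the list grows toward the full 176 entries.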
Each adapter implements exactly one port. The stemmer satisfies StemmerPort, the encoder satisfies PhoneticEncoderPort, and the provider satisfies StopWordsProviderPort, with a compile-time assertion guarding each one. All three resolve through a single WithLanguage("english") lookup, so English slots into the same name-keyed registries that every consumer queries.
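A compile-time assertion in Go is usually a blank-identifier variable of the interface type. The sketch below shows the idiom under stated assumptions. `StemmerPort` is a hypothetical stand-in whose single `Stem` method is assumed, not the real port's method set, and `suffixStemmer` is a toy adapter that only strips "ing". It is not Porter2.

```go
package main

import (
	"fmt"
	"strings"
)

// StemmerPort is a hypothetical stand-in for the real port interface.
type StemmerPort interface {
	Stem(word string) string
}

// suffixStemmer is a toy adapter, not Porter2: it lowercases the word
// and strips a trailing "ing" when enough of the word remains.
type suffixStemmer struct{}

func (suffixStemmer) Stem(word string) string {
	w := strings.ToLower(word)
	if strings.HasSuffix(w, "ing") && len(w) > 5 {
		return w[:len(w)-3]
	}
	return w
}

// Compile-time assertion: the build fails if *suffixStemmer ever stops
// satisfying StemmerPort. The typed nil pointer allocates nothing.
var _ StemmerPort = (*suffixStemmer)(nil)

func main() {
	var s StemmerPort = suffixStemmer{}
	fmt.Println(s.Stem("Jumping")) // jump
}
```

Because the check runs at build time, renaming or changing a port method breaks the adapter package immediately instead of surfacing as a failed registry lookup at run time.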
The package itself contains no exported types or constructors. Its only job is to pull in the three per-feature sub-packages (linguistics_phonetic_english, linguistics_stemmer_english, linguistics_stopwords_english), whose init() functions register the adapters as a side effect. It is pure Go, so it needs no build tag or CGO and runs in interpreted dev-i mode.
The bigram analyser for English lives in a separate package, linguistics_bigrams_english. This language pack does not pull it in. To use bigram features, blank-import that package as well.
Bootstrap
A blank import is enough. Each sub-package's init() registers itself with the linguistics domain registry. The import must be present in the final binary's import graph for any English component to resolve.
```go
import (
	_ "piko.sh/piko/wdk/linguistics/linguistics_language_english"
)
```
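The registration pattern behind the blank import can be sketched with a name-keyed map filled from `init()`. Everything here is hypothetical. `registry`, `stemmers`, and the "porter2" value stand in for the real linguistics domain registry, whose API this page does not describe. A real registry would hold the adapter itself rather than a name.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for the linguistics domain
// registry: a concurrency-safe map from language code to adapter name.
type registry struct {
	mu      sync.RWMutex
	entries map[string]string
}

func (r *registry) Register(lang, adapter string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[lang] = adapter
}

func (r *registry) Lookup(lang string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	adapter, ok := r.entries[lang]
	return adapter, ok
}

// Package-level variables are initialised before any init() runs.
var stemmers = &registry{entries: map[string]string{}}

// In a real sub-package, this init() runs only because some package in
// the binary blank-imports it.
func init() {
	stemmers.Register("english", "porter2")
}

func main() {
	adapter, ok := stemmers.Lookup("english")
	fmt.Println(adapter, ok) // porter2 true
	_, ok = stemmers.Lookup("french")
	fmt.Println(ok) // false
}
```

Go runs a package's init() only when that package is linked into the binary. In this sketch, a missing blank import therefore shows up as a failed lookup (the `ok == false` case) rather than a compile error.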
After the import, select the language on an analyser through WithLanguage. That single call looks up the stemmer, phonetic encoder, and stop words from their registries.
```go
import "piko.sh/piko/wdk/linguistics"

// Inside a function body; both calls take the same language code.
config := linguistics.DefaultConfigForLanguage("english")
analyser := linguistics.NewAnalyser(config, linguistics.WithLanguage("english"))
```
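The way one language option resolves all three adapters can be sketched with Go's functional-option pattern. This is an assumption-laden illustration, not the real API. `withLanguage`, `newAnalyser`, the three maps, and the error return are hypothetical, and the stop-word "adapter" is reduced to its entry count.

```go
package main

import "fmt"

// Hypothetical registries keyed by language code; in the real service
// each sub-package's init() populates its own registry.
var (
	stemmers  = map[string]string{"english": "porter2"}
	encoders  = map[string]string{"english": "double-metaphone"}
	stopLists = map[string]int{"english": 176}
)

// analyser holds the three adapters resolved for one language.
type analyser struct {
	stemmer, encoder string
	stopWords        int
}

// option mirrors the functional-option style of WithLanguage; this
// signature is an assumption, not the library's.
type option func(*analyser) error

// withLanguage resolves all three adapters from one language code and
// fails if any registry lacks an entry for it.
func withLanguage(lang string) option {
	return func(a *analyser) error {
		s, okS := stemmers[lang]
		e, okE := encoders[lang]
		n, okN := stopLists[lang]
		if !okS || !okE || !okN {
			return fmt.Errorf("language %q not registered", lang)
		}
		a.stemmer, a.encoder, a.stopWords = s, e, n
		return nil
	}
}

func newAnalyser(opts ...option) (*analyser, error) {
	a := &analyser{}
	for _, opt := range opts {
		if err := opt(a); err != nil {
			return nil, err
		}
	}
	return a, nil
}

func main() {
	a, err := newAnalyser(withLanguage("english"))
	fmt.Println(a.stemmer, a.encoder, a.stopWords, err) // porter2 double-metaphone 176 <nil>
	_, err = newAnalyser(withLanguage("french"))
	fmt.Println(err) // language "french" not registered
}
```

Keeping the lookup inside a single option is what lets every language pack plug in the same way: the caller names a language, and the option pulls each adapter from its own registry.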
The cache_linguistics bridge wraps this in NewTextAnalyserForLanguage, which takes any language code and performs the same lookup. The pattern is identical across every language pack, so swapping english for french only means changing the blank import and the language code.
See also
Consumers:
- Cache linguistics bridge: wires a registered language into the cache search analyser.
Framework docs:
- Linguistics API reference: every type and function on the linguistics service.