French language pack

French bundle for Piko's linguistics service: Snowball stemmer, French phonetic encoder, and a curated stop-word list.

Overview

The French pack registers three adapters under the language code french. The stemmer uses kljensen/snowball configured for the French Snowball algorithm and is best-effort: when Snowball returns an error, the stemmer returns the original word unchanged. The phonetic encoder applies French-specific rules and emits a code capped at a maximum length, six runes by default. Short words yield shorter codes. The stop-word provider holds 161 entries covering articles (le, la, les, un, une), pronouns, common prepositions, conjunctions, and auxiliaries.

Each adapter implements a piko linguistics port and registers itself under the language code french, so this pack plugs into the same analysis pipeline as every other language pack.
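Self-registration is commonly built from a package-level registry keyed by language code plus an init() function that fills it. The sketch below shows that pattern in one file, assuming a simplified `Stemmer` port. `RegisterStemmer`, `LookupStemmer`, `StemmerFactory`, and `lowerStemmer` are hypothetical names, not piko's API. In the real pack, the init() call lives in a separate sub-package and runs when that package is blank-imported.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Stemmer is a hypothetical stand-in for a linguistics port.
type Stemmer interface {
	Stem(word string) string
}

// StemmerFactory builds a Stemmer on demand.
type StemmerFactory func() Stemmer

var (
	mu       sync.RWMutex
	stemmers = map[string]StemmerFactory{}
)

// RegisterStemmer records a factory under a language code.
func RegisterStemmer(code string, f StemmerFactory) {
	mu.Lock()
	defer mu.Unlock()
	stemmers[code] = f
}

// LookupStemmer returns the factory for code, if one was registered.
func LookupStemmer(code string) (StemmerFactory, bool) {
	mu.RLock()
	defer mu.RUnlock()
	f, ok := stemmers[code]
	return f, ok
}

// lowerStemmer is a toy Stemmer used only to exercise the registry.
type lowerStemmer struct{}

func (lowerStemmer) Stem(word string) string { return strings.ToLower(word) }

// init runs before main; a blank import triggers the same kind of call
// in the imported package.
func init() {
	RegisterStemmer("french", func() Stemmer { return lowerStemmer{} })
}

func main() {
	if f, ok := LookupStemmer("french"); ok {
		fmt.Println(f().Stem("Chanter")) // prints "chanter"
	}
	_, ok := LookupStemmer("klingon")
	fmt.Println(ok) // prints "false"
}
```

Registering factories rather than instances lets each analyser build its own adapter, which avoids sharing mutable state between analysers.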

The phonetic encoder and the stop-word set expect input that is already normalised to accent-free forms (for example etre, ete, tres rather than être, été, très). Strip diacritics before encoding or filtering French text; otherwise accented forms fail to match.
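Normalisation can be done with a small replacement table before any lookup. The sketch below folds common French accented letters and ligatures and then checks a stop-word set. It is hand-written for illustration, not the pack's code. `foldFrench`, `isStopWord`, and the eight-entry `stopWords` map are hypothetical, and the map is not the pack's 161-entry list. A more general alternative decomposes text with golang.org/x/text/unicode/norm (NFD) and drops the combining marks. This sketch avoids that dependency.

```go
package main

import (
	"fmt"
	"strings"
)

// frenchFolder maps accented lowercase French letters and ligatures to
// accent-free forms. It is a hand-written table, not full Unicode
// normalisation, so characters outside it pass through unchanged.
var frenchFolder = strings.NewReplacer(
	"à", "a", "â", "a", "ä", "a",
	"ç", "c",
	"é", "e", "è", "e", "ê", "e", "ë", "e",
	"î", "i", "ï", "i",
	"ô", "o", "ö", "o",
	"ù", "u", "û", "u", "ü", "u",
	"ÿ", "y",
	"œ", "oe", "æ", "ae",
)

// foldFrench lowercases s and strips French diacritics.
func foldFrench(s string) string {
	return frenchFolder.Replace(strings.ToLower(s))
}

// stopWords is a tiny illustrative subset stored in accent-free form.
var stopWords = map[string]struct{}{
	"le": {}, "la": {}, "les": {}, "un": {}, "une": {},
	"etre": {}, "ete": {}, "tres": {},
}

// isStopWord folds word before the lookup so accented input still matches.
func isStopWord(word string) bool {
	_, ok := stopWords[foldFrench(word)]
	return ok
}

func main() {
	fmt.Println(foldFrench("Être")) // prints "etre"
	fmt.Println(isStopWord("très")) // prints "true"

	// Without folding, the accented form misses the accent-free entry.
	_, raw := stopWords["très"]
	fmt.Println(raw) // prints "false"
}
```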

The package contains no exported types or constructors. It imports the three feature sub-packages (linguistics_phonetic_french, linguistics_stemmer_french, linguistics_stopwords_french) for their init() side effects.

The pack is pure Go. It needs no build tag, no CGO, and no system libraries, so it runs in the interpreted dev-i mode as well as in a compiled binary.

Bootstrap

Using the pack takes two steps. The blank import registers the three factories under the code french. Selecting the language on an analyser then pulls those factories from the registry and activates them. The import alone does not activate the stemmer or the phonetic encoder, so always pair it with WithLanguage.

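package main

import (
    // Registers the French stemmer, phonetic encoder, and stop-word factories.
    _ "piko.sh/piko/wdk/linguistics/linguistics_language_french"

    "piko.sh/piko/wdk/linguistics"
)

func main() {
    // Start from the French defaults, then activate the registered French adapters.
    config := linguistics.DefaultConfigForLanguage("french")
    analyser := linguistics.NewAnalyser(config, linguistics.WithLanguage("french"))
    _ = analyser // use the analyser here
}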

WithLanguage("french") looks up the stemmer, phonetic encoder, and stop-word provider for the code in a single call. If any component is not registered, the analyser falls back to a no-op for that stage.
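The single-call lookup with a per-stage no-op fallback fits the functional-options pattern. The sketch below is simplified and hypothetical. `Analyser`, `Option`, `WithLanguage`, `NewAnalyser`, and both registries are illustrative stand-ins, not piko's types. The real NewAnalyser also takes a config argument, and the phonetic stage is omitted here for brevity.

```go
package main

import "fmt"

// Hypothetical port interfaces, simplified for illustration.
type Stemmer interface{ Stem(string) string }
type StopWords interface{ IsStopWord(string) bool }

// noopStemmer and noopStopWords are the fallbacks used when nothing is registered.
type noopStemmer struct{}

func (noopStemmer) Stem(w string) string { return w }

type noopStopWords struct{}

func (noopStopWords) IsStopWord(string) bool { return false }

// Hypothetical registries keyed by language code.
var (
	stemmerRegistry   = map[string]func() Stemmer{}
	stopWordsRegistry = map[string]func() StopWords{}
)

type Analyser struct {
	stemmer   Stemmer
	stopWords StopWords
}

// Option configures an Analyser.
type Option func(*Analyser)

// WithLanguage resolves every stage for code in one call, falling back to a
// no-op implementation for any stage without a registered factory.
func WithLanguage(code string) Option {
	return func(a *Analyser) {
		a.stemmer = noopStemmer{}
		if f, ok := stemmerRegistry[code]; ok {
			a.stemmer = f()
		}
		a.stopWords = noopStopWords{}
		if f, ok := stopWordsRegistry[code]; ok {
			a.stopWords = f()
		}
	}
}

// NewAnalyser starts with no-op stages and applies each option in order.
func NewAnalyser(opts ...Option) *Analyser {
	a := &Analyser{stemmer: noopStemmer{}, stopWords: noopStopWords{}}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

// setStopWords is a toy StopWords implementation for the demo.
type setStopWords map[string]bool

func (s setStopWords) IsStopWord(w string) bool { return s[w] }

func main() {
	// Only a stop-word factory is registered; the stemmer stage stays a no-op.
	stopWordsRegistry["french"] = func() StopWords { return setStopWords{"le": true} }

	a := NewAnalyser(WithLanguage("french"))
	fmt.Println(a.stopWords.IsStopWord("le")) // prints "true"
	fmt.Println(a.stemmer.Stem("chanteuses")) // prints "chanteuses"
}
```

Defaulting each stage to a no-op means a missing registration degrades one stage instead of failing analyser construction.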
