Dutch language pack

Dutch bundle for Piko's linguistics service. Provides a Snowball-style stemmer, a Dutch phonetic encoder, and a stop-word list.

Overview

The Dutch pack registers three adapters under the language code dutch:

  • Stemmer. Wraps dchest/stemmer/dutch, from the same dchest/stemmer family that the German pack uses. It does not use kljensen/snowball.
  • Phonetic encoder. Applies Dutch-specific digraph rules (ch, sch, ij, ui) and caps the output at six runes. Codes are not padded to a fixed length, so short words produce short codes.
  • Stop-word provider. Returns 93 common Dutch words. The list covers articles (de, het, een), pronouns, prepositions, conjunctions, auxiliary and modal verbs (is, was, heeft, kan, zal, moet), demonstratives, and negations (niet, geen).

The package exports no types or constructors. It blank-imports linguistics_phonetic_dutch, linguistics_stemmer_dutch, and linguistics_stopwords_dutch so that each sub-package's init() function registers its factory in the global linguistics registry. Every adapter is pure Go, with no build tags and no CGO, so the pack behaves identically in compiled builds and in interpreted dev (dev-i) mode.

Bootstrap

Enabling Dutch is a single blank import with no glue code:

import (
    _ "piko.sh/piko/wdk/linguistics/linguistics_language_dutch"
)

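The import must be present in the program that constructs the analyser: Go runs each imported package's init() before main, so the Dutch factories are registered before any analyser is built. The three adapters implement the same linguistics_domain port contracts as every other language pack, so Dutch is interchangeable with English or German in any consumer. A cache search analyser selects the registered language by its string code: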

analyser := cache_linguistics.NewTextAnalyserForLanguage("dutch")

The selector string must match linguistics.LanguageDutch ("dutch") exactly. Forgetting the import or mistyping the code does not return an error. Instead, CreateStemmer falls back to a no-op stemmer, and the analyser silently drops stemming and phonetics.
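The silent fallback, and one way to guard against it at startup, can be sketched as follows. createStemmer imitates the documented CreateStemmer behaviour; requireLanguage is a hypothetical helper, not part of Piko, and the registry map and toyDutchStemmer are stand-ins (the toy stemmer only strips a trailing "-en").

```go
package main

import (
	"fmt"
	"strings"
)

// Stemmer stands in for the stemmer port; names here are hypothetical.
type Stemmer interface {
	Stem(word string) string
}

// noopStemmer returns words unchanged, mirroring the documented fallback.
type noopStemmer struct{}

func (noopStemmer) Stem(word string) string { return word }

// toyDutchStemmer strips a plural "-en"; it is a placeholder, not a real
// Dutch stemmer.
type toyDutchStemmer struct{}

func (toyDutchStemmer) Stem(word string) string {
	if len(word) > 4 && strings.HasSuffix(word, "en") {
		return strings.TrimSuffix(word, "en")
	}
	return word
}

// registry plays the role of the global registry after the pack's init() ran.
var registry = map[string]func() Stemmer{
	"dutch": func() Stemmer { return toyDutchStemmer{} },
}

// createStemmer mimics the documented behaviour: an unknown code silently
// yields a no-op stemmer instead of an error.
func createStemmer(lang string) Stemmer {
	if factory, ok := registry[lang]; ok {
		return factory()
	}
	return noopStemmer{}
}

// requireLanguage is a hypothetical startup guard that turns the silent
// fallback into an explicit error.
func requireLanguage(lang string) error {
	if _, ok := registry[lang]; !ok {
		return fmt.Errorf("no stemmer registered for %q: check the selector and the pack's blank import", lang)
	}
	return nil
}

func main() {
	fmt.Println(createStemmer("dutch").Stem("boeken")) // boek
	fmt.Println(createStemmer("Dutch").Stem("boeken")) // boeken (silent fallback)
	fmt.Println(requireLanguage("Dutch"))
}
```

Running a check like requireLanguage("dutch") once during startup turns a mistyped selector such as "Dutch" into an explicit error instead of quietly degraded search results.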
