German language pack

German bundle for Piko's linguistics service. Provides a Snowball-style stemmer, a Cologne Phonetic encoder (Kölner Phonetik), and a stop-word list.

Overview

The German pack registers three adapters under the language code german.

The stemmer wraps github.com/dchest/stemmer/german, the same library the Dutch pack uses.

The phonetic encoder implements the Cologne Phonetic algorithm (Kölner Phonetik), published by Hans Joachim Postel in 1969 and tuned for matching German names. It maps each letter to a digit using context rules: C encodes as hard or soft depending on its neighbours, P before H falls into the F group, and X expands to two digits. The encoder operates on the ASCII letters A to Z and skips every other byte, so raw ä, ö, and ü are dropped rather than encoded. Expand them to ae, oe, and ue before encoding. The output is a digit string capped at a maximum length, 10 by default and configurable through NewWithMaxLength.

The stop-word list holds 138 entries covering articles (der, die, das, ein, eine), conjunctions, common prepositions, and auxiliaries.
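The encoding rules described above can be sketched in a few dozen lines. This is a minimal illustration of the published Kölner Phonetik rules, not the pack's own source: the names expandUmlauts, in, digits, and colognePhonetic are hypothetical, the ß rule is an added assumption, and the sketch treats the whole input as a single word.

```go
package main

import (
	"fmt"
	"strings"
)

// expandUmlauts pre-expands umlauts to ASCII. The ß rule is an added assumption.
var expandUmlauts = strings.NewReplacer(
	"ä", "ae", "ö", "oe", "ü", "ue",
	"Ä", "AE", "Ö", "OE", "Ü", "UE", "ß", "ss",
)

// in reports whether b is one of the bytes in set.
func in(b byte, set string) bool { return strings.IndexByte(set, b) >= 0 }

// digits returns the code for letter c given its neighbours (0 means none).
func digits(prev, c, next byte) string {
	switch c {
	case 'A', 'E', 'I', 'J', 'O', 'U', 'Y':
		return "0"
	case 'B':
		return "1"
	case 'P':
		if next == 'H' {
			return "3" // PH joins the F group
		}
		return "1"
	case 'D', 'T':
		if in(next, "CSZ") {
			return "8"
		}
		return "2"
	case 'F', 'V', 'W':
		return "3"
	case 'G', 'K', 'Q':
		return "4"
	case 'C':
		if prev == 0 { // word onset
			if in(next, "AHKLOQRUX") {
				return "4"
			}
			return "8"
		}
		if !in(prev, "SZ") && in(next, "AHKOQUX") {
			return "4"
		}
		return "8"
	case 'X':
		if in(prev, "CKQ") {
			return "8"
		}
		return "48"
	case 'L':
		return "5"
	case 'M', 'N':
		return "6"
	case 'R':
		return "7"
	case 'S', 'Z':
		return "8"
	}
	return "" // H carries no code
}

// colognePhonetic encodes input as one word; maxLen <= 0 disables the cap.
func colognePhonetic(input string, maxLen int) string {
	var letters []byte
	for i := 0; i < len(input); i++ {
		c := input[i]
		if c >= 'a' && c <= 'z' {
			c -= 'a' - 'A'
		}
		if c >= 'A' && c <= 'Z' {
			letters = append(letters, c) // every other byte is skipped
		}
	}
	var out []byte
	var prev, last byte
	for i, c := range letters {
		var next byte
		if i+1 < len(letters) {
			next = letters[i+1]
		}
		for _, d := range []byte(digits(prev, c, next)) {
			// Collapse repeated digits, then drop zeros except a leading one.
			if d != last && (d != '0' || len(out) == 0) {
				out = append(out, d)
			}
			last = d
		}
		prev = c
	}
	if maxLen > 0 && len(out) > maxLen {
		out = out[:maxLen]
	}
	return string(out)
}

func main() {
	fmt.Println(colognePhonetic(expandUmlauts.Replace("Müller-Lüdenscheidt"), 10)) // 65752682
	fmt.Println(colognePhonetic("Wikipedia", 10))                                  // 3412
}
```

In this sketch a raw umlaut can change the result: "Öl" encodes to 5 because the two bytes of Ö are skipped, while the pre-expanded "Oel" encodes to 05.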

The package exports no types or constructors. Three blank imports of the feature sub-packages (linguistics_phonetic_german, linguistics_stemmer_german, linguistics_stopwords_german) drive registration through init(). Each adapter implements a linguistics port (StemmerPort, PhoneticEncoderPort, StopWordsProviderPort), so German slots into the same analyser pipeline as every other language with no special casing.

The pack is pure Go. It uses no CGO, no build tags, and no system libraries, and its only external dependencies are dchest/stemmer and golang.org/x/text. It runs the same in compiled and interpreted dev-i modes.

Bootstrap

The blank import runs the init() functions that register the three adapters. Pair it with WithLanguage("german") to apply the German stemmer, phonetic encoder, and stop words to an analyser.

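import (
    "piko.sh/piko/wdk/linguistics"

    // Blank import: its init() functions register the German adapters.
    _ "piko.sh/piko/wdk/linguistics/linguistics_language_german"
)

// WithLanguage wires the registered German components into the analyser.
config := linguistics.DefaultConfigForLanguage("german")
analyser := linguistics.NewAnalyser(config, linguistics.WithLanguage("german"))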

WithLanguage looks up all three components from their registries by language code. The blank import alone registers the adapters but wires nothing into an analyser.
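The split between registering and wiring can be modelled with a registry map and a functional option. The following is a toy sketch of that pattern, not Piko's implementation. Every name in it (stemmer, stemmers, toyGermanStemmer, analyser, withLanguage, newAnalyser) is hypothetical, the interface's method set is assumed, and only one of the three component kinds is shown.

```go
package main

import (
	"fmt"
	"strings"
)

// stemmer stands in for a port interface such as StemmerPort.
// Its method set is an assumption made for this sketch.
type stemmer interface {
	Stem(word string) string
}

// stemmers is a hypothetical registry keyed by language code.
var stemmers = map[string]stemmer{}

// toyGermanStemmer is a placeholder, not the real Snowball algorithm.
type toyGermanStemmer struct{}

func (toyGermanStemmer) Stem(word string) string {
	return strings.TrimSuffix(strings.ToLower(word), "en")
}

// In a real pack this init would live in the feature sub-package
// and run as a side effect of the blank import.
func init() {
	stemmers["german"] = toyGermanStemmer{}
}

// analyser holds whatever components an option wired in; nil means unwired.
type analyser struct {
	stemmer stemmer
}

type option func(*analyser)

// withLanguage resolves the component from the registry by language code.
func withLanguage(code string) option {
	return func(a *analyser) { a.stemmer = stemmers[code] }
}

func newAnalyser(opts ...option) *analyser {
	a := &analyser{}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

// analyse stems the word if a stemmer is wired, else returns it unchanged.
func (a *analyser) analyse(word string) string {
	if a.stemmer == nil {
		return word
	}
	return a.stemmer.Stem(word)
}

func main() {
	fmt.Println(newAnalyser().analyse("laufen"))                       // laufen
	fmt.Println(newAnalyser(withLanguage("german")).analyse("laufen")) // lauf
}
```

The first call shows the registered-but-unwired case: the adapter sits in the registry, yet the analyser never consults it until the option performs the lookup.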
