Spanish language pack

Spanish bundle for the Piko linguistics service. It supplies a Snowball stemmer, a Spanish phonetic encoder, and a stop-word list, all through a single blank import with no glue code.

Overview

The Spanish pack registers three adapters under the language code spanish. The stemmer uses the Spanish algorithm from the kljensen/snowball library (for example, it reduces hablamos to habl). The phonetic encoder applies three Spanish-specific rules: yeísmo collapses LL and Y to J, seseo collapses Z and soft C to S, and the B/V merger treats both letters as one sound. Words pronounced alike across Latin American and Castilian dialects therefore produce the same code, which suits fuzzy sound-based matching. The encoder caps each code at 6 characters by default. The stop-word provider holds 57 entries covering articles, pronouns, prepositions, conjunctions, copulas, and demonstratives (for example el, la, los, las, de, y, que).
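The three phonetic mergers can be illustrated with a small sketch. This is not the pack's encoder: foldSpanish is a hypothetical helper that only applies the yeísmo, seseo, and B/V rules named above plus a length cap, and it ignores accents, silent H, and every other rule a real encoder would need.

```go
package main

import (
	"fmt"
	"strings"
)

// foldSpanish is a hypothetical, simplified illustration of the three
// mergers. It is an assumption for demonstration, not the Piko encoder.
func foldSpanish(word string, maxLen int) string {
	runes := []rune(strings.ToUpper(word))
	out := make([]rune, 0, maxLen)
	for i := 0; i < len(runes) && len(out) < maxLen; i++ {
		r := runes[i]
		hasNext := i+1 < len(runes)
		switch {
		case r == 'L' && hasNext && runes[i+1] == 'L':
			out = append(out, 'J') // yeísmo: LL -> J
			i++                    // skip the second L
		case r == 'Y':
			out = append(out, 'J') // yeísmo: Y -> J
		case r == 'Z':
			out = append(out, 'S') // seseo: Z -> S
		case r == 'C' && hasNext && (runes[i+1] == 'E' || runes[i+1] == 'I'):
			out = append(out, 'S') // seseo: soft C -> S
		case r == 'V':
			out = append(out, 'B') // B/V merger
		default:
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	// Each pair sounds alike under the mergers and folds to the same code.
	fmt.Println(foldSpanish("vaca", 6), foldSpanish("baca", 6))
	fmt.Println(foldSpanish("casa", 6), foldSpanish("caza", 6))
	fmt.Println(foldSpanish("halla", 6), foldSpanish("haya", 6))
}
```

Under these simplified rules, vaca and baca, casa and caza, and halla and haya each fold to one code, which is the property that makes sound-based matching tolerant of dialect and spelling differences.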

Each adapter implements one port: the stemmer satisfies StemmerPort, the encoder satisfies PhoneticEncoderPort, and the provider satisfies StopWordsProviderPort. A compile-time assertion guards each implementation. All three resolve through a single WithLanguage("spanish") lookup, so Spanish slots into the same registries that any consumer queries by name. To override one component while keeping the others, pass WithStemmer, WithPhoneticEncoder, or WithStopWordsProvider.
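A compile-time assertion of this kind is a standard Go idiom. The sketch below shows its shape under stated assumptions: the StemmerPort method set and the spanishStemmer type are hypothetical stand-ins, and the real interface may declare different methods.

```go
package main

import "fmt"

// StemmerPort is a hypothetical stand-in for the real port; its actual
// method set may differ.
type StemmerPort interface {
	Stem(word string) string
}

// spanishStemmer is an illustrative adapter with placeholder logic.
type spanishStemmer struct{}

// Stem returns the word unchanged; a real adapter would call the stemmer.
func (spanishStemmer) Stem(word string) string { return word }

// Compile-time assertion: the build fails here if spanishStemmer ever
// stops satisfying StemmerPort. The blank identifier discards the value.
var _ StemmerPort = spanishStemmer{}

func main() {
	var s StemmerPort = spanishStemmer{}
	fmt.Println(s.Stem("hablamos"))
}
```

The assertion costs nothing at run time. It moves an interface mismatch from the first registry lookup to the build.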

The package itself contains no exported types or constructors. It exists to register the three adapters through init() side effects in the per-feature sub-packages (linguistics_phonetic_spanish, linguistics_stemmer_spanish, linguistics_stopwords_spanish). It is pure Go, so it needs no build tag or CGO and runs in interpreted dev-i mode.
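Registration through init() side effects generally follows the pattern sketched here. Every name in it (Stemmer, RegisterStemmer, hasStemmer, the stemmers map) is a hypothetical stand-in for the linguistics domain registry, whose real API may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// Stemmer is a hypothetical port used only for this illustration.
type Stemmer interface {
	Stem(word string) string
}

var (
	mu       sync.RWMutex
	stemmers = map[string]Stemmer{} // hypothetical registry keyed by language code
)

// RegisterStemmer stores an adapter under a language code.
func RegisterStemmer(lang string, s Stemmer) {
	mu.Lock()
	defer mu.Unlock()
	stemmers[lang] = s
}

// hasStemmer reports whether an adapter is registered for lang.
func hasStemmer(lang string) bool {
	mu.RLock()
	defer mu.RUnlock()
	_, ok := stemmers[lang]
	return ok
}

type spanishStemmer struct{}

func (spanishStemmer) Stem(word string) string { return word } // placeholder logic

// In a real sub-package, init runs when the package is blank-imported.
// Here it runs because it lives in package main.
func init() {
	RegisterStemmer("spanish", spanishStemmer{})
}

func main() {
	fmt.Println("spanish registered:", hasStemmer("spanish"))
}
```

Because init() runs before main, the adapter is already in the registry by the time any consumer looks it up by name, which is why the pack needs no exported constructor.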

Bootstrap

A blank import is enough to register the pack. Each sub-package's init() registers its adapter with the linguistics domain registry. The import must be present in the final binary's import graph for any Spanish component to resolve.

import (
    _ "piko.sh/piko/wdk/linguistics/linguistics_language_spanish"
)

The import only registers the adapters; nothing uses them until a language is selected. After the import, select the language on an analyser through WithLanguage. This one call looks up the stemmer, phonetic encoder, and stop words from their registries.

import "piko.sh/piko/wdk/linguistics"

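// Build the default configuration for Spanish, then resolve the registered
// stemmer, phonetic encoder, and stop words by language code.
config := linguistics.DefaultConfigForLanguage("spanish")
analyser := linguistics.NewAnalyser(config, linguistics.WithLanguage("spanish"))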

If the blank import is missing, WithLanguage falls back to no-op stemmer, encoder, and stop-words implementations. Spanish processing is then silently disabled, with no error or panic, so check the import first when an analyser returns unprocessed text. The cache_linguistics bridge wraps this lookup in NewTextAnalyserForLanguage, which takes any language code and runs the same path. The pattern is identical across every language pack, so swapping spanish for french means changing only the blank import and the language code.
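The silent fallback can be pictured with the following sketch. It assumes a hypothetical registry map and lookupStemmer helper rather than the real WithLanguage internals, and it leaves the registry empty to simulate a missing blank import.

```go
package main

import "fmt"

// Stemmer is a hypothetical port used only for this illustration.
type Stemmer interface {
	Stem(word string) string
}

// noopStemmer returns every word unchanged.
type noopStemmer struct{}

func (noopStemmer) Stem(word string) string { return word }

// stemmers is left empty on purpose: nothing registered itself, as happens
// when the language pack's blank import is missing.
var stemmers = map[string]Stemmer{}

// lookupStemmer returns the registered adapter, or a no-op when none exists.
// The boolean is an addition for this sketch so callers can detect the
// fallback; the behaviour described above reports nothing.
func lookupStemmer(lang string) (Stemmer, bool) {
	if s, ok := stemmers[lang]; ok {
		return s, true
	}
	return noopStemmer{}, false
}

func main() {
	s, found := lookupStemmer("spanish")
	// With no registration the word comes back unprocessed.
	fmt.Println(found, s.Stem("hablamos"))
}
```

A start-up check along these lines, for instance asserting that a known word such as hablamos comes back changed, is one way to catch a missing import early instead of discovering it through unprocessed text.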

See also

Other language packs:

Consumers: