Provides algorithmic stemming for several languages, some with additional variants. For a list of supported languages, see the language parameter.
When not customized, the filter uses the porter stemming algorithm for English.
The following analyze API request uses the stemmer filter’s default porter stemming algorithm to stem "the foxes jumping quickly" to "the fox jump quickli":
resp = client.indices.analyze(
    tokenizer="standard",
    filter=["stemmer"],
    text="the foxes jumping quickly",
)
print(resp)

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['stemmer'],
    text: 'the foxes jumping quickly'
  }
)
puts response

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["stemmer"],
  text: "the foxes jumping quickly",
});
console.log(response);

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "stemmer" ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quickli ]
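To work with this result in code, the token strings can be pulled out of the analyze API response. The sketch below assumes the response body holds a `tokens` array whose entries carry a `token` field. `token_texts` is a hypothetical helper name, and `sample_response` is hand-written for illustration rather than captured from a live cluster.

```python
def token_texts(response):
    """Return only the token strings from an analyze API response body."""
    return [entry["token"] for entry in response["tokens"]]


# Hand-written sample that mirrors the assumed response shape
# (illustrative only; not captured from a live cluster).
sample_response = {
    "tokens": [
        {"token": "the", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0},
        {"token": "fox", "start_offset": 4, "end_offset": 9, "type": "<ALPHANUM>", "position": 1},
        {"token": "jump", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 2},
        {"token": "quickli", "start_offset": 18, "end_offset": 25, "type": "<ALPHANUM>", "position": 3},
    ]
}

print(token_texts(sample_response))  # ['the', 'fox', 'jump', 'quickli']
```

The offsets still point at the original words ("foxes", "jumping", "quickly"), while the `token` field carries the stemmed form.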
The following create index API request uses the stemmer filter to configure a new custom analyzer.
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["stemmer"]
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: ['stemmer']
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["stemmer"],
        },
      },
    },
  },
});
console.log(response);

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "stemmer" ]
        }
      }
    }
  }
}
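Once the index exists, one way to check the analyzer is to run the analyze API against that index and name the analyzer. This is a minimal sketch: it assumes an already-connected Python client is passed in as `client`, and `analyze_with_index_analyzer` is a hypothetical helper name. Running it for real requires a live cluster.

```python
def analyze_with_index_analyzer(client, index, analyzer, text):
    """Run `text` through a named analyzer defined on `index` and
    return the resulting token strings."""
    resp = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    return [entry["token"] for entry in resp["tokens"]]


# Example call (requires a live cluster and the index created above):
# analyze_with_index_analyzer(
#     client, "my-index-000001", "my_analyzer", "the foxes jumping quickly"
# )
```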
language
(Optional, string) Language-dependent stemming algorithm used to stem tokens. If both this and the name parameter are specified, the language parameter argument is used.
Valid values are sorted by language. Defaults to english. Recommended algorithms are bolded.
- Arabic: arabic
- Armenian: armenian
- Basque: basque
- Bengali: bengali
- Brazilian Portuguese: brazilian
- Bulgarian: bulgarian
- Catalan: catalan
- Czech: czech
- Danish: danish
- Dutch: dutch, dutch_kp (deprecated in 8.16.0; dutch_kp will be removed in a future version)
- English: english, light_english, lovins (deprecated in 8.16.0; lovins will be removed in a future version), minimal_english, porter2, possessive_english
- Estonian: estonian
- Finnish: finnish, light_finnish
- French: light_french, french, minimal_french
- Galician: galician, minimal_galician (plural step only)
- German: light_german, german, german2, minimal_german
- Greek: greek
- Hindi: hindi
- Hungarian: hungarian, light_hungarian
- Indonesian: indonesian
- Irish: irish
- Italian: light_italian, italian
- Kurdish (Sorani): sorani
- Latvian: latvian
- Lithuanian: lithuanian
- Norwegian (Bokmål): norwegian, light_norwegian, minimal_norwegian
- Norwegian (Nynorsk): light_nynorsk, minimal_nynorsk
- Persian: persian
- Portuguese: light_portuguese, minimal_portuguese, portuguese, portuguese_rslp
- Romanian: romanian
- Russian: russian, light_russian
- Serbian: serbian
- Spanish: light_spanish, spanish, spanish_plural
- Swedish: swedish, light_swedish
- Turkish: turkish
name
An alias for the language parameter. If both this and the language parameter are specified, the language parameter argument is used.
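The precedence between language and name can be sketched client-side, for example to check a filter definition before sending it. This is an illustration of the documented rule, not how Elasticsearch implements it. `effective_stemmer_language` and `DEPRECATED_LANGUAGES` are hypothetical names, and the deprecated set contains only the two values marked as deprecated in the list above.

```python
import warnings

# Values the list above marks as deprecated in 8.16.0.
DEPRECATED_LANGUAGES = {"dutch_kp", "lovins"}


def effective_stemmer_language(filter_def):
    """Mirror the documented precedence for a stemmer filter definition:
    `language` wins over `name`, and `english` is the default."""
    chosen = filter_def.get("language") or filter_def.get("name") or "english"
    if chosen in DEPRECATED_LANGUAGES:
        warnings.warn(
            f"{chosen} is deprecated and will be removed in a future version",
            DeprecationWarning,
            stacklevel=2,
        )
    return chosen


print(effective_stemmer_language({"type": "stemmer"}))
# english
print(effective_stemmer_language({"type": "stemmer", "name": "light_german"}))
# light_german
print(effective_stemmer_language(
    {"type": "stemmer", "language": "porter2", "name": "light_english"}
))
# porter2
```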
To customize the stemmer filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following request creates a custom stemmer filter that stems words using the light_german algorithm:
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_stemmer"]
                }
            },
            "filter": {
                "my_stemmer": {
                    "type": "stemmer",
                    "language": "light_german"
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            filter: ['lowercase', 'my_stemmer']
          }
        },
        filter: {
          my_stemmer: {
            type: 'stemmer',
            language: 'light_german'
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          filter: ["lowercase", "my_stemmer"],
        },
      },
      filter: {
        my_stemmer: {
          type: "stemmer",
          language: "light_german",
        },
      },
    },
  },
});
console.log(response);

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_stemmer" ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        }
      }
    }
  }
}
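When the same pattern is needed for several languages, the settings body can be generated instead of written by hand. The following is a minimal sketch: `stemmer_index_settings` is a hypothetical helper, and its defaults simply reuse the filter and analyzer names from the request above.

```python
import json


def stemmer_index_settings(language, filter_name="my_stemmer",
                           analyzer_name="my_analyzer"):
    """Build index settings with a custom stemmer filter for `language`,
    wired into an analyzer after the lowercase filter."""
    return {
        "analysis": {
            "analyzer": {
                analyzer_name: {
                    "tokenizer": "standard",
                    "filter": ["lowercase", filter_name],
                }
            },
            "filter": {
                filter_name: {"type": "stemmer", "language": language}
            },
        }
    }


settings = stemmer_index_settings("light_german")
print(json.dumps(settings, indent=2))

# With a connected client, the result could be passed along as:
# client.indices.create(index="my-index-000001", settings=settings)
```

Keeping the filter name as a parameter lets one index define several stemmer filters side by side, for example one per language-specific analyzer.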