solr - fuzzy search in elasticsearch different than fuzziness match boolean -
I'm trying to figure out why the next two queries produce vastly different results. I'm told the fuzzy query is not recommended (per the documentation I found on the fuzzy query), so I'm trying to use a match query with the fuzziness parameter instead. The two produce extremely different results, and I'm not sure what the best way of doing this is.
My example is a film title containing 'batman'. The user, however, types 'bat man' (with a space). It would make sense that a fuzzy query should find batman. It would also find other variations such as spider man, but that's OK I guess. (Not really, but...)
So why is the fuzzy search returning more relevant results than the match one below? Any ideas?
--fuzzy:
{ "query":{ "bool":{ "should": [ { "fuzzy": { "title": { "value": "bat man", "boost": 4 } } } ], "minimum_number_should_match": 1 } } }
--match:
{ "query":{ "bool":{ "should": [ { "match": { "title": { "query": "bat man", "boost": 4 } } } ], "minimum_number_should_match": 1 } } }
Edit
I'm adding examples of what gets returned.
First, nothing gets returned using the match query, even with a high fuzziness value added (fuzziness: 5).
But the fuzzy query returns several 'batman'-related titles, such as 'batman' or 'batman returns'.
This gets stranger when I run the fuzzy search on 'bat man' against multiple fields. If I search the 'starring' field in addition to the title field (starring contains lists of actors), I get 'jason bateman' as well as the title 'batman'.
{ "_index": "store24", "_type": "searchdata", "_id": "081227987909", "_score": 4.600759, "fields": { "title": [ "batman" ] } }, { "_index": "store24", "_type": "searchdata", "_id": "883929053353", "_score": 4.1418676, "fields": { "title": [ "batman forever" ] } }, { "_index": "store24", "_type": "searchdata", "_id": "883929331789", "_score": 3.5298011, "fields": { "title": [ "batman returns" ] } }
Best so far (still not great)
What I've found works best so far is to combine both queries. This seems redundant, but I can't yet make one work like the other. So, this seems better:
"should": [ { "fuzzy": { "title": { "boost": 6.0, "min_similarity": 1.0, "value": "batman" } } }, { "match": { "title": { "query": "batman", "boost": 6.0 ,"fuzziness": 1 } } } ]
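Elasticsearch analyzes documents and converts them into terms, and it is those terms that are searched (not the documents themselves). The key difference between the two query types is that the match query analyzes the query text before searching, while the fuzzy query is a term-level query and does not. Consider the example below:
With the match query, 'bat man' is first tokenized into terms ('bat' and 'man' under the standard analyzer) and then each term is searched, so it might not turn up the same matches as a search for the single term 'batman'. The fuzzy query skips that step and compares the whole string 'bat man' against the indexed terms. This also illustrates how Jason Bateman showed up: his last name is indexed as a term such as 'bateman' (or a similarly normalized form, depending on the analyzer), which is a close match for 'bat man'.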
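To make the edit-distance reasoning concrete, here is a minimal Python sketch. It is not Elasticsearch code: it assumes plain Levenshtein distance (Elasticsearch also counts transpositions by default, which does not change these particular numbers), a whitespace split as a stand-in for the analyzer, and a made-up list of indexed terms. The names `levenshtein`, `fuzzy_term_matches`, and `INDEXED_TERMS` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


# Hypothetical sample of terms that analysis might have put in the index.
INDEXED_TERMS = ["batman", "bateman", "spider", "man"]


def fuzzy_term_matches(query_term: str, max_edits: int) -> list:
    """Indexed terms within max_edits of a single, unanalyzed query term."""
    return [t for t in INDEXED_TERMS if levenshtein(query_term, t) <= max_edits]


if __name__ == "__main__":
    # Fuzzy-query style: the whole string is treated as one term.
    print(fuzzy_term_matches("bat man", 1))   # ['batman', 'bateman']

    # Match-query style: the text is split first, then each token is compared.
    for token in "bat man".split():
        print(token, levenshtein(token, "batman"))  # both tokens are 3 edits away
```

Treated as one term, 'bat man' is a single edit away from both 'batman' (drop the space) and 'bateman' (replace the space with 'e'). Once split, 'bat' and 'man' are each three edits from 'batman'. Recent Elasticsearch versions cap fuzziness at an edit distance of 2, so a larger value such as 5 would not close that gap.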
More detailed info on how text fields are analyzed when searching can be read at http://exploringelasticsearch.com/searching_data.html#sec-searching-analysis
When a search is performed on an analyzed field, the query itself is analyzed, matching it up to the documents, which were analyzed when added to the database. Reducing words to these short tokens normalizes the text, allowing for fast, efficient lookups. Whether you're searching for "rollerblading" in any form, internally we're looking for "rollerblad".
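The idea in that passage, analyzing documents at index time and the query at search time with the same function, can be sketched with a toy inverted index. This is only an illustration: `toy_stem` is a made-up suffix stripper, not the stemmer Elasticsearch or Lucene actually uses, and the sample documents are hypothetical.

```python
from collections import defaultdict

# Toy suffix list, checked in order; a real stemmer is far more careful.
SUFFIXES = ("ing", "es", "ed", "e", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def analyze(text: str) -> list:
    """Lowercase, split on whitespace, and stem each token."""
    return [toy_stem(token) for token in text.lower().split()]


def build_index(docs: dict) -> dict:
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Analyze the query the same way as the documents, then look up each term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


if __name__ == "__main__":
    docs = {1: "Rollerblading in the park", 2: "Batman returns"}  # hypothetical data
    index = build_index(docs)
    print(analyze("rollerblading"))       # ['rollerblad']
    print(search(index, "rollerblades"))  # {1}
```

Because the query goes through the same `analyze` step as the documents, different surface forms of a word end up as the same lookup key. A term-level query that skips this step has to match the stored term directly.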
solr lucene elasticsearch