How do I do a partial match in Elasticsearch?

I have a link like and I want to match "google" out of the link.

I have:

query: {
  bool: {
    must: {
      match: { text: 'google' }
    }
  }
}

But this only matches if the whole text is 'google' (case insensitive, so it also matches Google or GooGlE etc). How do I match for the 'google' inside of another string?

8 Answers

The point is that the Elasticsearch regex you are using requires a full string match:

Lucene’s patterns are always anchored. The pattern provided must match the entire string.

Thus, to allow any characters (other than a newline) before and after the word, add the .* pattern on both sides:

match: { text: '.*google.*'}
                ^^      ^^

In ES6+, use regexp instead of match:

"query": {
  "regexp": { "text": ".*google.*" }
}

One more variation is for cases when your string can contain newlines: match: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in Elasticsearch because this regex flavor allows neither [\s\S] workarounds nor DOTALL/Singleline flags. As the documentation puts it: "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
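To make the anchoring rule concrete, here is a minimal sketch of a helper that builds a "contains" regexp query body. The name containsRegexpQuery is hypothetical, and the escaped character list is my reading of the reserved operators in Lucene's regexp syntax, so treat it as an assumption rather than an exhaustive list.

```javascript
// Hypothetical helper: build a regexp query body that matches `needle`
// anywhere inside `field`. Because Lucene patterns are anchored, the
// needle is wrapped in `.*` on both sides.
function containsRegexpQuery(field, needle) {
  // Escape characters that act as operators in Lucene regexps
  // (assumed list, including the optional operators # @ & < > ~).
  const escaped = needle.replace(/[.?+*|{}\[\]()"\\#@&<>~]/g, "\\$&");
  return { query: { regexp: { [field]: `.*${escaped}.*` } } };
}

// Print the request body that would be sent to the search endpoint.
console.log(JSON.stringify(containsRegexpQuery("text", "google")));
```

Escaping matters as soon as the search term comes from user input: an unescaped dot in "drive.google" would otherwise match any character.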

However, if you do not plan to match any complicated patterns and need no word boundary checking, a search for a plain substring is better performed with a wildcard query than with a regex:

{
  "query": {
    "wildcard": {
      "text": {
        "value": "*google*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}

See Wildcard search for more details.

NOTE: The wildcard pattern also needs to match the whole input string, thus

  • google* finds all strings starting with google
  • *google* finds all strings containing google
  • *google finds all strings ending with google

Also, bear in mind that wildcard patterns have only two special characters:

  • ?, which matches any single character
  • *, which matches zero or more characters (so it can also match an empty string)
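The whole-string rule for wildcard patterns can be illustrated locally. The sketch below translates a wildcard pattern into an anchored JavaScript RegExp; wildcardToRegExp is a hypothetical name, this is only an emulation of the matching rule and not how Elasticsearch evaluates the query, and it ignores backslash escapes for brevity.

```javascript
// Emulate wildcard matching: `*` is any run of characters, `?` is exactly
// one character, and the pattern must cover the whole string.
function wildcardToRegExp(pattern) {
  let source = "";
  for (const ch of pattern) {
    if (ch === "*") source += "[\\s\\S]*";
    else if (ch === "?") source += "[\\s\\S]";
    // Everything else is literal, so escape RegExp metacharacters.
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  return new RegExp(`^${source}$`);
}

const term = "drive.google.com"; // illustrative value, not from the question
console.log(wildcardToRegExp("google*").test(term));  // starts with google?
console.log(wildcardToRegExp("*google*").test(term)); // contains google?
console.log(wildcardToRegExp("*google").test(term));  // ends with google?
```

Only the middle pattern matches this value, which mirrors the three cases listed above.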

Use a wildcard query on the keyword sub-field:

'{"query":{ "wildcard": { "text.keyword" : "*google*" }}}'

For both partial and full-text matching, the following worked:

"query": {
  "query_string": {
    "query": "*searchText*",
    "fields": [ "fieldName" ]
  }
}

I can't find a breaking change disabling regular expressions in match, but match: { text: '.*google.*'} does not work on any of my Elasticsearch 6.2 clusters. Perhaps it is configurable?

Regexp works:

"query": {
  "regexp": { "text": ".*google.*" }
}

For partial matching you can either use prefix or match_phrase_prefix.
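As a rough sketch of what those two queries look like, the helpers below build the request bodies. The function names are hypothetical, and "text" and "goo" are placeholder values. Both queries match from the start of a term, so they find "google" from "goo" but would not find it from "oogle".

```javascript
// prefix: a term-level query, compared against indexed terms without analysis.
function prefixQuery(field, prefix) {
  return { query: { prefix: { [field]: { value: prefix } } } };
}

// match_phrase_prefix: the input is analyzed, and the last term is
// treated as a prefix.
function matchPhrasePrefixQuery(field, text) {
  return { query: { match_phrase_prefix: { [field]: { query: text } } } };
}

console.log(JSON.stringify(prefixQuery("text", "goo")));
console.log(JSON.stringify(matchPhrasePrefixQuery("text", "goo")));
```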

For a more generic solution, look into using a different analyzer or defining your own. I am assuming you are using the standard analyzer, which would split the link into the tokens "http" and "drive.google.com". That is why the search for just google isn't working: it is compared against the full token "drive.google.com".

If you instead indexed your documents using the simple analyzer, it would split the link into "http", "drive", "google", and "com". That lets you match any one of those terms on its own.
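The tokenization difference can be sketched without a cluster. The function below is only a rough emulation of the simple analyzer (lowercase, split on anything that is not a letter), the URL is an illustrative value, and the mapping object assumes the typeless mapping syntax of Elasticsearch 7+ with a field called "text".

```javascript
// Rough emulation of the simple analyzer: lowercase the input and split
// it wherever a non-letter character occurs.
function simpleAnalyze(text) {
  return text.toLowerCase().split(/[^\p{L}]+/u).filter(Boolean);
}

const tokens = simpleAnalyze("http://drive.google.com");
console.log(tokens); // [ 'http', 'drive', 'google', 'com' ]

// Assumed mapping that would apply the built-in simple analyzer to the field.
const mapping = {
  mappings: { properties: { text: { type: "text", analyzer: "simple" } } },
};
console.log(JSON.stringify(mapping));
```

With "google" indexed as its own token, the plain match query from the question should find the document without any wildcard or regex.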

Using the Node.js client:

tag_name is the field name, value is the incoming search value.

const { body } = await elasticWrapper.client.search({
  index: ElasticIndexs.Tags,
  body: {
    query: {
      wildcard: {
        tag_name: {
          value: `*${value}*`,
          boost: 1.0,
          rewrite: 'constant_score',
        },
      },
    },
  },
});
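One caveat with interpolating the incoming value directly: if it contains * or ?, those characters act as wildcards. A minimal sketch of escaping them first is shown below; escapeWildcard and containsWildcardQuery are hypothetical helpers, and the escaping relies on backslash being the escape character in wildcard patterns.

```javascript
// Escape the wildcard metacharacters (and the escape character itself)
// so that user input is matched literally.
function escapeWildcard(value) {
  return value.replace(/[\\*?]/g, "\\$&");
}

// Build a "contains" wildcard query body for the given field.
function containsWildcardQuery(field, value) {
  return {
    query: { wildcard: { [field]: { value: `*${escapeWildcard(value)}*` } } },
  };
}

// A literal question mark in the input is no longer treated as a wildcard.
console.log(JSON.stringify(containsWildcardQuery("tag_name", "what?")));
```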

You're looking for a wildcard search. According to the official documentation, it can be done as follows:

query_string: {
  query: `*${keyword}*`,
  fields: ["fieldOne", "fieldTwo"],
},

Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters: qu?ck bro*

Be careful, though:

Be aware that wildcard queries can use an enormous amount of memory and perform very badly — just think how many terms need to be queried to match the query string "a* b* c*".

Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined, just in case they match. Leading wildcards can be disabled by setting allow_leading_wildcard to false.
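A small sketch of how that setting could be applied per request, together with a simplified client-side check. Both function names are hypothetical, and the check only looks at whitespace-separated terms, so it ignores field prefixes such as field:*foo and quoted phrases.

```javascript
// Simplified guard: does any whitespace-separated term start with * or ? ?
function hasLeadingWildcard(queryString) {
  return queryString
    .split(/\s+/)
    .filter(Boolean)
    .some((term) => /^[*?]/.test(term));
}

// Build a query_string body with leading wildcards disabled.
function safeQueryString(query, fields) {
  return {
    query: { query_string: { query, fields, allow_leading_wildcard: false } },
  };
}

console.log(hasLeadingWildcard("*ing"));        // true
console.log(hasLeadingWildcard("qu?ck bro*"));  // false
console.log(JSON.stringify(safeQueryString("bro*", ["fieldOne"])));
```

Note that with allow_leading_wildcard set to false, a pattern such as *google* is no longer accepted, so this setting trades substring search for predictable cost.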
