Elasticsearch for Beginners
Elasticsearch is one of the most widely known NoSQL data stores today. Released in 2010, it is a modern search and analytics engine that is open source, written in Java, and based on Apache Lucene. It is the core of one of today’s most popular analytics systems, the ELK Stack, and its search capabilities are among the most efficient available.
This article is for anyone who is new to Elasticsearch (ES) and wants to understand its core concepts. To keep things simple, we compare basic ES queries with their MySQL equivalents.
In this article, we’ll cover the following :
- Key terms in Elasticsearch.
- The structure of an Elasticsearch response.
- ES queries and their MySQL analogues.
Before digging deeper, let’s understand some key terms that are used when discussing Elasticsearch.
- Index (Tables)— Analogous to a table in MySQL, an index is the unit in ES that holds a set of related documents. An index name must follow these rules:
1) It must not contain any of these special characters: #, \, /, *, ?, ", <, >, |, a space, or a comma
2) It must not start with _, - or +
3) It must not be . or ..
4) It must be in lowercase
- Document (Rows)— Analogous to a row in MySQL, a document is a collection of fields in JSON format.
“In Elasticsearch the textual data is represented with two data types: text and keyword. The text type is meant to be used for full-text search use cases while the keyword is meant for filtering, sorting, and aggregation”
- Analyzer — An analyzer tells Elasticsearch how the content of a text field is broken into terms, both when the field is indexed and when it is searched.
- Normalizer — A normalizer is a collection of filters that, unlike an analyzer, always produces a single token. It can only be applied to the keyword datatype. For example, when the built-in “lowercase” normalizer is configured on a field, every value of that field is converted to lowercase. Its effect is visible in the keys returned by aggregation queries.
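The index naming rules listed under Index can be checked before an index is ever created. The following is a minimal sketch, not part of Elasticsearch itself: the function name `index_name_problems` is hypothetical, and the space character is included in the forbidden set on the assumption that it follows the official naming rules.

```python
# Characters that may not appear in an index name. The space character is
# an assumption taken from the official naming rules.
FORBIDDEN_CHARS = set('#\\/*?"<>|, ')
FORBIDDEN_PREFIXES = ("_", "-", "+")


def index_name_problems(name: str) -> list[str]:
    """Return the list of rule violations; an empty list means the name passes."""
    problems = []
    if name in (".", ".."):
        problems.append("must not be . or ..")
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("must not start with _, - or +")
    if any(ch in FORBIDDEN_CHARS for ch in name):
        problems.append("contains a forbidden special character")
    if name != name.lower():
        problems.append("must be lowercase")
    return problems


print(index_name_problems("es_test"))    # a valid name: no violations
print(index_name_problems("_My Index"))  # bad prefix, a space, and uppercase
```

Returning every violation, instead of stopping at the first one, makes it easier to fix a bad name in one pass.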
Understanding Elasticsearch Response :
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 62,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "es_test",
        "_type" : "_doc",
        "_id" : "id",
        "_score" : 1.0,
        "_source" : {
          "<field_name1>": <field_value1>,
          "<field_name2>": <field_value2>
        },
        "sort" : [
          "<field_value1>"
        ]
      }
    ]
  },
  "aggregations" : {
    "<fieldName>" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "<field_value1>",
          "doc_count" : 1
        },
        {
          "key" : "<field_value2>",
          "doc_count" : 2
        }
      ]
    }
  }
}
A typical Elasticsearch query response looks like this. It can seem overwhelming at first, so let’s break it down.
- value — The total count of documents matching the query, found under hits.total. Unlike SQL or Mongo, you don’t need a separate query for the count. The accompanying relation field says whether the count is exact (eq) or only a lower bound (gte).
- hits — The inner hits array contains the matched documents.
- sort — Contains the values of the document fields on which the results were sorted. We’ll see more about sorting later in this article.
- aggregations — When we run a grouping operation in Elasticsearch, the response contains this additional object. It holds the unique values of a particular field as “key”, each with its document count as “doc_count”.
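To make the breakdown above concrete, here is a small sketch that pulls those parts out of a response once it has been parsed into a Python dictionary. The function name `summarize_response` and the sample data are hypothetical; only the key names come from the response shown earlier.

```python
def summarize_response(response: dict) -> dict:
    """Pull the commonly used parts out of a search response dictionary."""
    hits = response.get("hits", {})
    total = hits.get("total", {})
    summary = {
        "total": total.get("value"),
        "total_is_exact": total.get("relation") == "eq",
        "documents": [hit.get("_source", {}) for hit in hits.get("hits", [])],
        "buckets": {},
    }
    # Each named aggregation carries a list of {"key", "doc_count"} buckets.
    for name, agg in response.get("aggregations", {}).items():
        summary["buckets"][name] = {
            bucket["key"]: bucket["doc_count"] for bucket in agg.get("buckets", [])
        }
    return summary


# Hypothetical response, trimmed to the keys discussed above.
sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"name": "A", "country": "India"}},
            {"_id": "2", "_source": {"name": "B", "country": "Kenya"}},
        ],
    },
    "aggregations": {
        "country": {
            "buckets": [
                {"key": "India", "doc_count": 1},
                {"key": "Kenya", "doc_count": 1},
            ]
        }
    },
}
print(summarize_response(sample))
```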
Now that we know the basics of Elasticsearch, let’s look at some basic queries we can run.
1. Creating an index : An index is created with settings, which hold top-level configuration such as the number of shards and replicas, and mappings, which describe the fields we want and how each one should be stored and indexed. Fields get default analysis behavior unless we configure otherwise. Any non-default configuration should be supplied when the index is created, because certain settings (the number of primary shards, for example) cannot be changed on an existing index.
PUT /es_test
{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "properties" : {
      "updated_at" : { "type" : "date" }
    }
  }
}
or
PUT /es_test (if no specifications are required)
2. SELECT * from table : The equivalent ES query uses match_all. By default, a search returns only the first 10 hits.
GET es_test/_search
{
  "query": {
    "match_all": {}
  }
}
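The queries in this article are written in console form, but they are plain HTTP requests with a JSON body. As a rough sketch of how one could be issued from code using only the Python standard library, the helper below builds the request without sending it. The function name `build_search_request` and the address `http://localhost:9200` are assumptions (a local cluster without security enabled).

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for the _search endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts POST as well as GET with a body
    )


request = build_search_request(
    "http://localhost:9200",  # assumed local, unsecured cluster
    "es_test",
    {"query": {"match_all": {}}},
)
print(request.get_method(), request.full_url)

# To actually run it against a live cluster:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

The same helper works for every search body shown in the rest of this article; only the `body` dictionary changes.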
3. SELECT * from table where condition1 : A match query handles a single condition on one field.
GET es_test/_search
{
  "query": {
    "match": {
      "<field_name>": <field_value>
    }
  }
}
4. SELECT * from table where condition1 and condition2 … : We construct a bool query for such scenarios. Clauses under must all have to match (like AND), while clauses under should are optional alternatives (like OR).
GET es_test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "<field_name>": <field_value>
          }
        },
        {
          "match": {
            "<field_name>": <field_value>
          }
        }
      ]
    }
  }
}
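When the number of conditions varies, it can be easier to assemble the bool query in code than to write the JSON by hand. This is a minimal sketch: the helper names `match` and `bool_query` and the field names `sport` and `country` are hypothetical, and only the must and should clauses discussed above are covered.

```python
def match(field: str, value) -> dict:
    """Build a single match clause for one field."""
    return {"match": {field: value}}


def bool_query(must=(), should=()) -> dict:
    """Combine match clauses into a bool query body."""
    clauses = {}
    if must:
        clauses["must"] = list(must)      # every clause has to match (AND)
    if should:
        clauses["should"] = list(should)  # alternative clauses (OR)
    return {"query": {"bool": clauses}}


# Roughly: SELECT * FROM es_test WHERE sport = 'hockey' AND country = 'India'
body = bool_query(must=[match("sport", "hockey"), match("country", "India")])
print(body)
```

Note that when should is combined with must, the should clauses become optional and mainly influence scoring unless minimum_should_match is set.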
5. SELECT column1, column2 from table : To get only certain fields in our hits, we list them under _source.
GET es_test/_search
{
  "_source": ["column1", "column2"]
}
6. Matching a word or phrase across several fields : A multi_match query searches for the same text in multiple fields, and the field names can contain wildcards.
GET es_test/_search
{
  "query": {
    "multi_match": {
      "query": "hello",
      "fields": ["key1", "key2*"] // fields to search in
    }
  }
}
7. Sort : We use the following format to sort our data on one or more fields. For a numeric field that holds several values (an array), you can also specify a mode to sort on the min, max, sum, avg, or median of those values.
GET es_test/_search
{
  "sort": [
    {
      "<field_name1>": {
        "order": "desc",
        "mode" : "avg"
      }
    },
    {
      "<field_name2>": {
        "order": "asc"
      }
    }
  ]
}
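To see what the sort mode does, the following sketch reproduces the idea locally on a list of dictionaries. It only illustrates the concept and is not how Elasticsearch implements sorting; the function name `sort_docs` and the `scores` field are hypothetical.

```python
from statistics import mean, median

# One reducer per sort mode: each collapses an array of numbers to one value.
MODES = {"min": min, "max": max, "sum": sum, "avg": mean, "median": median}


def sort_docs(docs, field, mode="avg", order="asc"):
    """Sort documents on a multi-valued numeric field using the given mode."""
    reducer = MODES[mode]
    return sorted(docs, key=lambda doc: reducer(doc[field]), reverse=(order == "desc"))


docs = [
    {"name": "a", "scores": [1, 9]},
    {"name": "b", "scores": [4, 5]},
    {"name": "c", "scores": [2, 2]},
]
print([d["name"] for d in sort_docs(docs, "scores", mode="avg", order="desc")])
print([d["name"] for d in sort_docs(docs, "scores", mode="min", order="desc")])
```

The same documents come back in a different order depending on the mode, because each mode reduces the array to a different single sort value.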
8. Aggregation or SELECT DISTINCT : In ES, we use aggregations to group data. Let’s assume we have a list of all the athletes participating in a particular sport at the Olympics, and we want to find the distinct countries taking part (GROUP BY in SQL). A terms aggregation on a field returns each distinct value of that field along with the number of documents that contain it.
GET es_test/_search
{
  "aggs": {
    "country": {
      "terms": {
        "field": "country"
      }
    }
  }
}
If we run the above query, we get the grouped results under the “aggregations” key of the response.
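The shape of those grouped results can be mimicked locally, which may help when reading the buckets. This is a sketch of the idea only, not how Elasticsearch computes aggregations; `terms_buckets` and the athlete data are hypothetical, and the default of 10 buckets mirrors the default size of a terms aggregation.

```python
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Group documents by a field and return buckets, largest count first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


athletes = [
    {"name": "A", "country": "India"},
    {"name": "B", "country": "Kenya"},
    {"name": "C", "country": "India"},
]
print(terms_buckets(athletes, "country"))
```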
9. DELETE a Document based on condition :
POST /es_test/_delete_by_query
{
  "query": {
    "match": {
      "<field>": <value>
    }
  }
}
10. DELETE INDEX :
DELETE /es_test
11. WILDCARD SEARCH :
In the examples above, we had to supply the full value we were searching for. A wildcard query lets us search with a pattern instead: ? matches any single character and * matches zero or more characters.
GET es_test/_search
{
  "query": {
    "wildcard": {
      "<field_name>": {
        "value": "ab?c*"
      }
    }
  }
}
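To get a feel for which terms a pattern such as ab?c* accepts, the wildcard can be translated into a regular expression and tried locally. This is an illustration of the pattern semantics only, not of how Elasticsearch evaluates wildcard queries; the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate ? (one character) and * (zero or more characters) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    """Return True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


for term in ["abxc", "abxcdef", "abc", "xabxc"]:
    print(term, wildcard_match("ab?c*", term))
```

The pattern has to cover the whole term, which is why "abc" (nothing left for the second c) and "xabxc" (extra leading character) are rejected.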
12. FUZZY SEARCH :
Have you ever noticed that even if you make a typo in a web search, Google still suggests the right results? This is because the search tolerates a degree of fuzziness. Elasticsearch can do fuzzy searches too, using the fuzziness parameter, which sets how many character edits a term may be away from the search text. For more depth, refer to the official documentation.
Let’s assume we have documents with a “series” field, and we want to find those where the series is friends, but we typed frepnds by mistake. With a fuzzy search, we still get the right documents in the hits.
GET es_test/_search
{
  "query": {
    "match": {
      "series": {
        "query": "frepnds",
        "fuzziness": "AUTO"
      }
    }
  }
}
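Why does frepnds still find friends? Fuzzy matching is based on edit distance, the number of single-character changes needed to turn one term into another. The sketch below uses plain Levenshtein distance as a simplification (Elasticsearch can additionally count a swap of two adjacent characters as one edit), together with the thresholds of the default AUTO setting. Both function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by the default AUTO setting, based on term length."""
    if len(term) < 3:
        return 0  # very short terms must match exactly
    if len(term) < 6:
        return 1
    return 2


typo, target = "frepnds", "friends"
print(edit_distance(typo, target), auto_fuzziness(typo))
```

The typo is two edits away from friends, and a seven-character term is allowed two edits under AUTO, so the document is still matched.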
These were some of the foundational concepts and queries for a beginner to understand in Elasticsearch. As mentioned at the beginning, this article is for beginners, and there is a lot more to explore on your own. Happy reading : )