Elasticsearch Mapping: The Basics, Two Types, and a Few Examples
In this post we take a deep dive into Elasticsearch, including the basics as well as some different field types, replete with examples to help get you going with both static and dynamic mappings!
Within a search engine, mapping defines how a document and its fields are indexed and stored. We can compare mapping to a database schema: it describes the fields and properties that documents hold, the data type of each field (e.g., string, integer, or date), and how those fields should be indexed and stored by Lucene. It is very important to define the mapping when we create an index, before any documents are indexed; an inappropriate preliminary definition and mapping may result in wrong search results.
In a previous article, we ran an elaborate comparison of the two search engine market leaders, Elasticsearch and Apache Solr. Here, we will delve deep into Elasticsearch mappings using a stable Elasticsearch v2.4 configuration. We will discuss the basics and the different field types, and then give examples of both static and dynamic mapping.
About Mapping
Mapping is intended to define the structure and field types as required, based on the answers to certain questions. For example:
- Which string fields should be treated as full text, and which should be treated as numbers or dates (and in which formats)?
- When should you use the _all field, which concatenates multiple fields into a single string and helps with analyzing and indexing?
- What custom rules should govern how new fields are mapped automatically as they are added (e.g., the dynamic mapping type, which we will discuss later on)?
Each index has one or more mapping types that are used to divide documents into logical groups. Basically, a type in Elasticsearch represents a class of similar documents and has a name such as “customer” or “item.” Lucene has no concept of document types, so the type name of each document is stored in a metadata field of the document called _type. When we search for documents within a particular type, Elasticsearch simply uses a filter on the _type field to restrict the search.
In addition, mapping is the layer that Elasticsearch uses to map complex JSON documents into the simple, flat documents that Lucene expects to receive. Each mapping type has fields or properties defined by meta-fields and various data types.
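The following minimal Python sketch makes the flattening idea concrete. It turns an inner object into dotted field paths such as name.first, which is roughly how object fields end up as flat fields. The sketch is not Elasticsearch's actual implementation; the flatten function and the sample document are illustrative assumptions.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Hypothetical helper: flatten nested objects into dotted field paths."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Inner objects contribute their fields under "parent.child".
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


employee = {
    "name": {"first": "Alice", "last": "John"},
    "age": 26,
    "joiningdate": "2015-10-15",
}

print(flatten(employee))
# {'name.first': 'Alice', 'name.last': 'John', 'age': 26, 'joiningdate': '2015-10-15'}
```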
Data-Type Fields
When we create a mapping, each mapping type will be a combination of multiple fields or lists with various types. For example, a “user” type may contain fields for title, first name, last name, and gender, whereas an “address” type might contain fields for city, state, and zip code.
Elasticsearch supports a number of different data types for the fields in a document:
Core Data Types
string, date, numeric (long, integer, short, byte, double, and float), boolean, binary.
Complex Data Types
- array: array support does not require a dedicated type.
- object: object for single JSON objects.
- nested: nested for arrays of JSON objects.
Geo Data Types
- geo-point: geo_point for latitude/longitude points.
- geo-shape: geo_shape for complex shapes such as polygons.
Specialized Data Types
- IPv4: ip for IPv4 addresses.
- completion: completion to provide autocomplete suggestions.
- token count: token_count to count the number of tokens in a string.
- attachment: the mapper-attachments plugin, which supports indexing attachments in formats such as Microsoft Office, Open Document, ePub, and HTML into an attachment data type.
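The difference between object and nested matters most for arrays of objects. The hypothetical Python sketch below assumes that, with an object mapping, inner objects are flattened into multi-valued fields. Under that assumption, a query can match values that come from two different objects. The nested check keeps each object separate. Neither function is an Elasticsearch API.

```python
from collections import defaultdict
from typing import Any


def flatten_object_array(field: str, items: list[dict[str, Any]]) -> dict[str, list[Any]]:
    # "object" mapping (sketch): each inner key becomes one multi-valued field.
    flat: dict[str, list[Any]] = defaultdict(list)
    for item in items:
        for key, value in item.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def matches(flat: dict[str, list[Any]], field: str, criteria: dict[str, Any]) -> bool:
    # Each condition is checked independently against the multi-valued field.
    return all(value in flat.get(f"{field}.{key}", []) for key, value in criteria.items())


def matches_nested(items: list[dict[str, Any]], criteria: dict[str, Any]) -> bool:
    # "nested" mapping (sketch): all conditions must hold within one inner object.
    return any(all(item.get(k) == v for k, v in criteria.items()) for item in items)


users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
flat = flatten_object_array("user", users)
query = {"first": "Alice", "last": "Smith"}  # no single user is "Alice Smith"

print(flat)                          # {'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}
print(matches(flat, "user", query))  # True  (cross-object false positive)
print(matches_nested(users, query))  # False
```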
Note: In versions 2.0 to 2.3, dots are not permitted in field names. Elasticsearch 2.4.0 adds a system property called mapper.allow_dots_in_name that disables the check for dots in field names.
Meta Fields
Meta fields are used to customize how a document’s associated metadata is treated. Each document has associated metadata, such as the _index, mapping _type, and _id meta-fields. The behavior of some of these meta fields can be customized when a mapping type is created.
Identity Meta Fields
- _index: the index to which the document belongs.
- _uid: a composite field consisting of the _type and the _id.
- _type: the document’s mapping type.
- _id: the document’s ID.
Document Source Meta Fields
- _source: the original JSON representing the body of the document.
- _size: the size of the _source field in bytes, provided by the mapper-size plugin.
Indexing Meta Fields
- _all: a catch-all field that indexes the values of all other fields.
- _field_names: all fields in the document that contain non-null values.
- _timestamp: a timestamp associated with the document, either specified manually or auto-generated.
- _ttl: how long a document should live before it is automatically deleted.
Routing Meta Fields
- _parent: used to create a parent-child relationship between two mapping types.
- _routing: a custom routing value that routes a document to a particular shard.
Other Meta Field
- _meta: application-specific metadata.
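The routing meta fields decide which shard holds a document. By default the _id is used as the routing value, and the shard is chosen as hash(routing) % number_of_primary_shards.

The sketch below is illustrative only:
- It uses zlib.crc32 as a stand-in hash. Elasticsearch uses Murmur3, so real shard numbers will differ.
- The pick_shard and uid helpers are hypothetical.
- uid follows the Elasticsearch 2.x "type#id" format of _uid.

```python
import zlib
from typing import Optional


def uid(doc_type: str, doc_id: str) -> str:
    # In 2.x, _uid combines the mapping type and the id as "type#id".
    return f"{doc_type}#{doc_id}"


def pick_shard(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    # Without a custom _routing value, the document _id is the routing key.
    key = routing if routing is not None else doc_id
    # Stand-in hash (crc32), NOT the Murmur3 hash Elasticsearch uses.
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


# Documents sharing a custom routing value always land on the same shard.
shards = {pick_shard(doc_id, 5, routing="customer-42") for doc_id in ["1", "2", "3"]}
print(len(shards))               # 1
print(uid("employeeinfo", "1"))  # employeeinfo#1
```

Custom routing trades even shard distribution for locality: all documents with the same routing value can be read from a single shard.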
An Example
To create a mapping, you can use the put mapping API, which sets the mapping definition for a specific type in an existing index, or you can add multiple mappings when you create an index.
An example of mapping creation using the put mapping API:
PUT server_url/index_name/_mapping/type_1
{
  "type_1" : {
    "properties" : {
      "field1" : {"type" : "string"}
    }
  }
}
- index_name: the name of an existing index to which the mapping is added
- type_1: the mapping type; the same name appears at the end of the URL and as the root key of the body
- properties: defines the various properties and document fields
- {“type”}: defines the data type of the property or field
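If you script against the API, the following Python sketch builds the same kind of put mapping request with the standard library, without sending it. The server URL http://localhost:9200, the index name, and the type name are assumptions. In Elasticsearch 2.x, the type name goes in the URL, and the body may also wrap the properties in that same type name, as done here.

```python
import json
import urllib.request


def put_mapping_request(server_url: str, index: str, doc_type: str,
                        properties: dict) -> urllib.request.Request:
    # Body wraps "properties" in the type name, matching the 2.x example.
    body = json.dumps({doc_type: {"properties": properties}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server_url}/{index}/_mapping/{doc_type}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = put_mapping_request("http://localhost:9200", "index_name", "type_1",
                          {"field1": {"type": "string"}})
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/index_name/_mapping/type_1
print(req.data.decode("utf-8"))        # {"type_1": {"properties": {"field1": {"type": "string"}}}}
# To actually send it (requires a running cluster): urllib.request.urlopen(req)
```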
Below is an example of mapping creation using the create index API:
PUT /index_name
{
  "mappings": {
    "type_1": {
      "_all": {"enabled": true},
      "properties": {
        "field_1": {"type": "string"},
        "field_2": {"type": "long"}
      }
    },
    "type_2": {
      "properties": {
        "field_3": {"type": "string"},
        "field_4": {"type": "date"}
      }
    }
  }
}
In the above code:
- index_name: the name of the index to be created.
- type_1, type_2: define the mapping types.
- _all: the configuration meta-field parameter. If “true,” the values of all other fields are concatenated into one string that is analyzed and indexed, so documents can be searched without naming a specific field.
- properties: defines the various properties and document fields.
- {“type”}: defines the data type of the property or field.
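When an index has several types, it can be convenient to generate the create index body programmatically. This is a hypothetical Python helper, mapping_for. For simple fields (name plus type), it produces the same shape as the request above, optionally enabling _all.

```python
import json


def mapping_for(fields: dict[str, str], enable_all: bool = False) -> dict:
    """Hypothetical helper: build one mapping type from {field_name: type}."""
    mapping: dict = {"properties": {name: {"type": t} for name, t in fields.items()}}
    if enable_all:
        mapping["_all"] = {"enabled": True}
    return mapping


body = {
    "mappings": {
        "type_1": mapping_for({"field_1": "string", "field_2": "long"}, enable_all=True),
        "type_2": mapping_for({"field_3": "string", "field_4": "date"}),
    }
}
# The JSON printed here is what would be sent with PUT /index_name.
print(json.dumps(body, indent=2))
```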
Two Mapping Types
Elasticsearch supports two types of mappings: “static mapping” and “dynamic mapping.” We use static mapping to define the index and data types up front. However, we still need ongoing flexibility so that documents can store extra attributes. To handle such cases, Elasticsearch comes with the dynamic mapping option that was mentioned at the beginning of this article.
Static Mapping
In a normal scenario, we know well in advance which kind of data will be stored in the document, so we can easily define the fields and their types when creating the index. Below is an example in which we index employee data into an index named “company” under the type “employeeinfo.”
Sample document data:
{
  "name": {"first": "Alice", "last": "John"},
  "age": 26,
  "joiningdate": "2015-10-15"
}
Example:
PUT /company
{
  "mappings": {
    "employeeinfo": {
      "_all": {"enabled": true},
      "properties": {
        "name": {
          "type": "object",
          "properties": {
            "first": {"type": "string"},
            "last": {"type": "string"}
          }
        },
        "age": {"type": "long"},
        "joiningdate": {"type": "date"}
      }
    }
  }
}
In the above API:
- employeeinfo: defines the mapping type name.
- _all: the configuration meta-field parameter. If “true,” the values of all other fields are concatenated into one string that is analyzed and indexed for search.
- properties: defines the various properties and document fields.
- {“type”}: defines the data type of the property or field.
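Because a static mapping is written by hand, its field names can drift from the documents actually being indexed. One example is a mapping whose inner field names differ from the document's. A small, hypothetical checker like the one below lists document fields that the mapping does not declare, recursing into object fields. It is plain Python, not an Elasticsearch API.

```python
from typing import Any


def unmapped_fields(doc: dict[str, Any], properties: dict[str, Any], prefix: str = "") -> list[str]:
    """Hypothetical checker: dotted paths of document fields missing from the mapping."""
    missing: list[str] = []
    for key, value in doc.items():
        path = prefix + key
        if key not in properties:
            missing.append(path)
        elif isinstance(value, dict):
            # Descend into object fields using their own "properties".
            inner = properties[key].get("properties", {})
            missing.extend(unmapped_fields(value, inner, prefix=path + "."))
    return missing


properties = {
    "name": {"type": "object",
             "properties": {"first": {"type": "string"}, "last": {"type": "string"}}},
    "age": {"type": "long"},
    "joiningdate": {"type": "date"},
}
doc = {"name": {"first": "Alice", "last": "John"}, "age": 26, "joiningdate": "2015-10-15"}

print(unmapped_fields(doc, properties))                                  # []
print(unmapped_fields({"name": {"middle": "M"}, "dept": "R&D"}, properties))  # ['name.middle', 'dept']
```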
Dynamic Mapping
Thanks to dynamic mapping, when you just index a document, you do not always need to configure the field names and types. Instead, they will be added automatically by Elasticsearch using any custom rules that have been defined. New fields can be added both to the top-level mapping type and to inner objects and nested fields. In addition, dynamic mapping rules can be configured to customize the existing mapping.
Mapping rules identify the right data types for unknown fields. For example, true/false in JSON maps to boolean, while an integer in JSON maps to long in Elasticsearch. Custom rules can be configured using dynamic field mapping or a dynamic template. When Elasticsearch encounters an unknown field in a document, it uses dynamic mapping to determine the data type of the field and automatically adds the new field to the type mapping.
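The default detection rules can be approximated in a few lines. This Python sketch is a simplification under stated assumptions:
- It follows the Elasticsearch 2.x defaults for booleans, whole numbers, floating-point numbers, and objects.
- It reduces date detection to a yyyy-MM-dd prefix check, instead of the configurable date formats Elasticsearch actually uses.
- It ignores arrays and null values.

```python
import re
from typing import Any

# Simplification: real date detection uses configurable dynamic date formats.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def guess_type(value: Any) -> str:
    """Approximate 2.x default dynamic type detection for one JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if DATE_PREFIX.match(value) else "string"
    raise TypeError(f"no rule in this sketch for {type(value).__name__}")


sample = {"active": True, "age": 26, "score": 4.5,
          "joiningdate": "2015-10-15", "city": "Boston", "name": {"first": "Alice"}}
print({field: guess_type(value) for field, value in sample.items()})
# {'active': 'boolean', 'age': 'long', 'score': 'double', 'joiningdate': 'date', 'city': 'string', 'name': 'object'}
```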
However, the right behavior depends on your use case. Perhaps you do not know what fields will be added to your documents later, but you do want them to be indexed automatically. Perhaps you just want to ignore them. Or, especially if you are using Elasticsearch as a primary data store, maybe you want unknown fields to throw an exception to alert you to the problem. Fortunately, you can control this behavior with the dynamic setting, which accepts the following options:
- true: add new fields dynamically (this is the default).
- false: ignore new fields.
- strict: throw an exception if an unknown field is encountered.
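The three options can be summarized as a small decision function. The Python sketch below is hypothetical:
- It checks only top-level fields.
- It gives newly added fields a placeholder type instead of running type detection.
- It raises its own StrictDynamicMappingError to stand in for the exception Elasticsearch returns.

```python
from typing import Any, Union


class StrictDynamicMappingError(Exception):
    """Stand-in for the error Elasticsearch reports in strict mode."""


def apply_dynamic(properties: dict[str, Any], doc: dict[str, Any],
                  dynamic: Union[bool, str] = True) -> dict[str, Any]:
    """Return the (possibly extended) properties after seeing `doc` (top level only)."""
    updated = dict(properties)
    for field in doc:
        if field in updated:
            continue
        if dynamic is True:
            # Placeholder type; a real mapping would use the detection rules.
            updated[field] = {"type": "string"}
        elif dynamic == "strict":
            raise StrictDynamicMappingError(f"unknown field [{field}] not allowed")
        # dynamic False: the field is kept in _source but not added to the mapping.
    return updated


props = {"title": {"type": "string"}}
doc = {"title": "t", "new_field": "x"}
print(sorted(apply_dynamic(props, doc, True)))   # ['new_field', 'title']
print(sorted(apply_dynamic(props, doc, False)))  # ['title']
try:
    apply_dynamic(props, doc, "strict")
except StrictDynamicMappingError as exc:
    print("rejected:", exc)
```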
Example:
PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "title": {"type": "string"},
        "stash": {
          "type": "object",
          "dynamic": true
        }
      }
    }
  }
}
In the above API:
- my_index: creates an index with this name.
- my_type: defines the mapping type name.
- “dynamic”: “strict”: the my_type object will throw an exception if an unknown field is encountered.
- “dynamic”: true: the stash object will create new fields dynamically.
- properties: defines the various properties and document fields.
- {“type”}: defines the data type of the property or field.
With dynamic mapping, you can add new searchable fields into the stash object:
Example:
PUT /my_index/my_type/1
{
  "title": "This doc adds a new field",
  "stash": {"new_field": "Success!"}
}
But trying to do the same at the top level will fail:
PUT /my_index/my_type/1
{
  "title": "This throws a StrictDynamicMappingException",
  "new_field": "Fail!"
}
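The stash example works because the dynamic setting is resolved per object: an inner object inherits its parent's setting unless it declares its own. The following is a hypothetical Python sketch of that resolution, applied to the my_type mapping above. It reports which unknown fields a strict object would reject.

```python
from typing import Any, Union

Dynamic = Union[bool, str]


def rejected_fields(doc: dict[str, Any], mapping: dict[str, Any],
                    inherited: Dynamic = True, prefix: str = "") -> list[str]:
    """Hypothetical: paths of unknown fields that a "strict" object would reject."""
    # An object's own "dynamic" setting overrides the one inherited from its parent.
    dynamic = mapping.get("dynamic", inherited)
    properties = mapping.get("properties", {})
    rejected: list[str] = []
    for key, value in doc.items():
        path = prefix + key
        if key in properties:
            if isinstance(value, dict):
                rejected.extend(rejected_fields(value, properties[key], dynamic, path + "."))
        elif dynamic == "strict":
            rejected.append(path)
    return rejected


my_type = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "string"},
        "stash": {"type": "object", "dynamic": True},
    },
}

print(rejected_fields({"title": "ok", "stash": {"new_field": "Success!"}}, my_type))  # []
print(rejected_fields({"title": "ok", "new_field": "Fail!"}, my_type))  # ['new_field']
```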
What’s New in Elasticsearch 5.0 for Mapping?
Elasticsearch 2.x had a single “string” data type for both full-text search and keyword identifiers. Full-text search is used to discover relevant text in documents, while keyword identifiers are used for sorting, aggregating, and filtering documents. In Elasticsearch 2.x, the data type alone could not tell the engine which fields are meant for full-text search and which are meant for sorting, aggregating, and filtering. That distinction had to be expressed indirectly, through the index option of a string field (analyzed versus not_analyzed).
Elasticsearch 5.x comes with two new data types, “text” and “keyword,” which replace the “string” data type of earlier versions.
- “text”: full-text and relevancy search in documents
- “keyword”: exact-value search for sorting, aggregating, and filtering documents
Text fields support the full analysis chain, while keyword fields support at most a limited analysis, just enough to normalize values with lowercasing and similar transformations. Keyword fields support doc values for memory-friendly sorting and aggregations. Text fields have fielddata disabled by default, to prevent massive amounts of data from being loaded into memory by mistake.
Note: the “string” field type continues to work during the 5.x series, but it will likely be removed in 6.0.
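The practical difference shows up at indexing time: a text value is broken into terms, whereas a keyword value is kept whole. The toy Python sketch below uses a crude analyzer (lowercase, then split on non-alphanumeric characters) in place of the real standard analyzer. Treat it as an illustration only.

```python
import re


def analyze_text(value: str) -> list[str]:
    # Toy stand-in for a full-text analyzer: lowercase, split on non-alphanumerics.
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def analyze_keyword(value: str) -> list[str]:
    # keyword: the whole value is indexed as a single, exact term.
    return [value]


description = "Quick-Dry Running Shirt"
print(analyze_text(description))     # ['quick', 'dry', 'running', 'shirt']
print(analyze_keyword(description))  # ['Quick-Dry Running Shirt']

# A search for the term "shirt" can hit the text terms but not the keyword term.
print("shirt" in analyze_text(description))     # True
print("shirt" in analyze_keyword(description))  # False
```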
When to Use the “text” or “keyword” Data Type
Ending this article with a practical tip, here is a rule of thumb for mapping in Elasticsearch:
- “text” data types: use when you require full-text search on particular fields, such as the bodies of e-mails or product descriptions.
- “keyword” data types: use when you require an exact-value search, particularly when filtering (“find me all products where status is available”), sorting, or using aggregations. Keyword fields are only searchable by their exact value. Use keyword data types for fields like email addresses, hostnames, status codes, zip codes, or tags.
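When a field needs both behaviors, a common 5.x pattern is a text field with a keyword sub-field. This is also what 5.x dynamic mapping creates for strings. The helper below, field_mapping, is hypothetical: it simply returns a mapping fragment for each case in the rule of thumb above.

```python
def field_mapping(full_text: bool, exact: bool) -> dict:
    """Hypothetical helper: pick a 5.x field mapping from how the field is used."""
    if full_text and exact:
        # text for search, plus a keyword sub-field for sorting/aggregations.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if full_text:
        return {"type": "text"}
    if exact:
        return {"type": "keyword"}
    raise ValueError("field must support full-text search, exact matching, or both")


properties = {
    "description": field_mapping(full_text=True, exact=False),
    "status": field_mapping(full_text=False, exact=True),
    "title": field_mapping(full_text=True, exact=True),
}
print(properties)
```

With the multi-field version, a query on title uses full-text search, while sorting or aggregating would target title.keyword.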
Summary
Mapping in Elasticsearch can seem daunting at times, especially if you’re just starting out with ELK. At Logz.io, this is part of the service we provide our users. But if you’re using your own Elasticsearch deployment, pay careful attention to the details. We hope this article will help you understand the basics.
Published at DZone with permission of Daniel Berman, DZone MVB. See the original article here.