ElasticSearch: Parent and Child Joins — Game of Thrones Edition
ElasticSearch is not a relational database; it is built for search efficiency, not storage efficiency.
In a relational database, a child table references the parent with a foreign key and this relationship is called a Join. The design typically involves normalizing the data.
ElasticSearch is not a relational database; it is built for search efficiency, not storage efficiency. The data it stores is denormalized and mostly flat. This means joins cannot span indexes: ElasticSearch is all about speed, and traditional joins would run too slowly. Both the child and parent documents must therefore be in the same index and on the same shard.
Example Parent/Child Relationship
Let’s consider two famous houses from the HBO series Game of Thrones (For those worried about spoilers, I have faked the isAlive status of the characters). The family tree depicted in Image 1 has four Parents and nine Children. Each character has a gender and an isAlive status.
Creating the “game_of_thrones” Index
The code below creates an index for the above relationship. (Setup guide for ElasticSearch). Starting with ElasticSearch 7, a mapping type is no longer required for indexes, unlike in previous versions.
Line 23: relation_type is the name of the join field.
Line 24: The join type is a special field that creates a parent/child relation between documents in the same index.
Line 25: Parent-child joins use global ordinals to speed up joins.
Lines 26–28: The relations section defines the set of possible relations within the documents, each relation being a parent name and a child name.
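As a minimal sketch of what such an index definition might look like, the request body can be written as a Python dict and serialized to JSON. The join field settings follow the description above; the types chosen for name, house, gender, and isAlive are assumptions, not taken from the original listing.

```python
import json

# Sketch of a create-index body (would be sent as: PUT /game_of_thrones).
create_index_body = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},     # assumed field type
            "house": {"type": "keyword"},    # assumed field type
            "gender": {"type": "keyword"},   # assumed field type
            "isAlive": {"type": "boolean"},  # assumed field type
            "relation_type": {               # the name of the join field
                "type": "join",              # special parent/child field type
                "eager_global_ordinals": True,
                "relations": {"parent": "child"},  # parent name -> child name
            },
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```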
Inserting the Parent Data
Let’s walk through the code for one parent insert before running a script to insert the other parents depicted on Image 1.
Create Eddard Stark
The above code creates a new document for Eddard Stark and marks it as a parent document using the relation_type field: the value parent is assigned as the name of the relation. Along with the relation, it also adds the other fields we need, such as house, gender, and isAlive.
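A parent insert along these lines could be sketched as follows. The helper index_request is hypothetical and only assembles the request pieces; the isAlive value is a placeholder, and the Id "1" and the routing value "Eddard" follow the conventions used elsewhere in this article.

```python
import json
from urllib.parse import quote


def index_request(index, doc_id, routing, body):
    """Hypothetical helper: return (method, path, JSON payload) for one document."""
    path = f"/{index}/_doc/{doc_id}?routing={quote(routing)}"
    return "PUT", path, json.dumps(body)


eddard = {
    "name": "Eddard",
    "house": "Stark",
    "gender": "Male",
    "isAlive": False,                     # placeholder value
    "relation_type": {"name": "parent"},  # marks this document as a parent
}

method, path, payload = index_request("game_of_thrones", "1", "Eddard", eddard)
print(method, path)
print(payload)
```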
One key thing to notice here is the routing query parameter. Each parent uses its own name as the routing value. The routing value controls which shard the document is indexed on. The shard is identified using the equation below:
shard = hash(routing_value) % number_of_primary_shards
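The equation can be illustrated with a few lines of Python. This is only a sketch: zlib.crc32 stands in for ElasticSearch's internal hash function, so the shard numbers it produces will not match a real cluster. The point is that the same routing value always maps to the same shard.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Illustrative only: crc32 is a stand-in for the real routing hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


# A parent and its child indexed with the same routing value land together.
parent_shard = pick_shard("Eddard", 5)
child_shard = pick_shard("Eddard", 5)
print(parent_shard == child_shard)
```

Because the result depends on number_of_primary_shards, the mapping from routing value to shard only stays stable while the number of primary shards stays the same.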
We can insert the remaining parents using the script here.
Inserting the Children Data
Similarly, let’s walk through one child insert before running a bulk insert of the 9 Children depicted on Image 1.
Create Arya Stark
In our example, Arya Stark is a child of Eddard Stark. Notice that we use the same routing query parameter that we used to create the record for Eddard, because the child and parent documents must be on the same shard.
The join between this record and Eddard’s is made by the relation_type field, where we set the name of the relation to child, making Arya Stark a child of the parent whose Id is “1” (the same Id we created Eddard with).
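A child insert of this shape might be sketched like this. The document Id used for Arya is hypothetical; the parent Id "1" and the routing value "Eddard" come from the parent record described above.

```python
import json

ARYA_ID = "5"  # hypothetical document Id, chosen for illustration

arya = {
    "name": "Arya",
    "house": "Stark",
    "gender": "Female",
    "isAlive": True,
    "relation_type": {
        "name": "child",  # this document is a child...
        "parent": "1",    # ...of the document with Id 1 (Eddard)
    },
}

# Same routing value as the parent, so both land on the same shard.
path = f"/game_of_thrones/_doc/{ARYA_ID}?routing=Eddard"
print("PUT", path)
print(json.dumps(arya, indent=2))
```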
We can insert the remaining children using the script here.
Querying Our Data
Now for the fun part: executing and understanding the queries we can run on the relationship we just created.
Searching and Filtering Specific Parents
- Get all children of Lyanna Stark: The parent_id query can be used to find child documents that belong to a particular parent.
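A parent_id query for this case could look roughly like the sketch below, written as a Python dict. It assumes Lyanna's document Id is "2", which matches the parent value in the response shown in this section.

```python
import json

# Sketch of a search body (would be sent as: GET /game_of_thrones/_search).
children_of_lyanna = {
    "query": {
        "parent_id": {
            "type": "child",  # name of the child relation
            "id": "2",        # Id of the parent document (Lyanna)
        }
    }
}

print(json.dumps(children_of_lyanna, indent=2))
```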
Executing the above query gets the John Snow document.
{
"took": 2,
..."hits": [{
"_index": "game_of_thrones",
"_type": "_doc",
"_id": "10",
"_routing": "Lyanna",
"_source": {
"name": "John",
"house": "Snow",
"gender": "Male",
"isAlive": true,
"relation_type": {
"name": "child",
"parent": "2"
}
}
}]...
}
- Get all children of Eddard who are alive: The bool and must query keywords can be used to fetch the records.
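One way to combine the two conditions is sketched below: a bool query whose must clauses are a parent_id query for Eddard (Id "1") and a term filter on isAlive. The exact clauses in the original listing may differ.

```python
import json

living_children_of_eddard = {
    "query": {
        "bool": {
            "must": [
                # Children whose parent is the document with Id 1 (Eddard)
                {"parent_id": {"type": "child", "id": "1"}},
                # ...and who are marked as alive
                {"term": {"isAlive": True}},
            ]
        }
    }
}

print(json.dumps(living_children_of_eddard, indent=2))
```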
Executing the above query will get the records for Arya, Sansa, Bran, and Rickon Stark.
Has Child and Has Parent Queries
The query keywords has_child and has_parent help query data with parent-child relationships.
- Get all parents who have daughters who are dead: The has_child keyword fetches parent records whose children match a filter.
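A has_child query of this kind might be sketched as follows. Using match for gender is an assumption that works whether the field is mapped as text or keyword; the original listing may use a different clause.

```python
import json

parents_with_dead_daughters = {
    "query": {
        "has_child": {
            "type": "child",  # which child relation to inspect
            "query": {
                "bool": {
                    "must": [
                        {"match": {"gender": "Female"}},  # assumed clause type
                        {"term": {"isAlive": False}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(parents_with_dead_daughters, indent=2))
```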
Executing the above query gets the record of Tywin Lannister, who is the only parent with a dead daughter Cersei.
- Get all children whose parent is female: The has_parent keyword fetches child records whose parents match a filter.
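The mirror-image has_parent query could be sketched like this. Note that has_parent takes parent_type rather than type; the match clause on gender is again an assumption about the mapping.

```python
import json

children_of_female_parents = {
    "query": {
        "has_parent": {
            "parent_type": "parent",  # name of the parent relation
            "query": {"match": {"gender": "Female"}},  # filter on the parent
        }
    }
}

print(json.dumps(children_of_female_parents, indent=2))
```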
Executing the above query gets the record of John Snow, whose parent is Lyanna Stark; all the other parents are male.
Having Multiple Children per Parent
Let us add Catelyn Stark as a wife to Eddard Stark, as depicted in Image 2 below. Eddard now has both child and wife documents attached.
The Index can be changed using the code below:
Line 9: The parent now has an array of relations associated with it: “child” and “wife”.
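A mapping update that adds the second relation might be sketched as below. It assumes the join field keeps the same name and settings as before and only the relations entry changes.

```python
import json

# Sketch of a mapping update (would be sent as: PUT /game_of_thrones/_mapping).
add_wife_relation = {
    "properties": {
        "relation_type": {
            "type": "join",
            "eager_global_ordinals": True,
            # One parent name now maps to an array of child relation names.
            "relations": {"parent": ["child", "wife"]},
        }
    }
}

print(json.dumps(add_wife_relation, indent=2))
```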
Inserting a “Catelyn Stark” document is similar to inserting the child record we created earlier: it uses the same routing parameter as the parent (routing=Eddard) and uses “wife” as the relation_type name.
Query the wife data:
- Get the lords who have a wife: The query uses the has_child keyword and filters by the type “wife”.
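A sketch of such a query is shown below. Using match_all as the inner query is an assumption: it simply means any wife document qualifies the parent.

```python
import json

lords_with_a_wife = {
    "query": {
        "has_child": {
            "type": "wife",                # only look at the "wife" relation
            "query": {"match_all": {}},    # any wife document counts
        }
    }
}

print(json.dumps(lords_with_a_wife, indent=2))
```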
Executing the above query gets the record of Eddard Stark.
Multiple Levels of Relationship (Grandchildren)
Let us add Grandchildren to the Starks and Lannisters as depicted in the below Image 3.
The index needs to be recreated here. This is because of another restriction: it is possible to add a child to an existing element only if that element is already a parent. Since the “child” type was not a parent when we created the index earlier, we need to drop the earlier index, create a new one with the code below, and re-insert all the data.
Line 16: The child is also made a parent here, of the type grandchild. This lets us have the relationship PARENT → CHILD → GRANDCHILD.
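The join field for a multi-level hierarchy could be sketched like this. Keeping the “wife” relation alongside “child” is an assumption carried over from the previous section; the new part is the child entry that makes child a parent of grandchild.

```python
import json

# Sketch of the join field for the recreated index.
multi_level_join = {
    "type": "join",
    "eager_global_ordinals": True,
    "relations": {
        "parent": ["child", "wife"],  # assumed to be kept from earlier
        "child": "grandchild",        # child is now also a parent
    },
}

create_index_body = {"mappings": {"properties": {"relation_type": multi_level_join}}}
print(json.dumps(create_index_body, indent=2))
```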
Inserting Grandchildren documents is very similar to inserting child records.
In our example, “Ned Jr Something” is a child of Sansa Stark and a grandchild of Eddard Stark. Notice that we use the same routing query parameter that we used to create the record for Eddard. This ensures that all the descendants of the top-level parent, Eddard, are indexed on the same shard.
The join between this record and Sansa’s is made by the relation_type field, where we set the name of the relation to “grandchild”, making “Ned Jr” a grandchild whose parent is the document with Id “6” (the same Id we created Sansa with).
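A grandchild insert might be sketched as follows. The document Id, the split of the name into name and house, and the gender and isAlive values are all hypothetical; the parent Id "6" and the routing value "Eddard" follow the description above.

```python
import json

NED_JR_ID = "14"  # hypothetical document Id

ned_jr = {
    "name": "Ned Jr",      # illustrative field values
    "house": "Something",
    "gender": "Male",      # assumed
    "isAlive": True,       # assumed
    "relation_type": {
        "name": "grandchild",
        "parent": "6",     # direct parent: Sansa
    },
}

# Routing uses the top-level parent (Eddard), not the direct parent (Sansa).
path = f"/game_of_thrones/_doc/{NED_JR_ID}?routing=Eddard"
print("PUT", path)
print(json.dumps(ned_jr, indent=2))
```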
We can insert the remaining grandchildren using the bulk script here.
Querying GrandParent Data
- Get All Grandparents who have grand-daughters:
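One way to express this is to nest one has_child query inside another, as in the sketch below: the outer query walks from parent to child, and the inner one from child to grandchild. The match clause on gender is an assumption about the mapping.

```python
import json

grandparents_with_granddaughters = {
    "query": {
        "has_child": {
            "type": "child",  # parent -> child
            "query": {
                "has_child": {
                    "type": "grandchild",  # child -> grandchild
                    "query": {"match": {"gender": "Female"}},
                }
            },
        }
    }
}

print(json.dumps(grandparents_with_granddaughters, indent=2))
```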
Executing this query gets us the “Tywin Lannister” record, since he is the only grandparent with a granddaughter Myrcella, as depicted in Image 3.
Using multiple levels of relations to replicate a relational model is not recommended. Each level of relation adds an overhead at query time in terms of memory and computation. You should de-normalize your data if you care about performance. — elastic.co
Restrictions of joins in ElasticSearch
Now that we have seen the join feature in action, let’s go over the restrictions noticed above.
- Parent and child documents must be indexed on the same shard
- Only one join field mapping is allowed per index
- An element can have multiple children but only one parent
- It is possible to add a new relation to an existing join field
- It is also possible to add a child to an existing element but only if the element is already a parent
Conclusion
Parent-child joins can be a useful technique for managing relationships when index-time performance is more important than search-time performance, but they come at a significant cost. One must be aware of the tradeoffs, such as the physical storage constraint on parent and child documents and the added complexity. Another precaution is to avoid multi-layered parent-child relationships, since each extra level consumes more memory and computation at query time.
Opinions expressed by DZone contributors are their own.