NEST and UNNEST: Normalizing and Denormalizing JSON on the Fly

This article details a variety of nesting and unnesting procedures you can implement in Couchbase's N1QL to organize and query your data.

Johan Larson

Oct. 06, 16 · Tutorial

Likes (14)

Comment

Save

33.3K Views

When modelling data in a JSON database like Couchbase, developers and architects have two options for representing hierarchical data. The first option is to embed the related objects in the parent object, like this:

"1" : {"name":"doc 1","subdocs":[{"name":"doc 1.1"},{"name":"doc 1.2"},{"name":"doc 1.3"}]}
"2" : {"name":"doc 2","subdocs":[{"name":"doc 2.1"},{"name":"doc 2.2"}]}

The alternative is to represent the related objects separately, and refer to them by ID, like this:

"1" : {"name":"doc 1","subdocs":["1.1","1.2","1.3"]}
"2" : {"name":"doc 2","subdocs":["2.1","2.2"]}
"1.1" : {"name":"doc 1.1"}
"1.2" : {"name":"doc 1.2"}
"1.3" : {"name":"doc 1.3"}
"2.1" : {"name":"doc 2.1"}
"2.1" : {"name":"doc 2.2"}

Exactly when to use one form or the other is beyond the scope of this article. But whichever is chosen, the N1QL language lets developers convert to the other on the fly. These conversions are done with two operators, NEST and UNNEST.

“UNNEST”-ing Nested Structures

Suppose we have an order-tracking application and we have designed our database object model to have the line items stored within the order document itself (nested), like this:

"1" : {
  "order_id": "1",
  "type": "order",
  "customer_id": "24601",
  "total_price": 30.3,
  "lineitems": [
    { "item_id": 576, "quantity": 3, "item_price": 4.99, "base_price": 14.97, "tax": 0.75, 
        "final_price": 15.72 },
    { "item_id": 234, "quantity": 1, "item_price": 12.95, "base_price": 12.95, "tax": 0.65, 
        "final_price": 13.6 },
    { "item_id": 122, "quantity": 2, "item_price": 0.49, "base_price": 0.98, "final_price": 0.98 }
  ]
}

"5" : {
  "order_id": "5",
  "type": "order",
  "customer_id": "98732",
  "total_price": 428.04,
  "lineitems": [
    { "item_id": 770, "quantity": 3, "item_price": 95.97, "base_price": 287.91, "tax": 14.4, 
        "final_price": 302.31 },
    { "item_id": 712, "quantity": 1, "item_price": 125.73, "base_price": 125.73, 
        "final_price": 125.73 }
  ]
}

Now we want to compute the tax payable for each order. How would we do it? The tax information is not available at the order level; we only know the tax for each line item.

We have to do two things. First, we have to extract the line items from the orders, and then we have to aggregate the taxes by order. The extraction will be done with an UNNEST operation and the aggregation with an aggregation function and a GROUP BY.

We can extract the line items with this query:

 SELECT * FROM demo UNNEST lineitems

That produces this result:

[
  {
    "demo": {
      "customer_id": "24601",
      "lineitems": [
        { "base_price": 14.97, "final_price": 15.72, "item_id": 576, "item_price": 4.99, "quantity": 3, 
            "tax": 0.75 },
        { "base_price": 12.95, "final_price": 13.6, "item_id": 234, "item_price": 12.95, "quantity": 1, 
            "tax": 0.65 },
        { "base_price": 0.98, "final_price": 0.98, "item_id": 122, "item_price": 0.49, "quantity": 2 }
      ],
      "order_id": "1",
      "total_price": 30.3,
      "type": "order"
    },
    "lineitems": { "base_price": 14.97, "final_price": 15.72, "item_id": 576, "item_price": 4.99, 
            "quantity": 3, "tax": 0.75 }
  },
  {
    "demo": {
      "customer_id": "24601",
      "lineitems": [
        { "base_price": 14.97, "final_price": 15.72, "item_id": 576, "item_price": 4.99, "quantity": 3, 
            "tax": 0.75
        },
        { "base_price": 12.95, "final_price": 13.6, "item_id": 234, "item_price": 12.95, "quantity": 1, 
            "tax": 0.65
        },
        { "base_price": 0.98, "final_price": 0.98, "item_id": 122, "item_price": 0.49, "quantity": 2 }
      ],
      "order_id": "1",
      "total_price": 30.3,
      "type": "order"
    },
    "lineitems": { "base_price": 12.95, "final_price": 13.6, "item_id": 234, "item_price": 12.95, 
        "quantity": 1, "tax": 0.65 }
  },
  {
    "demo": {
      "customer_id": "24601",
      "lineitems": [
        { "base_price": 14.97, "final_price": 15.72, "item_id": 576, "item_price": 4.99, "quantity": 3, 
            "tax": 0.75 },
        { "base_price": 12.95, "final_price": 13.6, "item_id": 234, "item_price": 12.95, "quantity": 1, 
            "tax": 0.65
        },
        { "base_price": 0.98, "final_price": 0.98, "item_id": 122, "item_price": 0.49, "quantity": 2 }
      ],
      "order_id": "1",
      "total_price": 30.3,
      "type": "order"
    },
    "lineitems": { "base_price": 0.98, "final_price": 0.98, "item_id": 122, "item_price": 0.49, 
        "quantity": 2 }
  },
  {
    "demo": {
      "customer_id": "98732",
      "lineitems": [
        { "base_price": 287.91, "final_price": 302.31, "item_id": 770, "item_price": 95.97, 
            "quantity": 3, "tax": 14.4 },
        { "base_price": 125.73, "final_price": 125.73, "item_id": 712, "item_price": 125.73, 
            "quantity": 1 }
      ],
      "order_id": "5",
      "total_price": 428.04,
      "type": "order"
    },
    "lineitems": { "base_price": 287.91, "final_price": 302.31, "item_id": 770, "item_price": 95.97, 
        "quantity": 3, "tax": 14.4 }
  },
  {
    "demo": {
      "customer_id": "98732",
      "lineitems": [
        { "base_price": 287.91, "final_price": 302.31, "item_id": 770, "item_price": 95.97, 
            "quantity": 3, "tax": 14.4 },
        { "base_price": 125.73, "final_price": 125.73, "item_id": 712, "item_price": 125.73, 
            "quantity": 1 }
      ],
      "order_id": "5",
      "total_price": 428.04,
      "type": "order"
    },
    "lineitems": { "base_price": 125.73, "final_price": 125.73, "item_id": 712, "item_price": 125.73, 
        "quantity": 1 }
  }
]

That’s a big result. Let’s look more closely. The result has five objects, each with two fields: “demo” contains the entire original (parent) object, and “lineitems” contains one of the elements of the “lineitems” array in the original object. So we’re extracted each line item, and we’ve kept around each original (parent) object, in case we need it.

We can then group by demo.order_id and sum up lineitems.tax, with this query:

SELECT demo.order_id, SUM(lineitems.tax) AS total_tax 
FROM demo UNNEST lineitems 
GROUP BY demo.order_id

Which produces this trim result:

[
  {
    "order_id": "1",
    "total_tax": 1.4
  },
  {
    "order_id": "5",
    "total_tax": 14.4
  }
]

Using Indexes for UNNEST

Let’s think about how the basic query is executed.

EXPLAIN SELECT * FROM demo UNNEST lineitems

The plan produced is this:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "demo",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "demo",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Unnest",
                "as": "lineitems",
                "expr": "(`demo`.`lineitems`)"
              }
            ]
          }
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT * FROM demo UNNEST lineitems"
  }
]

This plan is going to scan the primary index for the demo bucket, fetch every record it references, and feed it into a scan operator.

If we want to be more selective about the orders, that’s easy enough with an index. We add a predicate to the query,

SELECT * FROM demo UNNEST lineitems WHERE demo.customer_id = "999"

We also add an index on the new field,

CREATE INDEX demo_customer_id ON demo(customer_id)

And now the plan contains a shiny new IndexScan operator that uses the new index.

        {
          "#operator": "IndexScan",
          "index": "demo_customer_id",
          "index_id": "30061046820ccba9",
          "keyspace": "demo",
          "namespace": "default",
          "spans": [
            {
              "Exact": true,
              "Range": {
                "High": [
                  "\"999\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"999\""
                ]
              }
            }
          ],
          "using": "gsi"
       }

But what if we want to be selective about the line items? Suppose we only want line items with item_id = 567, like this:

SELECT * FROM demo UNNEST lineitems WHERE lineitems.item_id = 576

For that, we’ll need an array index, like this

CREATE INDEX demo_lineitem_item_id ON 
demo(DISTINCT ARRAY l.item_id FOR l IN lineitems END)

This is an index on the item_id fields of the lineitems array in each order. Then we try an EXPLAIN, and nothing has happened. The index doesn’t get selected.

The problem is the original query. The current version of Couchbase is very particular about how it uses array indexes, and the UNNEST clause has to be in a particular format to match the index. In particular, look at the “l.item_id” in the index definition. That has to exactly match the field name and prefix in the query. Things work correctly if we change the query like this:

SELECT * FROM demo UNNEST lineitems l WHERE l.item_id = 576

Then the EXPLAIN shows us a different operator, indicating the index is in use:

       {
          "#operator": "DistinctScan",
          "scan": {
            "#operator": "IndexScan",
            "index": "demo_lineitem_item_id",
            "index_id": "f6b19179f6dc0496",
            "keyspace": "demo",
            "namespace": "default",
            "spans": [
              {
                "Exact": true,
                "Range": {
                  "High": [
                    "576"
                  ],
                  "Inclusion": 3,
                  "Low": [
                    "576"
                  ]
                }
              }
            ],
            "using": "gsi"
          }
       }

More information about array indexes is available here:

Left Outer Unnest

Now, suppose we add a third document to the demo bucket, like this:

INSERT INTO demo VALUES ("6", {
  "customer_id": "77077",
  "order_id": "6",
  "total_price": 0,
  "type": "order"
})

If we then rerun the query...

SELECT * FROM demo UNNEST lineitems

...we see a result, but the new order is not included. That’s because the new order has no line items, and therefore does not participate in the join.

We can include the new order in the result by switching to a LEFT OUTER UNNEST, which includes documents even when they have no sub-objects.

The query...

SELECT * FROM demo LEFT OUTER UNNEST lineitems

...yields the same results as the earlier one, but this time with the new order included:

  {
    "demo": {
      "customer_id": "77077",
      "order_id": "6",
      "total_price": 0,
      "type": "order"
    }
  }

JOIN With UNNEST

Now let us suppose we have decomposed the orders into main and subordinate objects, perhaps because the principal object had grown too large,and was generating too much network traffic, like this:

"1" : {
  "order_id": "1",
  "type": "order",
  "customer_id": "24601",
  "sub" : "1A"
}

"1A" : {
  "type" : "order_sub",
  "order_sub_id" : "1A", 
  "total_price": 30.3,
  "lineitems": [
    { "item_id": 576, "quantity": 3, "item_price": 4.99, "base_price": 14.97, "tax": 0.75, 
        "final_price": 15.72 },
    { "item_id": 234, "quantity": 1, "item_price": 12.95, "base_price": 12.95, "tax": 0.65, 
        "final_price": 13.6 },
    { "item_id": 122, "quantity": 2, "item_price": 0.49, "base_price": 0.98, "final_price": 0.98 }
  ]
}

"5" : {
  "order_id": "5",
  "type": "order",
  "customer_id": "98732",
  "sub" : "5A", 
}

"5A" : {
  "type" : "order_sub", 
  "order_sub_id" : "5A", 
  "total_price": 428.04,
  "lineitems": [
    { "item_id": 770, "quantity": 3, "item_price": 95.97, "base_price": 287.91, "tax": 14.4, 
        "final_price": 302.31 },
    { "item_id": 712, "quantity": 1, "item_price": 125.73, "base_price": 125.73, 
        "final_price": 125.73 }
  ]
}

How does that force us to modify our query? We can join up the objects and subordinate objects like this:

SELECT * FROM demo ordr JOIN demo subdata ON KEYS ordr.sub

That yields this result:

[
  {
    "ordr": {
      "customer_id": "24601",
      "order_id": "1",
      "sub": "1A",
      "type": "order"
    },
    "subdata": {
      "lineitems": [
        {
          "base_price": 14.97,
          "final_price": 15.72,
          "item_id": 576,
          "item_price": 4.99,
          "quantity": 3,
          "tax": 0.75
        },
        {
          "base_price": 12.95,
          "final_price": 13.6,
          "item_id": 234,
          "item_price": 12.95,
          "quantity": 1,
          "tax": 0.65
        },
        {
          "base_price": 0.98,
          "final_price": 0.98,
          "item_id": 122,
          "item_price": 0.49,
          "quantity": 2
        }
      ],
      "order_sub_id": "1A",
      "total_price": 30.3,
      "type": "order_sub"
    }
  },
  {
    "ordr": {
      "customer_id": "98732",
      "order_id": "5",
      "sub": "5A",
      "type": "order"
    },
    "subdata": {
      "lineitems": [
        {
          "base_price": 287.91,
          "final_price": 302.31,
          "item_id": 770,
          "item_price": 95.97,
          "quantity": 3,
          "tax": 14.4
        },
        {
          "base_price": 125.73,
          "final_price": 125.73,
          "item_id": 712,
          "item_price": 125.73,
          "quantity": 1
        }
      ],
      "order_sub_id": "5A",
      "total_price": 428.04,
      "type": "order_sub"
    }
  }
]

We can then unnest on subdata.lineitems to split out the individual lineitems:

SELECT * FROM demo ordr JOIN demo subdata ON KEYS ordr.sub
UNNEST subdata.lineitems

That produces five results (one for each line item), structured like this:

 {
    "lineitems": {
      "base_price": 14.97,
      "final_price": 15.72,
      "item_id": 576,
      "item_price": 4.99,
      "quantity": 3,
      "tax": 0.75
    },
    "ordr": {
      "customer_id": "24601",
      "order_id": "1",
      "sub": "1A",
      "type": "order"
    },
    "subdata": {
      "lineitems": [
        {
          "base_price": 14.97,
          "final_price": 15.72,
          "item_id": 576,
          "item_price": 4.99,
          "quantity": 3,
          "tax": 0.75
        },
        {
          "base_price": 12.95,
          "final_price": 13.6,
          "item_id": 234,
          "item_price": 12.95,
          "quantity": 1,
          "tax": 0.65
        },
        {
          "base_price": 0.98,
          "final_price": 0.98,
          "item_id": 122,
          "item_price": 0.49,
          "quantity": 2
        }
      ],
      "order_sub_id": "1A",
      "total_price": 30.3,
      "type": "order_sub"
    }
  }

From there, we can aggregate as before.

“NEST”-ing an Unnested Structure

Now let’s flip the problem. What if the data starts out with orders and line items as separate entries in the database (unnested), and we want to group them together with line items under the documents. For example, we may want for each order, the items included and the quantity of each item.

In this case, the data might be represented like this, using seven separate objects:

"1" : { "order_id": "1", "type": "order", "customer_id": "24601", "total_price": 30.3, 
        "lineitems": [ "11", "12", "13" ] }
"11" : { "lineitem_id": "11", "type": "lineitem", "item_id": 576, "quantity": 3, "item_price": 4.99, 
        "base_price": 14.97, "tax": 0.75, "final_price": 15.72 }
"12" : { "lineitem_id": "12", "type": "lineitem", "item_id": 234, "quantity": 1, "item_price": 12.95, 
        "base_price": 12.95, "tax": 0.65, "final_price": 13.6 }
"13" : { "lineitem_id": "13", "type": "lineitem", "item_id": 122, "quantity": 2, "item_price": 0.49, 
        "base_price": 0.98, "final_price": 0.98 }
"5" : { "order_id": "5", "type": "order", "customer_id": "98732", "total_price": 428.04,
        "lineitems" : [ "51", "52" ] }
"51" : { "lineitem_id": "51", "type": "lineitem", "item_id": 770, "quantity": 2, "item_price": 95.97, 
        "base_price": 287.91, "tax": 14.4, "final_price": 302.31 }
"52" : { "lineitem_id": "52", "type": "lineitem", "item_id": 712, "quantity": 1, "item_price": 125.73, 
        "base_price": 125.73, "final_price": 125.73 }

From these documents, we can use the NEST operator to transform the individual objects into nested structures using this query:

SELECT * FROM demo ordr NEST demo li ON KEYS ordr.lineitems

The query yields this result:

[
  {
    "li": [
      { "base_price": 14.97, "final_price": 15.72, "item_id": 576, "item_price": 4.99, 
        "lineitem_id": "11", "quantity": 3, "tax": 0.75, "type": "lineitem" },
      { "base_price": 0.98, "final_price": 0.98, "item_id": 122, "item_price": 0.49, "lineitem_id": "13", 
        "quantity": 2, "type": "lineitem" },
      { "base_price": 12.95, "final_price": 13.6, "item_id": 234, "item_price": 12.95, 
        "lineitem_id": "12", "quantity": 1, "tax": 0.65, "type": "lineitem" }
    ],
    "ordr": {
      "customer_id": "24601",
      "lineitems": [
        "11",
        "12",
        "13"
      ],
      "order_id": "1",
      "total_price": 30.3,
      "type": "order"
    }
  },
  {
    "li": [
      { "base_price": 287.91, "final_price": 302.31, "item_id": 770, "item_price": 95.97, 
        "lineitem_id": "51", "quantity": 2, "tax": 14.4, "type": "lineitem" },
      { "base_price": 125.73, "final_price": 125.73, "item_id": 712,"item_price": 125.73, 
        "lineitem_id": "52", "quantity": 1, "type": "lineitem" }
    ],
    "ordr": {
      "customer_id": "98732",
      "lineitems": [
        "51",
        "52"
      ],
      "order_id": "5",
      "total_price": 428.04,
      "type": "order"
    }
  }
]

Again, this is quite a big result. But let’s look more closely. There are two objects in the result, one for each order. Each object has two fields. The order fields are under “ordr”, and the line items are in an array under “li”.

But we can simplify further. We just need the “order_id” of each order, and we only need the “item_id” and "quantity" for each line item. We can get the “order_id” from ordr.order_id, and we can extract the “item_id” and "quantity" from the “li” array using an array comprehension, like this:

SELECT ordr.order_id, ARRAY {"item_id": l.item_id, "quantity" : l.quantity} FOR l IN li END as items 
FROM demo ordr NEST demo li ON KEYS ordr.lineitems

The query produces this trim result:

[
  {
    "items": [
      { "item_id": 576, "quantity": 3 },
      { "item_id": 234, "quantity": 1 },
      { "item_id": 122, "quantity": 2 }
    ],
    "order_id": "1"
  },
  {
    "items": [
      { "item_id": 712, "quantity": 1 },
      { "item_id": 770, "quantity": 2 }
    ],
    "order_id": "5"
  }
]

Using Indexes for Nest Operations

Let’s return to the original NEST query and check Couchbase will execute it.

EXPLAIN SELECT * FROM demo ordr NEST demo li ON KEYS ordr.lineitems

That gets us this plan:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "demo",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "as": "ordr",
          "keyspace": "demo",
          "namespace": "default"
        },
        {
          "#operator": "Nest",
          "as": "li",
          "keyspace": "demo",
          "namespace": "default",
          "on_keys": "(`ordr`.`lineitems`)"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT * FROM demo ordr NEST demo li ON KEYS ordr.lineitems"
  }
]

No great mystery here. Couchbase is going to scan the primary index on the demo bucket, and probe the primary index to get each lineitem.

If there is a condition on the primary object that can be served by an index, Couchbase will use it. You can see the difference by adding a predicate and an index like this:

CREATE INDEX demo_cust on demo(customer_id)

EXPLAIN SELECT * FROM demo ordr 
NEST demo li ON KEYS ordr.lineitems WHERE ordr.customer_id = 334

Now the plan contains an IndexScan operator, which shows that Couchbase will use the new index:

        {
          "#operator": "IndexScan",
          "index": "demo_cust",
          "index_id": "74769ea5090a37b7",
          "keyspace": "demo",
          "namespace": "default",
          "spans": [
            {
              "Exact": true,
              "Range": {
                "High": [
                  "334"
                ],
                "Inclusion": 3,
                "Low": [
                  "334"
                ]
              }
            }
          ],
          "using": "gsi"
       }

What about conditions on the line items? Suppose we have a predicate item_id=555?

Here, it turns out indexes don’t help. Predicates on the right-hand side of NEST operators are applied after the NEST operation, period. And this may have consequences for the data design. Any fields that are needed for selectivity when using NEST operations should be placed in the principal object, not pushed down to the secondary objects.

Left Outer Nest

In the earlier section, we used nest to group external objects referenced by keys into the top-level objects they belonged to. But what about objects that don’t have subobjects?

We can test this by adding a third object to the data set, like this:

INSERT INTO demo VALUES ("6", {
  "customer_id": "77077",
  "order_id": "6",
  "total_price": 0,
  "type": "order"
})

Then we rerun the query, and get exactly the same result as before. Order “6” isn’t present. Why? The problem is that order “6” doesn’t have a lineitems array, so it isn’t included in the join.

How could we add it to the result? First of all we need to switch to a LEFT OUTER NEST operator in the query. But that isn’t quite enough. That change alone will also include documents in the demo bucket that are lineitems. We need want only documents that are orders, even if they don’t have lineitems. That gets us this final query...

SELECT * FROM demo ordr LEFT OUTER NEST demo li ON KEYS ordr.lineitems 
WHERE ordr.type = "order"

...producing this result, including order “6”:

[
  {
    "li": [
      { "base_price": 14.97, "final_price": 15.72, "item_id": 576, "item_price": 4.99, 
        "lineitem_id": "11", "quantity": 3, "tax": 0.75, "type": "lineitem" },
      { "base_price": 12.95, "final_price": 13.6, "item_id": 234, "item_price": 12.95,
        "lineitem_id": "12", "quantity": 1, "tax": 0.65, "type": "lineitem" },
      { "base_price": 0.98, "final_price": 0.98, "item_id": 122, "item_price": 0.49,
        "lineitem_id": "13", "quantity": 2, "type": "lineitem" }
    ],
    "ordr": {
      "customer_id": "24601",
      "lineitems": [
        "11",
        "12",
        "13"
      ],
      "order_id": "1",
      "total_price": 30.3,
      "type": "order"
    }
  },
  {
    "li": [
      { "base_price": 287.91, "final_price": 302.31, "item_id": 770, "item_price": 95.97,
        "lineitem_id": "51", "quantity": 2, "tax": 14.4, "type": "lineitem" },
      { "base_price": 125.73, "final_price": 125.73, "item_id": 712, "item_price": 125.73,
        "lineitem_id": "52", "quantity": 1, "type": "lineitem” }
    ],
    "ordr": {
      "customer_id": "98732",
      "lineitems": [
        "51",
        "52"
      ],
      "order_id": "5",
      "total_price": 428.04,
      "type": "order"
    }
  },
  {
    "ordr": {
      "customer_id": "77077",
      "order_id": "6",
      "total_price": 0,
      "type": "order"
    }
  }
]

JOIN With NEST

Sometimes objects grow too big to keep in a single document, and it makes sense to split the original single object into sub-objects. For example, here we have split out the original line item and price fields from the original objects. Notice the “1A” and “5A” documents.

"1" : { "order_id": "1", "type": "order", "customer_id": "24601", "sub" : "1A" } 
"1A" : { "total_price": 30.3, "lineitems": [ "11", "12", "13" ], "type" : "order_sub", 
        "order_sub_id" : "1A" }
"11" : { "lineitem_id": "11", "type": "lineitem", "item_id": 576, "quantity": 3, "item_price": 4.99, 
        "base_price": 14.97, "tax": 0.75, "final_price": 15.72 }
"12" : { "lineitem_id": "12", "type": "lineitem", "item_id": 234, "quantity": 1, "item_price": 12.95, 
        "base_price": 12.95, "tax": 0.65, "final_price": 13.6 }
"13" : { "lineitem_id": "13", "type": "lineitem", "item_id": 122, "quantity": 2, "item_price": 0.49, 
        "base_price": 0.98, "final_price": 0.98 }
"5" : { "order_id": "5", "type": "order", "customer_id": "98732", "sub" : "5A" } 
"5A" : { "total_price": 428.04,
        "lineitems" : [ "51", "52" ], "type" : "order_sub", "order_sub_id" : "5A" }
"51" : { "lineitem_id": "51", "type": "lineitem", "item_id": 770, "quantity": 2, "item_price": 95.97, 
        "base_price": 287.91, "tax": 14.4, "final_price": 302.31 }
"52" : { "lineitem_id": "52", "type": "lineitem", "item_id": 712, "quantity": 1, "item_price": 125.73, 
        "base_price": 125.73, "final_price": 125.73 }

How then can we reform the original object?

We begin by joining on the “sub” field to recreate the original object...

SELECT * FROM demo ordr JOIN demo subdata ON KEYS ords.sub

...yielding this:

[
  {
    "ordr": {
      "customer_id": "24601",
      "order_id": "1",
      "sub": "1A",
      "type": "order"
    },
    "subdata": {
      "lineitems": [
        "11",
        "12",
        "13"
      ],
      "order_sub_id": "1A",
      "total_price": 30.3,
      "type": "order_sub"
    }
  },
  {
    "ordr": {
      "customer_id": "98732",
      "order_id": "5",
      "sub": "5A",
      "type": "order"
    },
    "subdata": {
      "lineitems": [
        "51",
        "52"
      ],
      "order_sub_id": "5A",
      "total_price": 428.04,
      "type": "order_sub"
    }
  }
]

We can then add a NEST clause to join in the line items, too.

SELECT * FROM demo ordr JOIN demo subdata ON KEYS ordr.sub 
NEST demo li ON KEYS subdata.lineitems

That yields this result:

[
  {
    "li": [
      { "base_price": 14.97, "final_price": 15.72, "item_id": 576, "item_price": 4.99,
        "lineitem_id": "11", "quantity": 3, "tax": 0.75, "type": "lineitem" },
      { "base_price": 12.95, "final_price": 13.6, "item_id": 234, "item_price": 12.95, 
        "lineitem_id": "12", "quantity": 1, "tax": 0.65, "type": "lineitem" },
      { "base_price": 0.98, "final_price": 0.98, "item_id": 122, "item_price": 0.49,
        "lineitem_id": "13", "quantity": 2, "type": "lineitem"
      }
    ],
    "ordr": {
      "customer_id": "24601",
      "order_id": "1",
      "sub": "1A",
      "type": "order"
    },
    "subdata": {
      "lineitems": [
        "11",
        "12",
        "13"
      ],
      "order_sub_id": "1A",
      "total_price": 30.3,
      "type": "order_sub"
    }
  },
  {
    "li": [ 
      { "base_price": 287.91, "final_price": 302.31, "item_id": 770, "item_price": 95.97, 
        "lineitem_id": "51", "quantity": 2, "tax": 14.4, "type": "lineitem" },
      { "base_price": 125.73, "final_price": 125.73, "item_id": 712, "item_price": 125.73,
        "lineitem_id": "52", "quantity": 1, "type": "lineitem" }
    ],
    "ordr": {
      "customer_id": "98732",
      "order_id": "5",
      "sub": "5A",
      "type": "order"
    },
    "subdata": {
      "lineitems": [
        "51",
        "52"
      ],
      "order_sub_id": "5A",
      "total_price": 428.04,
      "type": "order_sub"
    }
  }
]

Summary

When working with a document database, we may need to pull nested objects from their top-level documents. In Couchbase N1QL, this is done with the UNNEST operator.

Conversely, we may need to group individual objects under their top-level documents. In Couchbase N1QL, this is done with the NEST operator.

Try it Yourself

You can create the dataset used in the UNNEST section by going to the Data Buckets tab of your Couchbase admin console and creating a “demo” bucket. Then go to the Query tab, and execute these two statements:

CREATE PRIMARY INDEX ON demo

INSERT INTO demo VALUES ("1",{
  "order_id": "1",
  "type": "order",
  "customer_id": "24601",
  "total_price": 30.3,
  "lineitems": [
    { "item_id": 576, "quantity": 3, "item_price": 4.99, "base_price": 14.97, "tax": 0.75, 
        "final_price": 15.72 },
    { "item_id": 234, "quantity": 1, "item_price": 12.95, "base_price": 12.95, "tax": 0.65, 
        "final_price": 13.6 },
    { "item_id": 122, "quantity": 2, "item_price": 0.49, "base_price": 0.98, "final_price": 0.98 }
  ]
}), 
("5",{
  "order_id": "5",
  "type": "order",
  "customer_id": "98732",
  "total_price": 428.04,
  "lineitems": [
    { "item_id": 770, "quantity": 3, "item_price": 95.97, "base_price": 287.91, "tax": 14.4, 
        "final_price": 302.31 },
    { "item_id": 712, "quantity": 1, "item_price": 125.73, "base_price": 125.73, 
        "final_price": 125.73 }
  ]
})

You can then run the query:

SELECT demo.order_id, SUM(lineitems.tax) as total_tax FROM demo UNNEST lineitems 
GROUP BY demo.order_id

Then empty the demo bucket:

DELETE FROM demo

Set up the data for the NEST section:

INSERT INTO demo VALUES 
("1",{ "order_id": "1", "type": "order", "customer_id": "24601", 
        "total_price": 30.3, "lineitems": [ "11", "12", "13" ]}), 
("11",{ "lineitem_id": "11", "type": "lineitem", "item_id": 576, "quantity": 3, "item_price": 4.99, 
        "base_price": 14.97, "tax": 0.75, "final_price": 15.72 }), 
("12",{ "item_id": 234, "type": "lineitem", "lineitem_id": "12", "quantity": 1, "item_price": 12.95, 
        "base_price": 12.95, "tax": 0.65, "final_price": 13.6 }), 
("13",{ "lineitem_id": "13", "type": "lineitem", "item_id": 122, "quantity": 2, "item_price": 0.49, 
        "base_price": 0.98, "final_price": 0.98 }), 
("5",{ "order_id": "5", "type": "order", "customer_id": "98732", "total_price": 428.04, 
        "lineitems" : [ "51", "52" ] }), 
("51",{ "lineitem_id": "51", "type": "lineitem", "item_id": 770, "quantity": 2, "item_price": 95.97,
        "base_price": 287.91, "tax": 14.4, "final_price": 302.31 }), 
("52",{ "lineitem_id": "52", "type": "lineitem", "item_id": 712, "quantity": 1, "item_price": 125.73, 
        "base_price": 125.73, "final_price": 125.73 })

And run the query:

SELECT ordr.order_id, ARRAY {"item_id": l.item_id, "quantity" : l.quantity} 
FOR l IN li END as items 
FROM demo ordr NEST demo li ON KEYS ordr.lineitems

Database NEST (software) JSON Object (computer science) Document Operator (extension) Data structure Data (computing)

Opinions expressed by DZone contributors are their own.

Related

Trending