Introduction to Couchbase for Oracle Developers and Experts: Part 7: Optimizer
In this blog series, we are exploring various topics to compare Oracle and Couchbase from a developer perspective. Today, we are going to talk about optimizer.
Join the DZone community and get the full member experience.
Join For FreeHere are the previous articles comparing architecture, database objects, data types, data modeling, statements and features, and indexing of Oracle with Couchbase. This post will focus on optimizer.
To database what Yoda is to Star Wars, optimizer is. - Yoda
Separating the HOW (logical, physical representations, access methods) from WHAT (the SQL query) was the genius of the relational model. SQL has been unreasonably effective for relational models and beyond. A good optimizer is critical for SQL irrespective of the data model. The job of an optimizer is to produce an efficient execution algorithm, commonly known as the query plan or simply the plan.
Oracle is an RDBMS, that used a rule-based optimizer(RBO) for many years before switching to Selinger style cost-based optimizer. Couchbase N1QL is a declarative query language that is SQL for JSON. Couchbase N1QL also started with a rule-based optimizer. In 2021, we added a cost-based optimizer(CBO) for the query service. CBO for the analytics service is on the roadmap. JSON is flexible, supports complex objects, arrays in documents and predicates.
Couchbase's patented cost-based optimizer builds on and extends the traditional CBO concepts to JSON’s flexible, nested model. Just like Oracle, RBO and CBO coexist in Couchbase. This article describes the query optimizer in the query service of Couchbase. Couchbase analytics service still uses the rule-based optimizer.
*See the full intro to Couchbase for the Oracle developer series here.
Below is a typical flow of query execution:
The job of the optimizer is to transform the query into a query plan:
To achieve that, at a high level, the query optimizer does the following steps:
Rewrite a query to its logical equivalent to make the optimizer easier/better.
E.g., Expression rewrites, subquery flattening.
Join reordering (aka join enumeration).
Determine the most efficient join order.
Choose an efficient access method for each data source and join method.
Choose the right index, predicate pushdowns, join type (e.g., hash join or nested loop join) for each join.
Create a physical plan for the query engine to execute.
Oracle Resources on Optimizer
SQL Maria has great byte-sized videos explaining the Oracle Optimizer. Much of the Oracle features and descriptions in this article are from them. Oracle’s SQL tuning guide is also a great resource for learning about the Oracle optimizer. Optimizer is the key to the success of SQL and the performance of your queries.
Let’s deep dive into each of the optimizers and compare the features.
RESOURCES
ORACLE |
Couchbase |
CBO Blog and Stats blog |
|
INLIST handling |
IN list handling. We’ve seen queries with 25,000 values in the IN list! N1QL optimizer handles various forms of IN list efficiently by using multiple index lookups on both static and dynamic lists by creating a hash map that has O(1) lookup complexity during IN clause evaluation. This was added in Couchbase 6.5. |
Subqueries: Significant subquery optimizations via query rewrites. See chapter 5.4 |
Subqueries. There are simple optimizations for both correlated and non-correlated subqueries. Subquery unnesting (flattening) or rewriting subqueries as joins aren’t done yet. |
Materialized view rewrites. Used mainly in data warehouse workload. |
Couchbase does have materialized views (simply called views) that execute map-reduce functions. These views have a separate API to access and are not used by N1QL. |
Star join transformations and others. |
Query service is targeted for OLTP-like workload. Many of the transformations like star transformations are useful for data warehousing workload. |
Nested loop |
Nested loop |
Hash join |
Hash join. With RBO, you need to specify the build or the probe side of the has joined for one of the collections; CBO determines probe and builds sides based on the statistics. The query service does limit the number of items for the hash table to 16 million to avoid memory bloat. |
Sort-merge join |
Unavailable |
Join reordering is an important step in optimization; Join order can change a query execution time from hours to minutes and vice versa. It’s based on the cardinality estimates to create a plan with the lowest cost. |
Couchbase 7.1 adds cost-based join-reordering optimization (aka, join enumeration). Prior to 7.1, the join order is the same as the order specified in the FROM clause of the SELECT statement. |
Full table scan; Reads all the rows in the table without any logical order. Rarely done in an OLTP application and use only where index paths are unavailable. |
Couchbase collection is a hash distributed table and doesn’t support the exact full table (collection) scan. However, you can create a primary index that provides equivalent functionality. In addition, you can filter the data based on the document key and you can also exploit the key ordering in the index E.g. FROM t ORDER BY META(t).id The primary index can also be used to efficiently filter and paginate the result sets. FROM t WHERE META(t).id LIKE “CA::%” ORDER BY META(t).id OFFSET 100 LIMIT 10; Pro tip: Consider using keyset pagination. |
Table access by rowid |
Document key is the equivalent of rowid. Index scans return a set of document keys to query service which uses a look-up by rowid to retrieve the document for further processing. The applications can also retrieve documents by specifying one or more keys SELECT * FROM t USE KEYS [“k1”] SELECT * FROM t USE KEYS [“k1”, “k2”, “K8”, “K4”] |
Index unique scan |
Index scan with limit 1 |
Index range scan |
Index range scan is the main workhorse for the query service to use the indexer. The query specifies the spans for the index to filter and return the results. This range scan can also skip the keys in the case of a composite index. E.g. index on (c1, c2, c3) can be used with index spans on c1 and c3 for WHERE c1 = 20 and c3 > 40; |
Index skip scan The optimizer does a statistical analysis of the leading columns to scan and filter values in non-leading columns. This can reduce the number of indexes you need in the system. |
Unavailable. |
Full Index Scan |
Full index scans are used depending on the predicate and cost. E.g. WHERE c1 IS NOT MISSING ORDER BY c1; |
Fast Full Index Scan [All the columns are in the index and used for order by] |
This is simply an ordinary index scan with an optimizer matching the order by expressions with index key order; Additional optimization by pushing down the pagination clauses (OFFSET, LIMIT) to the index scan |
Index join |
In Couchbase parlance, it’s called a multi-index scan. Multiple indexes are used to evaluate a single collection, results of which can be joined with other indexes. |
Bitmap indexes |
Not available |
Array indexes are known as multivalue indexes in Oracle. They need to have a special code remove duplicates from the qualified rowids returned from the index scan. |
An array index scan is similar to Couchbase. For arrays, there could be multiple index entries for a single document. Hence duplicate document keys returned by the index scan need to be removed. Couchbase array indexes can themselves have an array constructor, complex expressions, and therefore is more flexible. |
Statistics Collection |
|
ANALYZE statement |
ANALYZE statement Same as UPDATE STATISTICS statement |
Automatic statistics collection via DBMS package. |
Unavailable |
Multiple ways to generate the query plan. The plan is tabular with indentation to represent the order.
See the full guide here. |
A single way to generate the query plan. The plan generated is in JSON and therefore the query execution tree is correctly represented without having to guess. |
Graphical plan with profile: I searched and searched. I could only see an old graphical explain; Nothing for 21c |
Graphical explain plan via query work bench. |
Graphical plan with profile: I searched and searched. I could only see an old graphical explain; Nothing for 21c |
When you execute the query, the profile will give the same plan with different shades of color to give you a sense of the expense. Clicking on each box will give you stats on that iterator. |
Hints: Extensive hints are available. |
Hints: Extensive hints are available. N1QL hints are similar to the other Oracle database, MySQL than Oracle database. |
Opinions expressed by DZone contributors are their own.
Comments