Introduction to Couchbase for Oracle Developers and Experts: Part 6: Indexing

Today, we're exploring data and data processing issues and interests. SQL, NoSQL, flexible schema, scale-up, scale-out, transactions, and high availability.

Keshav Murthy

CORE ·

Feb. 06, 22 · Tutorial

Likes (3)

Comment

Save

3.7K Views

Here are the previous articles comparing architecture, database objects, data types, data modeling, and statements and features of Oracle with Couchbase. This post will focus on indexing.

"Use the Index, Luke!" -- Source

Overview

Oracle

Couchbase

Index Documentation

Index Documentation

Types of Indexes:
Primary & secondary Index (B-tree)
Bitmap Index
Partial Index
Partitioned Index
Function-based index
Spatial index
Search indexes (full-text search)

Types of Indexes:
Primary & secondary index (lock-free skiplist)
Partial index
Partitioned index
Functional-key index
Array index
Flex indexes
Search index
Spatial index

Indexing data structures
B-Tree
Bitmap
Spatial
Inverted tree

Indexing data structures
Lock-free skiplist
Inverted tree (text)
Z-curve (spatial)

CLASSES of INDEXES

Oracle indexes can be large and are managed through bufferpools. Oracle text index can be creed with an in-memory option.

Couchbase has two classes of indexes: Standard secondary index which can be large and relevant entries are paged in based on usage, memory-optimized index, optimized for performance is entirely kept in memory and a write is done to the disk for recovery purposes.

INDEXING FEATURES

Updates;
Indexes are updated synchronously. Changes are visible within the transaction (read your own writes).

Updates:
The indexes are updated asynchronously. However, within each transaction, changes are visible immediately (read your own writes) for all the access methods (including index scan). Yes, we do magic!

Table Scans
For SQL indexes are optional. You need it for improving query latency, throughput, and meeting SLAs. Tables have internal mechanisms to scan the entire table, partitioned or otherwise.

Collection Scans
Couchbase collections are hash partitioned distributed table/collection. You can retrieve a document if you have the document key. There isn’t a full scan access method just in the collection. You should build a PRIMARY INDEX that provides the equivalent of a table scan for collection. You can issue arbitrary queries on it.

Index consistency
Index updates are synchronous and are visible based on the isolation level.

Index consistency:
Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes are always in place. These subtle features are taken for granted in a single system and transactional RDBMS. These options are available in modern distributed systems so applications can trade off consistency for availability and performance.

TYPES OF INDEXES

PRIMARY KEY index
CREATE TABLE t1(c1 int primary key)
CREATE TABLE t1(c1 int, constraint c1pk PRIMARY KEY(c1))
You can create a primary key one or more columns and an index is automatically created to enforce this primary key constraint.

PRIMARY KEY index
CREATE PRIMARY INDEX ON t1;
CREATE PRIMARY INDEX ip1 ON t1;
Each JSON document you insert into Couchbase has a separate, user-generated, unique per-collection document key that can be up to 250 bytes. The primary key is simply an index on the document keys. The uniqueness is enforced by the collection without the need for the primary key.

SECONDARY index
CREATE INDEX i1 ON t(c1)
CREATE INDEX i1 ON t(c1, c2, c3)
CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC)
These are the workhorses for an OLTP workload. These form the kernel of the index and can be combined with most other features to form sophisticated, sometimes complex indexes to power the workload. This is Luke’s lightsaber.

SECONDARY index
CREATE INDEX i1 ON t(c1)
CREATE INDEX i1 ON t(c1, c2, c3)
CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC)
The secondary indexes are similar to Oracle at a high level. Couchbase indexes do not support reverse scans. If you do need them, you need to specify the DESC order in the index definition. You can define that for each key. Each type of index will be used for all types of range scans, it only makes a difference in performance for order-by query optimization.

PARTIAL index
“Partial Index gives us the ability to create both local and global indexes on only a subset of partitions within a partitioned table. Prior to Oracle 12c, you could not create indexes on selective partitions; Indexes always meant on all of the data.”

PARTIAL INDEX
The concept here is similar to PostgreSQL to Oracle.
You can create indexes on any arbitrary subset of documents and the optimizer will choose the index automatically and when appropriate.
CREATE INDEX i1 ON t(c1) where c2 = “USA”;
CREATE INDEX i2 ON t(c1, c2) where c3 IN [“C”, 23, 24];
CREATE INDEX i3 ON t(c1) WHERE c4 LIKE “xyz%”

FUNCTION based index
Instead of indexing the column value AS-IS, you index the result of a function or an expression on it.
CREATE INDEX i1 ON t(LOWER(c1));
CREATE INDEX i2 ON t(c1 + (c2 * c4));

FUNCTION based index
The functionality is similar.
CREATE INDEX i1 ON t(LOWER(c1));
CREATE INDEX i2 ON t(c1 + (c2 * c4));

PARTITION index
Oracle has the widest functionality and support for partitioning for table and therefore Index: range, list, hash, interval, reference, and all. All of these combined with other features makes it powerful. This is mainly targeted for data warehousing where you’re potentially analyzing large sets of data and “logically pruning” partitions to scan for the query has a significant benefit.

PARTITION index
Couchbase collections are always hash partitioned. By default, the index is global and in a single partition. You’ll have defined the partitioned index as part of CREATE INDEX. Couchbase supports hash partitioned index, but this can be on any arbitrary expression. Just like any hash partitioned object, queries with equality or IN expression on the partitioning key will benefit from partition pruning; others will get the benefit from parallel scans.
For partitioning by range, you’ll have to use partial index syntax and create multiple indexes. Oracle-like range or interval index syntaxes are unavailable.

DOMAIN indexes (Search)
You can index and query text for language awareness, stemming, etc using Oracle text. These indexes are used by the optimizer when you have CONTAINS, CATSEARCH, or MATCHES predicates. Oracle text supports only character types and cannot index numerical or DateTime data types.

Full-Text Search (Search)
Text search is similar to Oracle. Couchbase FTS can index text(string), numbers, booleans, and datetimes making the search usable in a larger number of use cases. Couchbase FTS has an elaborate query language and all of its features can be used by N1QL using the SEARCH() predicate. N1QL also exploits the FTS index for its FLEX indexing optimization for the queries. Flex index is a technology where a single index in FTS can support arbitrarily complex predicates in N1QL. This helps when users are given the flexibility to choose custom predicates for their reports.
Use cases:

Flexible query support: Index on fields (a, b, c) can have predicates (a= 10 and (b = 20 or c between 30 and 40)); It can also have predicates ((a = 10 or c = 20) or (b between 10 and 20); etc, etc.

Dynamic query support: when you create this flex index with (dynamic property set to true), FTS will index all the fields in the document, including the ones unknown at the time of index creation. This takes flexibility to new heights. Index evolves as the schema evolves.

All the cases above support searching in addition to range queries.

In addition to search, FTS allows simple aggregation known as facets.

FTS also helps efficient processing of multiple array predicates because it can index any number of arrays in a single index efficiently.

The full-text search has a fuller functionality to enable modern agile development and effective search.

DOMAIN indexes (Spatial)
These indexes non-tradition types like point, line, polygon, etc with the ability to issues filters like overlaps, contains, and nearest neighbor.

Couchbase SPATIAL index
Couchbase spatial index uses a z-curve data structure and is integrated into FTS indexing and querying. This also enables you to index and query scalars, arrays, text, and spatial using a single index! You can learn more here, here, and here

ARRAY index
Oracle calls this MULTIVALUE index – essentially, the index will have multiple entries pointing to the same doing. A normal index will have one index entry per row.

CREATE MULTIVALUE INDEX mvi_1 ON mytable t
(t.jcol.credit_score.numberOnly());

ARRAY index
The array is THE difference between the relational model and the JSON model. — Gerald Sangudi
As we saw in the data model section, it’s easy to store the array, but difficult to index. Couchbase supports the most generalized array indexing in databases. From simple array indexes on a field, set of fields, and expressions on each one of those. As mentioned above, Using FTS, Couchbase can index multiple array fields in a single index and push down multiple query predicates to the index scan to improve performance.

BITMAP index

Unavailable.

Index High Availability
Oracle allows the creation of multiple indexes with the same index definition. So, if one of the indexes is unavailable (rebuild/etc), the other can be used for queries.

Index High Availability
You can simply create additional copies of the index by specifying the num_replica parameter to CREATE INDEX statement. The query engine automatically distributes the workload between the indexes based on the performance and workload.

Index consistency
Index updates are synchronous and are visible based on the isolation level.

Index consistency:
Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes is always in place. These subtle features are taken for granted in a single system and transactional RDBMS. These options are available in modern distributed systems so applications can trade off consistency for availability and performance.

INDEX ADVISOR
Oracle’s SQL advisor includes an index advisor among other things.

INDEX ADVISOR
Couchbase has to ADVISE statement, ADVISOR function, and advisor service.


Oracle	Couchbase
Index Documentation	Index Documentation
Types of Indexes: Primary & secondary Index (B-tree) Bitmap Index Partial Index Partitioned Index Function-based index Spatial index Search indexes (full-text search)	Types of Indexes: Primary & secondary index (lock-free skiplist) Partial index Partitioned index Functional-key index Array index Flex indexes Search index Spatial index
Indexing data structures B-Tree Bitmap Spatial Inverted tree	Indexing data structures Lock-free skiplist Inverted tree (text) Z-curve (spatial)
CLASSES of INDEXES
Oracle indexes can be large and are managed through bufferpools. Oracle text index can be creed with an in-memory option.	Couchbase has two classes of indexes: Standard secondary index which can be large and relevant entries are paged in based on usage, memory-optimized index, optimized for performance is entirely kept in memory and a write is done to the disk for recovery purposes.
INDEXING FEATURES
Updates; Indexes are updated synchronously. Changes are visible within the transaction (read your own writes).	Updates: The indexes are updated asynchronously. However, within each transaction, changes are visible immediately (read your own writes) for all the access methods (including index scan). Yes, we do magic!
Table Scans For SQL indexes are optional. You need it for improving query latency, throughput, and meeting SLAs. Tables have internal mechanisms to scan the entire table, partitioned or otherwise.	Collection Scans Couchbase collections are hash partitioned distributed table/collection. You can retrieve a document if you have the document key. There isn’t a full scan access method just in the collection. You should build a PRIMARY INDEX that provides the equivalent of a table scan for collection. You can issue arbitrary queries on it.
Index consistency Index updates are synchronous and are visible based on the isolation level.	Index consistency: Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes are always in place. These subtle features are taken for granted in a single system and transactional RDBMS. These options are available in modern distributed systems so applications can trade off consistency for availability and performance.
TYPES OF INDEXES
PRIMARY KEY index CREATE TABLE t1(c1 int primary key) CREATE TABLE t1(c1 int, constraint c1pk PRIMARY KEY(c1)) You can create a primary key one or more columns and an index is automatically created to enforce this primary key constraint.	PRIMARY KEY index CREATE PRIMARY INDEX ON t1; CREATE PRIMARY INDEX ip1 ON t1; Each JSON document you insert into Couchbase has a separate, user-generated, unique per-collection document key that can be up to 250 bytes. The primary key is simply an index on the document keys. The uniqueness is enforced by the collection without the need for the primary key.
SECONDARY index CREATE INDEX i1 ON t(c1) CREATE INDEX i1 ON t(c1, c2, c3) CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC) These are the workhorses for an OLTP workload. These form the kernel of the index and can be combined with most other features to form sophisticated, sometimes complex indexes to power the workload. This is Luke’s lightsaber.	SECONDARY index CREATE INDEX i1 ON t(c1) CREATE INDEX i1 ON t(c1, c2, c3) CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC) The secondary indexes are similar to Oracle at a high level. Couchbase indexes do not support reverse scans. If you do need them, you need to specify the DESC order in the index definition. You can define that for each key. Each type of index will be used for all types of range scans, it only makes a difference in performance for order-by query optimization.
PARTIAL index “Partial Index gives us the ability to create both local and global indexes on only a subset of partitions within a partitioned table. Prior to Oracle 12c, you could not create indexes on selective partitions; Indexes always meant on all of the data.”	PARTIAL INDEX The concept here is similar to PostgreSQL to Oracle. You can create indexes on any arbitrary subset of documents and the optimizer will choose the index automatically and when appropriate. CREATE INDEX i1 ON t(c1) where c2 = “USA”; CREATE INDEX i2 ON t(c1, c2) where c3 IN [“C”, 23, 24]; CREATE INDEX i3 ON t(c1) WHERE c4 LIKE “xyz%”
FUNCTION based index Instead of indexing the column value AS-IS, you index the result of a function or an expression on it. CREATE INDEX i1 ON t(LOWER(c1)); CREATE INDEX i2 ON t(c1 + (c2 * c4));	FUNCTION based index The functionality is similar. CREATE INDEX i1 ON t(LOWER(c1)); CREATE INDEX i2 ON t(c1 + (c2 * c4));
PARTITION index Oracle has the widest functionality and support for partitioning for table and therefore Index: range, list, hash, interval, reference, and all. All of these combined with other features makes it powerful. This is mainly targeted for data warehousing where you’re potentially analyzing large sets of data and “logically pruning” partitions to scan for the query has a significant benefit.	PARTITION index Couchbase collections are always hash partitioned. By default, the index is global and in a single partition. You’ll have defined the partitioned index as part of CREATE INDEX. Couchbase supports hash partitioned index, but this can be on any arbitrary expression. Just like any hash partitioned object, queries with equality or IN expression on the partitioning key will benefit from partition pruning; others will get the benefit from parallel scans. For partitioning by range, you’ll have to use partial index syntax and create multiple indexes. Oracle-like range or interval index syntaxes are unavailable.
DOMAIN indexes (Search) You can index and query text for language awareness, stemming, etc using Oracle text. These indexes are used by the optimizer when you have CONTAINS, CATSEARCH, or MATCHES predicates. Oracle text supports only character types and cannot index numerical or DateTime data types.	Full-Text Search (Search) Text search is similar to Oracle. Couchbase FTS can index text(string), numbers, booleans, and datetimes making the search usable in a larger number of use cases. Couchbase FTS has an elaborate query language and all of its features can be used by N1QL using the SEARCH() predicate. N1QL also exploits the FTS index for its FLEX indexing optimization for the queries. Flex index is a technology where a single index in FTS can support arbitrarily complex predicates in N1QL. This helps when users are given the flexibility to choose custom predicates for their reports. Use cases: Flexible query support: Index on fields (a, b, c) can have predicates (a= 10 and (b = 20 or c between 30 and 40)); It can also have predicates ((a = 10 or c = 20) or (b between 10 and 20); etc, etc. Dynamic query support: when you create this flex index with (dynamic property set to true), FTS will index all the fields in the document, including the ones unknown at the time of index creation. This takes flexibility to new heights. Index evolves as the schema evolves. All the cases above support searching in addition to range queries. In addition to search, FTS allows simple aggregation known as facets. FTS also helps efficient processing of multiple array predicates because it can index any number of arrays in a single index efficiently. The full-text search has a fuller functionality to enable modern agile development and effective search.
DOMAIN indexes (Spatial) These indexes non-tradition types like point, line, polygon, etc with the ability to issues filters like overlaps, contains, and nearest neighbor.	Couchbase SPATIAL index Couchbase spatial index uses a z-curve data structure and is integrated into FTS indexing and querying. This also enables you to index and query scalars, arrays, text, and spatial using a single index! You can learn more here, here, and here
ARRAY index Oracle calls this MULTIVALUE index – essentially, the index will have multiple entries pointing to the same doing. A normal index will have one index entry per row. CREATE MULTIVALUE INDEX mvi_1 ON mytable t (t.jcol.credit_score.numberOnly());	ARRAY index The array is THE difference between the relational model and the JSON model. — Gerald Sangudi As we saw in the data model section, it’s easy to store the array, but difficult to index. Couchbase supports the most generalized array indexing in databases. From simple array indexes on a field, set of fields, and expressions on each one of those. As mentioned above, Using FTS, Couchbase can index multiple array fields in a single index and push down multiple query predicates to the index scan to improve performance.
BITMAP index	Unavailable.
Index High Availability Oracle allows the creation of multiple indexes with the same index definition. So, if one of the indexes is unavailable (rebuild/etc), the other can be used for queries.	Index High Availability You can simply create additional copies of the index by specifying the num_replica parameter to CREATE INDEX statement. The query engine automatically distributes the workload between the indexes based on the performance and workload.
Index consistency Index updates are synchronous and are visible based on the isolation level.	Index consistency: Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes is always in place. These subtle features are taken for granted in a single system and transactional RDBMS. These options are available in modern distributed systems so applications can trade off consistency for availability and performance.
INDEX ADVISOR Oracle’s SQL advisor includes an index advisor among other things.	INDEX ADVISOR Couchbase has to ADVISE statement, ADVISOR function, and advisor service.

With so many types of indexes, the choices for the optimizer are plenty. In the next article, we compare the two optimizers!

Database Relational database Data structure

Published at DZone with permission of Keshav Murthy, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending

Introduction to Couchbase for Oracle Developers and Experts: Part 6: Indexing

Today, we're exploring data and data processing issues and interests. SQL, NoSQL, flexible schema, scale-up, scale-out, transactions, and high availability.

"Use the Index, Luke!" -- Source

Overview

Oracle

Couchbase

Index Documentation

Index Documentation

Related

Partner Resources