Introduction to Couchbase for Oracle Developers and Experts: Part 6: Indexing
Today, we're exploring data and data processing issues and interests. SQL, NoSQL, flexible schema, scale-up, scale-out, transactions, and high availability.
Join the DZone community and get the full member experience.
Join For FreeHere are the previous articles comparing architecture, database objects, data types, data modeling, and statements and features of Oracle with Couchbase. This post will focus on indexing.
"Use the Index, Luke!" -- Source
Overview
Oracle |
Couchbase |
Index Documentation |
Index Documentation |
Types of Indexes: Primary & secondary Index (B-tree) Bitmap Index Partial Index Partitioned Index Function-based index Spatial index Search indexes (full-text search) |
Types of Indexes: Primary & secondary index (lock-free skiplist) Partial index Partitioned index Functional-key index Array index Flex indexes Search index Spatial index |
Indexing data structures B-Tree Bitmap Spatial Inverted tree |
Indexing data structures Inverted tree (text) Z-curve (spatial) |
CLASSES of INDEXES |
|
Oracle indexes can be large and are managed through bufferpools. Oracle text index can be creed with an in-memory option. |
Couchbase has two classes of indexes: Standard secondary index which can be large and relevant entries are paged in based on usage, memory-optimized index, optimized for performance is entirely kept in memory and a write is done to the disk for recovery purposes. |
INDEXING FEATURES |
|
Updates; Indexes are updated synchronously. Changes are visible within the transaction (read your own writes). |
Updates: The indexes are updated asynchronously. However, within each transaction, changes are visible immediately (read your own writes) for all the access methods (including index scan). Yes, we do magic! |
Table Scans For SQL indexes are optional. You need it for improving query latency, throughput, and meeting SLAs. Tables have internal mechanisms to scan the entire table, partitioned or otherwise. |
Collection Scans Couchbase collections are hash partitioned distributed table/collection. You can retrieve a document if you have the document key. There isn’t a full scan access method just in the collection. You should build a PRIMARY INDEX that provides the equivalent of a table scan for collection. You can issue arbitrary queries on it. |
Index consistency Index updates are synchronous and are visible based on the isolation level. |
Index consistency: Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes are always in place. These subtle features are taken for granted in a single system and transactional RDBMS. These options are available in modern distributed systems so applications can trade off consistency for availability and performance. |
TYPES OF INDEXES |
|
PRIMARY KEY index CREATE TABLE t1(c1 int primary key) CREATE TABLE t1(c1 int, constraint c1pk PRIMARY KEY(c1)) You can create a primary key one or more columns and an index is automatically created to enforce this primary key constraint. |
PRIMARY KEY index CREATE PRIMARY INDEX ON t1; CREATE PRIMARY INDEX ip1 ON t1; Each JSON document you insert into Couchbase has a separate, user-generated, unique per-collection document key that can be up to 250 bytes. The primary key is simply an index on the document keys. The uniqueness is enforced by the collection without the need for the primary key. |
SECONDARY index CREATE INDEX i1 ON t(c1) CREATE INDEX i1 ON t(c1, c2, c3) CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC) These are the workhorses for an OLTP workload. These form the kernel of the index and can be combined with most other features to form sophisticated, sometimes complex indexes to power the workload. This is Luke’s lightsaber. |
SECONDARY index CREATE INDEX i1 ON t(c1) CREATE INDEX i1 ON t(c1, c2, c3) CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC) The secondary indexes are similar to Oracle at a high level. Couchbase indexes do not support reverse scans. If you do need them, you need to specify the DESC order in the index definition. You can define that for each key. Each type of index will be used for all types of range scans, it only makes a difference in performance for order-by query optimization. |
PARTIAL index “Partial Index gives us the ability to create both local and global indexes on only a subset of partitions within a partitioned table. Prior to Oracle 12c, you could not create indexes on selective partitions; Indexes always meant on all of the data.” |
PARTIAL INDEX The concept here is similar to PostgreSQL to Oracle. You can create indexes on any arbitrary subset of documents and the optimizer will choose the index automatically and when appropriate. CREATE INDEX i1 ON t(c1) where c2 = “USA”; CREATE INDEX i2 ON t(c1, c2) where c3 IN [“C”, 23, 24]; CREATE INDEX i3 ON t(c1) WHERE c4 LIKE “xyz%” |
FUNCTION based index Instead of indexing the column value AS-IS, you index the result of a function or an expression on it. CREATE INDEX i1 ON t(LOWER(c1)); CREATE INDEX i2 ON t(c1 + (c2 * c4)); |
FUNCTION based index The functionality is similar. CREATE INDEX i1 ON t(LOWER(c1)); CREATE INDEX i2 ON t(c1 + (c2 * c4)); |
PARTITION index Oracle has the widest functionality and support for partitioning for table and therefore Index: range, list, hash, interval, reference, and all. All of these combined with other features makes it powerful. This is mainly targeted for data warehousing where you’re potentially analyzing large sets of data and “logically pruning” partitions to scan for the query has a significant benefit. |
PARTITION index Couchbase collections are always hash partitioned. By default, the index is global and in a single partition. You’ll have defined the partitioned index as part of CREATE INDEX. Couchbase supports hash partitioned index, but this can be on any arbitrary expression. Just like any hash partitioned object, queries with equality or IN expression on the partitioning key will benefit from partition pruning; others will get the benefit from parallel scans. For partitioning by range, you’ll have to use partial index syntax and create multiple indexes. Oracle-like range or interval index syntaxes are unavailable. |
DOMAIN indexes (Search) You can index and query text for language awareness, stemming, etc using Oracle text. These indexes are used by the optimizer when you have CONTAINS, CATSEARCH, or MATCHES predicates. Oracle text supports only character types and cannot index numerical or DateTime data types. |
Full-Text Search (Search) Text search is similar to Oracle. Couchbase FTS can index text(string), numbers, booleans, and datetimes making the search usable in a larger number of use cases. Couchbase FTS has an elaborate query language and all of its features can be used by N1QL using the SEARCH() predicate. N1QL also exploits the FTS index for its FLEX indexing optimization for the queries. Flex index is a technology where a single index in FTS can support arbitrarily complex predicates in N1QL. This helps when users are given the flexibility to choose custom predicates for their reports. Use cases:
The full-text search has a fuller functionality to enable modern agile development and effective search. |
DOMAIN indexes (Spatial) These indexes non-tradition types like point, line, polygon, etc with the ability to issues filters like overlaps, contains, and nearest neighbor. |
Couchbase SPATIAL index Couchbase spatial index uses a z-curve data structure and is integrated into FTS indexing and querying. This also enables you to index and query scalars, arrays, text, and spatial using a single index! You can learn more here, here, and here |
ARRAY index Oracle calls this MULTIVALUE index – essentially, the index will have multiple entries pointing to the same doing. A normal index will have one index entry per row. CREATE MULTIVALUE INDEX mvi_1 ON mytable t (t.jcol.credit_score.numberOnly()); |
ARRAY index The array is THE difference between the relational model and the JSON model. — Gerald Sangudi As we saw in the data model section, it’s easy to store the array, but difficult to index. Couchbase supports the most generalized array indexing in databases. From simple array indexes on a field, set of fields, and expressions on each one of those. As mentioned above, Using FTS, Couchbase can index multiple array fields in a single index and push down multiple query predicates to the index scan to improve performance.
|
BITMAP index |
Unavailable. |
Index High Availability Oracle allows the creation of multiple indexes with the same index definition. So, if one of the indexes is unavailable (rebuild/etc), the other can be used for queries. |
Index High Availability You can simply create additional copies of the index by specifying the num_replica parameter to CREATE INDEX statement. The query engine automatically distributes the workload between the indexes based on the performance and workload. |
Index consistency Index updates are synchronous and are visible based on the isolation level. |
Index consistency: Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes is always in place. These subtle features are taken for granted in a single system and transactional RDBMS. These options are available in modern distributed systems so applications can trade off consistency for availability and performance. |
INDEX ADVISOR Oracle’s SQL advisor includes an index advisor among other things. |
INDEX ADVISOR Couchbase has to ADVISE statement, ADVISOR function, and advisor service. |
Published at DZone with permission of Keshav Murthy, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments