SQL vs NoSQL and SQL to NoSQL Migration
In this article, see a comparison of relational and NoSQL databases and also look at RDBMS to NoSQL migration options.
Join the DZone community and get the full member experience.
Join For FreeOverview
Given the choice of a Relational Database (RDBMS) vs a NoSQL database, it has become more important to select the right type of database for storing data. Not all the requirements fit in a NoSQL database or an RDBMS. RDBMSs are mainly related to managing, storing, and manipulating structured data where the data format, columns, data type, attributes, and schema are fixed, and the relationship between entities needs to be consistently maintained.
SQL is a common query language used when dealing with an RDBMS. Using an RDBMS is a choice for storing transactional data or records where the ACID (Atomicity, Consistency, Isolation, Durability) proprieties of transactions must be provided by an underlying database. An RDBMS is also a choice where the security and accessibility of data are of utmost importance. Typical use cases are financial records, financial transactions, OLTP, ERP, CRM systems, e-commerce applications, etc.
NoSQL (sometimes referred to as Not only SQL, non-SQL or non-relational) is a database that is suitable for managing data that is non-relational, i.e. not structured in tabular format or have fixed data type formats and variables that do not possess tabular relationships. There are various types of NoSQL databases that exist, like key-value, document-based, column-based, and graph-based. When it comes to scalability and performance of unstructured data, NoSQL is the obvious choice.
In a recent development, a few graph databases provide the options to store transactions adhering to ACID properties, but they are still in the early phases of adoption. Typical uses cases of NoSQL include data that is largely unstructured and needs flexibility in data models like content management, personalization, web search engines, storing large users profiles from heterogeneous sources, data streams, documents, digital communication (Storing messages, chats), big data, analytics, machine learning, and storing IoT data.
It becomes imperative to choose the right type of database, and, if required, migrate the exiting RDBMS database to NoSQL to meet the new dynamics of business requirements, scalability, and performance aspects. The below section will help in deciding the right database for your requirement.
1. Database Decision Tree
Requirements |
SQL |
No SQL |
ACID compliance (Atomicity, Consistency, Isolation, Durability) for transaction-based applications |
Preferred option |
Not Suitable Although some databases adhere to CAP theorem (Consistency, Availability, and Partition tolerance), they lack in providing atomicity, and integrity properties |
Security and Integrity |
Preferred if the security of data is of utmost importance. Strong security features integrated into them, masking encryption, etc. Data accuracy and consistency can be maintained |
Preferred if the security of data is not that important (public classified/restricted data). Databases have very few built-in security features, and data integrity is difficult to implement. |
Data Structure |
Preferred option if data is structured, fixed, or changes to the structure are very few or rare and data/records are mainly centralized. Changes to schema may require a change to the entire database, tables, and data updates. |
Preferred option if data is unstructured, frequently changing, and data/records are mainly distributed. Data format and data structure changes are very simple to insert without effecting the entire database. |
Redundancy & Data Normalization |
Prefer databases in normalized schema where redundancy is very rare |
Preferred option if data redundancy is expected |
Entity Relationship, Consistency of Data |
Suitable where the relationship between entities is 'relational,' which must be strictly maintained, and data is strictly consistent all the time. |
Suitable where the relationship between entities is ‘Hierarchical’ in nature. Dynamic or varying and eventual consistency is acceptable |
Performance & Availability |
Performance of read and write operations may be slower compared to NoSQL, however, the execution of complex queries could be better in a SQL database. Most databases support high availability design. |
Better performance (num of read/write operations per sec) for simple queries. Complex queries could have a performance impact. Most databases support high availability design |
Query Handling |
Good fit for complex queries |
Not a good fit for complex queries. Complexity increases in NoSQL query handling |
Scalability |
Horizontal scalability is a challenge and expensive |
Easier to scale |
Bigdata |
Handling big data is expensive and complex for scalability |
Ease of handling big data and low-cost for scalability |
Support |
Vendor & community support |
Mostly community support |
Rapid Development |
Require more time to set up the database. Not suitable for frequent changing data requirements |
Preferred with frequent changes to data structure (ambiguity in data that is getting retrieved from sources and getting stored) |
Use Cases |
Transactional systems, system of records, financial transactions, core banking applications, order management, OLTP, ERP, CRM systems, e-commerce, etc. |
Content management, personalization, search engines, users profiles management and personalization, data streams, document store, digital communication (storing messages, chats), big data, analytics, machine learning, IoT data streams, etc. |
Examples |
MySQL, Oracle, MS-SQL, SQLite, Aurora, Postgres |
MongoDB, CouchDB, Redis, Hbase, Cassandra, AWS DynamoDB, Azure Cosmos DB, AWS Neptune, Neo4j, etc. |
2. Migration From RDBMS Data Sources to NoSQL DBs
If there are existing RDBMS databases that are storing content, documents, files, or have unstructured data, then there are significant advantages in moving such databases to NoSQL databases. Benefits include cost benefits, performance, scalability, future proof for changes, reducing conversion jobs, and extensive supportability for analytics.
Apache Cassandra and MongoDB are the most popularly used NoSQL databases. Atlas is the DBaaS offering MongoDB. However, there is a limited number of RDBMSs to MongoDB or RDBMSs to Cassandra migration tools available in the market. This migration can be achieved using ETL tools or custom transformation Engines. Transformation could be complex. These databases are preferred for greenfield projects than for migration use cases.
There are cloud-provided proprietary databases like Azure CosmosDB, AWS DynamoDB, and Google CloudSpanner, which are actively getting adopted in new architectures due to extensive supportability, documentation, and vendor migration tools available from cloud providers. Also, it requires less effort in setting up and migrating to new DynamoDB or CosmosDB instances as compared to Cassandra or MongoDB. Azure and AWS NoSQL services are more popular as compared to Google.
The below section provides some of the tools options available from cloud service providers to migrate/transform RDBMS to NoSQL databases.
Refer to the reference section for useful links on migration.
2.1 Migration Tools From Traditional RDBMS DB to AWS DynamoDB
AWS Database Migration Service (AWS DMS) can migrate data from most widely-used commercial RDBMSs and open-source databases to similar databases (homogeneous) or different database platforms (heterogeneous) including transforming RDBMS to DynamoDB or Cassandra to DynamoDB or MongoDB to DynamoDB databases.
AWS DMS, along with AWS Schema Conversion Tool (SCT), can help migrate traditional RDBMS data to DynamoDB. The SCT object mapping feature can be used to restructure original data to the target desired structure of the data in DynamoDB during migration. For complex conversions, Lambda functions can be used. Also, there are many open source tools that can convert database tables into JSON objects. These JSON objects can then be imported into NoSQL databases.
AWS DMS can also be used to migrate RDBMS databases to homogenous or heterogeneous RDMS databases including DBaaS platforms in AWS (RDS).
2.2 Migration Tools for RDBMS DB to Azure Cosmos DB
The Azure Cosmos DB emulator and the Azure Cosmos DB Data Migration tool can be used to migrate from an MSSQL database to Cosmos DB. This tool can also help to migrate if the source data is in CSV or JSON object formats.
Azure Database Migration Service (DMS), Cosmos DB, and the API for MongoDB can be used to migrate MongoDB to CosmosDB.
Azure DMS, Cosmos DB, Cassandra API, along with the cqlsh command utility can be used to migrate Cassandra to Cosmos DB.
For conversion from different RDBMS other than MSSQL, Microsoft recommends third-party tools, such as Blitzz.io.
In all the above cases, a Cosmos DB account subscription is required in Azure.
Azure DMS, Azure SQL Server Migration Assistant (SSMA), and Ora2Pg can also be used for homogenous & heterogeneous migrations between relational RDBMS to MSSQL or Azure DBaaS platforms.
2.3 Migration From SQL to Google DataStore
Google has NoSQL services, Cloud Datastore, and Bigtable. Cloud Datastore is now being enhanced to recently released service called Firestore.
There are not many tools and documentation support from Google on migration to Firestore or Bigtable from RDBMS databases or different NoSQL platforms. Cloud SQL, a managed RDBMS from Google, has built-in features to migrate some of traditional RDBMS to CloudSQL. Cloud Spanner is another managed RDBMS. Migrations involve mainly manual processes.
Conclusion
Relational and NoSQL databases are suitable for different types of jobs and storage. Architects have to decide which database type is more suitable for a given type of requirement. Architects have to consider cost factors, HA design, DR recovery mechanisms, and complexities around storing and retrieval of data when making the database designs. NoSQL databases are actively getting used in modern architecture.
Many organizations are migrating traditional large data stores to data lakes consisting of files storage, NoSQL databases, and SQL databases for storing and later used for mining and big data analytics. It's important to include Cloud DB services in target architectures given their implicit stability, performance, availability of new-age tools, analytics solutions, and their cost-effectiveness.
Useful Links
Further Reading
Opinions expressed by DZone contributors are their own.
Comments