Data Fabric vs. Data Lake: Operational Comparison

In this article, we will focus on which is the most appropriate big data store for high-scale, real-time, operational use cases – data fabric vs data lake.

Ian Tick

Oct. 21, 21 · Review

This article examines which big data store is best suited to high-scale, real-time, operational use cases: a data fabric or a data lake. It also covers data warehouses, as well as relational and non-relational databases.

What Are Operational Use Cases?

Data-intensive enterprises are driven by a broad array of real-time use cases requiring a high-scale, high-speed data architecture that can support millions of concurrent transactions. Examples include:

A 360-degree customer view assembled from many different legacy systems (and delivered to a self-service IVR or mobile/web portal, customer service reps, chat agents/bots, and field technicians).
Churn prediction.
Credit scoring.
Fraud prevention.
Payment card transaction security, and more.

Operational Use Case Requirements

Operational use cases need a big data platform capable of performing complex data queries in milliseconds while dealing with:

Live data, which is continually updated from operational systems (with millions to billions of updates each day).
Terabytes of fragmented data, spanning many different databases or tables, typically in different formats and technologies.
Queries scoped to a specific instance of a business entity, such as a single customer, product, or location.
High concurrency, representing thousands of requests every second.
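
To put the update volumes above in per-second terms, the following back-of-the-envelope sketch converts a daily update count into an average rate. The function name is illustrative only, and real workloads are bursty, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_updates_per_second(updates_per_day: int) -> float:
    """Average sustained write rate implied by a daily update count."""
    return updates_per_day / SECONDS_PER_DAY


# The two ends of the "millions to billions" range from the list above.
for daily in (1_000_000, 1_000_000_000):
    rate = average_updates_per_second(daily)
    print(f"{daily:>13,} updates/day is about {rate:,.1f} updates/second")
```

At a billion updates a day, the store has to absorb more than ten thousand writes per second on average while still answering thousands of read requests per second.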

Big Data Storage Options

The storage options that data teams most commonly rely on today are:

Data Lake

According to an analyst at Gartner, a data lake is a collection of storage instances of various data assets, kept in addition to the original data stores. These assets are stored and maintained as an exact, or near-exact, replica of the source format, whether structured or unstructured. Examples of data lake providers include Amazon S3, Apache Hadoop, and Azure Data Lake.

Data Warehouses (DWH)

A data warehouse refers to a storage architecture designed to persist data extracted from operational data stores, transaction systems, and external sources. It combines the data in an aggregated form appropriate for enterprise-wide data analysis and reporting. Examples of DWH providers include Amazon Redshift, Google BigQuery, and Snowflake.

Database Management Systems (DBMS)

A database management system stores and organizes data with defined formats and structures. A DBMS is categorized by its basic structure and by its use or deployment.

A relational DBMS, which usually includes a Structured Query Language (SQL) API, is organized and accessed via the relationships between its data entities. Examples of relational DBMS providers include Microsoft SQL Server, Oracle, and PostgreSQL.
A non-relational (NoSQL) DBMS is often used in big data and real-time web applications. Although optimized for high-scale use, a non-relational database cannot enforce relationships between data entities. Examples of non-relational DBMS providers include Cassandra, MongoDB, and Redis.
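
The difference in relationship enforcement can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module for the relational side and a plain dict as a stand-in for a document store. The table names, keys, and sample values are invented for illustration, and real NoSQL products differ widely in what they validate.

```python
import sqlite3

# Relational side: an in-memory SQLite database with a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (100, 1)")  # valid reference

try:
    # Customer 999 does not exist, so the database rejects the row.
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (101, 999)")
    relational_rejected = False
except sqlite3.IntegrityError:
    relational_rejected = True

# Non-relational side: a plain dict standing in for a document store.
documents = {}
documents["order:101"] = {"customer_id": 999}  # accepted; nothing checks the reference
orphan_accepted = "order:101" in documents

print(relational_rejected, orphan_accepted)
```

In the dict-based stand-in, any check that the referenced customer exists would have to live in application code.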

Data Fabric

A data fabric can be defined as an integrated layer of connected data that is ingested and normalized from an enterprise's data sources, regardless of the format, technology, or source system of the data. It holds the processed data in its own data store and delivers it on demand to big data stores, consuming applications, and AI/ML/real-time decision-making engines. Examples of data fabric providers include IBM Cloud Pak, K2View, Denodo, Talend, and Informatica.
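
As a rough illustration of "ingested and normalized ... regardless of format", the sketch below merges two hypothetical sources, a CRM export in JSON and a billing export in CSV, into one record per customer. The source layouts, field names, and the ingest function are all assumptions made for this example and do not reflect any vendor's product.

```python
import csv
import io
import json

# Hypothetical source 1: a CRM export in JSON.
CRM_JSON = '[{"cust_id": "42", "full_name": "Ada Lovelace"}]'
# Hypothetical source 2: a billing export in CSV.
BILLING_CSV = "CustomerID,Balance\n42,19.99\n"


def ingest(crm_json: str, billing_csv: str) -> dict:
    """Normalize both sources into one record per customer ID."""
    store: dict = {}
    for row in json.loads(crm_json):
        # Map the CRM's field names onto the common schema.
        store.setdefault(row["cust_id"], {})["name"] = row["full_name"]
    for row in csv.DictReader(io.StringIO(billing_csv)):
        # Map the billing system's field names and convert text to numbers.
        store.setdefault(row["CustomerID"], {})["balance"] = float(row["Balance"])
    return store


fabric_store = ingest(CRM_JSON, BILLING_CSV)
print(fabric_store["42"])
```

A consuming application then reads the unified record by customer ID without needing to know which system each field came from.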

Storage Options – Pros and Cons

The following summarizes the strengths and weaknesses of data fabric vs data lake/DWH, as well as relational and non-relational databases.

Data Lake/DWH

Strengths

Support for complex data queries, across structured and unstructured data.

Weaknesses

Not designed for single-entity queries, which results in slow response times.
No support for live data, so data that needs constant updating is either unreliable or delivered at unacceptably slow response times.

Relational Database

Strengths

Support for SQL, broad adoption, and ease of use.

Weaknesses

Non-linear scalability, requiring expensive hardware to run complex queries on terabytes of data in near real time.
Poor handling of high concurrency, resulting in unacceptably slow response times.

NoSQL Database

Strengths

Distributed data store architecture, with support for linear scalability.

Weaknesses

No support for SQL, so specialized skills are required.
Data queries require either predefined indexes or complex embedded application logic, which hampers development agility and time to market.
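
The indexing constraint can be sketched with a toy key-value store. The class below is hypothetical and far simpler than any real NoSQL engine. It answers lookups on one predefined field through a secondary index maintained at write time, while queries on any other field fall back to scanning every row in application code.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy key-value store with one predefined secondary index."""

    def __init__(self, indexed_field: str):
        self.indexed_field = indexed_field
        self.rows = {}
        self.index = defaultdict(set)  # indexed value -> set of keys

    def put(self, key, row):
        old = self.rows.get(key)
        if old is not None:
            # Keep the index consistent when a row is overwritten.
            self.index[old.get(self.indexed_field)].discard(key)
        self.rows[key] = row
        self.index[row.get(self.indexed_field)].add(key)

    def find_indexed(self, value):
        """Fast path: only works for the field chosen up front."""
        return sorted(self.index.get(value, ()))

    def find_by_scan(self, field, value):
        """Slow path: application logic that inspects every row."""
        return sorted(k for k, r in self.rows.items() if r.get(field) == value)


store = KeyValueStore(indexed_field="city")
store.put("c1", {"city": "Oslo", "plan": "gold"})
store.put("c2", {"city": "Lima", "plan": "gold"})
store.put("c3", {"city": "Oslo", "plan": "basic"})

by_city = store.find_indexed("Oslo")         # served by the predefined index
by_plan = store.find_by_scan("plan", "gold")  # no index on "plan", so full scan
print(by_city, by_plan)
```

Supporting a new query pattern means either adding another index and backfilling it or accepting the cost of a scan, which is the agility trade-off described above.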

Operational Data Fabric

Strengths

Full support for SQL.
Distributed data store architecture, with support for linear scalability.
Support for high concurrency, with high performance.
Support for complex queries for single business entities.

Weaknesses

No inherent support for queries that span multiple Micro-Databases, although Elasticsearch can be used to address this.
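
The article does not define Micro-Databases, so the sketch below assumes the term means one small database per business entity. It uses an in-memory SQLite database per customer to show why single-entity SQL queries stay small while cross-entity queries need extra work. The EntityStore class, its methods, and the invoice schema are invented for illustration and do not describe any vendor's implementation. The cross-entity query here uses a simple fan-out rather than the Elasticsearch approach mentioned above.

```python
import sqlite3


class EntityStore:
    """Toy store that keeps one small SQLite database per customer."""

    def __init__(self):
        self._dbs = {}  # customer ID -> sqlite3 connection

    def _db(self, customer_id):
        if customer_id not in self._dbs:
            db = sqlite3.connect(":memory:")
            db.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
            self._dbs[customer_id] = db
        return self._dbs[customer_id]

    def add_invoice(self, customer_id, amount):
        self._db(customer_id).execute(
            "INSERT INTO invoices (amount) VALUES (?)", (amount,)
        )

    def total_for(self, customer_id):
        # Single-entity query: touches only this customer's database.
        row = self._db(customer_id).execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoices"
        ).fetchone()
        return row[0]

    def total_across_all(self):
        # Cross-entity query: no single database holds the answer,
        # so the store has to visit every per-customer database.
        return sum(self.total_for(cid) for cid in list(self._dbs))


store = EntityStore()
store.add_invoice("alice", 10.0)
store.add_invoice("alice", 5.0)
store.add_invoice("bob", 2.5)
print(store.total_for("alice"), store.total_across_all())
```

The per-customer query runs against a handful of rows no matter how many customers exist, whereas the fan-out grows with the number of entities, which is why a separate search index is an attractive complement.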

Conclusion

In the data fabric vs data lake comparison, data fabric is the better fit for real-time operational use cases. The two are nevertheless complementary: a data fabric can prepare trusted data for data lakes, while data lakes can provide operational intelligence to the data fabric for immediate use.

Opinions expressed by DZone contributors are their own.
