Challenges With Traditional Data Sharing and Emergence of Delta Sharing to the Rescue
This article provides insight into Delta Sharing, how it reduces the complexity of ELT, and where it stands alongside other data-sharing solutions.
With an increasing number of organizations championing data as a strategic asset and creating financial value from sharing it, data sharing itself remains a challenge. The use cases are endless: data monetization strategies in enterprises, data as a service in fields ranging from fleet management to drug discovery, real-time public feeds of environmental data such as climate change or water resources, and many others.
And yet, sharing data across different platforms, companies, and clouds is no easy task. Almost all existing solutions fall short of today’s open-format, multi-cloud, and performance standards.
Databricks Delta Sharing addresses most of these problems in its own way. It is presented as the industry’s first open protocol, an open standard for sharing data securely. Users can access that data securely within an organization and now also between organizations.
It also opens the floodgates for sharing and consuming data from external sources, allowing collaboration with customers, establishing new partnerships, and thereby creating avenues for new revenue.
Where Do Current Data-Sharing Solutions Leave Us?
Commercial DBs/DWHs
Commercial DB and DWH vendors can share data across their systems by installing (and licensing) a new instance of their product. With this approach, you are locked into that vendor’s solution, their restrictions in scale, and their availability on specific cloud platforms (and their pricing).
sFTP
Putting data on an (s)FTP server for data sharing is vendor-agnostic, can be done with open-source software, and works across clouds, but it clearly lacks scalability.
Object Storage URLs
All cloud service providers (CSPs) allow you to share objects with a URL. You profit from the availability and durability guarantees of object storage. Still, this is low-level storage that deals in files and objects, whereas your data scientists and data engineers want to work with tables and CRUD operations on tables.
What Does Delta Sharing Bring to the Table for Customers?
Sharing of Real-Time/Batch Data Without Replication
With data physically hosted on cloud storage, Delta Sharing lets you share data from your Lakehouse or data lake without physically copying it outside your environment. Unlike some cloud DWH solutions, this saves substantial egress costs.
Highly Secured, Tracked, and Governed
It allows granting, tracking, and auditing of shared data from a centralized place called Unity Catalog. We can also define how long the recipient can access the data, in terms of hours, days, months, and so on; once that period ends, access is revoked automatically.
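The time-boxed access described above can be pictured with a small model. This is only an illustrative sketch in plain Python, not Unity Catalog's API: `RecipientGrant`, `is_active`, and the recipient and share names are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RecipientGrant:
    """Hypothetical record of one recipient's access to one share."""
    recipient: str
    share: str
    issued_at: datetime
    lifetime: timedelta  # how long the provider allows access

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.lifetime

    def is_active(self, now: datetime) -> bool:
        # Access is valid from issue time up to, but not including, expiry.
        return self.issued_at <= now < self.expires_at


issued = datetime(2023, 1, 1, tzinfo=timezone.utc)
grant = RecipientGrant("partner_a", "sales_share", issued, timedelta(days=30))

print(grant.is_active(issued + timedelta(days=29)))  # still inside the window
print(grant.is_active(issued + timedelta(days=31)))  # window has passed
```

Because the check depends only on the grant and the current time, no one has to remember to revoke access by hand once the window closes.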
Scalability
You can share data at any scale by leveraging the underlying cloud storage systems in a more economical and efficient manner.
Support for a Diverse Set of Recipients
The recipient platform can be neutral: there is no obligation to use a specific computing platform. Recipients can be another Databricks account in a different region or on a different cloud provider, or a simple client using pandas, Apache Spark, BI tools, data science notebooks such as Google Colab or Amazon SageMaker, and many other systems.
How Does It Work?
Delta Sharing is essentially a REST protocol that follows a lake-first approach: your data stays in the cloud object store, and Provider and Recipient are its two main constructs.
The Data Provider decides what data to share and runs a sharing server that implements the Delta Sharing protocol and manages access for Data Recipients. Recipients, in turn, consume the share using Delta Sharing clients.
Once the recipient makes a request, the sharing server validates it using the token issued by the provider before executing the query against the table.
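To make the request step more concrete, here is a minimal sketch of what a recipient's table query might look like on the wire, using only the Python standard library. The path layout, the bearer-token header, and the `limitHint` field are taken from the open Delta Sharing protocol specification and should be checked against the current spec before relying on them. The endpoint, token, and share, schema, and table names are placeholders, and `build_query_request` is a hypothetical helper. The request is built but deliberately not sent.

```python
import json
import urllib.request
from urllib.parse import quote

# Placeholder profile, shaped like the JSON credential file a provider hands out.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing",
    "bearerToken": "<token-issued-by-provider>",
}


def build_query_request(profile, share, schema, table, limit_hint=None):
    """Build (but do not send) a table-query request for a sharing server."""
    url = "{}/shares/{}/schemas/{}/tables/{}/query".format(
        profile["endpoint"].rstrip("/"),
        quote(share, safe=""),
        quote(schema, safe=""),
        quote(table, safe=""),
    )
    # An empty body asks for the whole table; limitHint is an optional hint.
    body = {} if limit_hint is None else {"limitHint": limit_hint}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # The server validates this token before running the query.
            "Authorization": "Bearer " + profile["bearerToken"],
            "Content-Type": "application/json",
        },
    )


req = build_query_request(profile, "sales_share", "default", "orders", limit_hint=100)
print(req.get_method(), req.full_url)
```

In practice, recipients would normally rely on an existing Delta Sharing client rather than hand-building requests; the sketch only shows where the token travels.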
After validation is complete, the Delta Sharing server creates short-lived URLs that let the client (the data recipient) read the live data it has access to directly from the Delta table, in parallel and at any scale, with a consistent tabular view.
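The idea behind a short-lived URL can be sketched with a generic HMAC-signed link that stops verifying after a fixed time. This is a stand-in for illustration only, not the mechanism of any particular sharing server; a real deployment would typically delegate this to the object store's own pre-signed URL feature. `SECRET`, `sign_url`, `verify_url`, and the storage host are all assumptions made for this example.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # assumption: a key known only to the storage side


def _signature(path: str, expires: int) -> str:
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, now: int, ttl_seconds: int) -> str:
    """Return a URL for `path` that stops verifying after `ttl_seconds`."""
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base}{path}?{query}"


def verify_url(url: str, now: int) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires)
    return now < expires and hmac.compare_digest(sig, expected)


issued_at = 1_700_000_000  # fixed timestamp so the example is reproducible
url = sign_url("https://storage.example.com", "/tables/orders/part-0000.parquet",
               issued_at, ttl_seconds=900)

print(verify_url(url, issued_at + 60))    # within the 15-minute window
print(verify_url(url, issued_at + 3600))  # an hour later, past expiry
```

Since each link covers a single file and carries its own expiry, a client can fetch many files in parallel without holding long-lived storage credentials.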
Summary
This article provided insight into Delta Sharing, how it reduces the complexity of ELT, and where it stands alongside other data-sharing solutions. Together, the secure, live data-sharing capabilities of Delta Sharing promote a scalable and tightly coupled interaction between data providers and consumers within the Lakehouse paradigm.