Data Governance — A QuickStart With Azure Purview
Using data effectively is incomplete without data governance. Here’s every “Why? How? Where?” you need to know about data governance and Azure Purview.
“When we talk about assets on the balance sheet, Data deserves its row.” Satya Nadella, Microsoft CEO
As an organization, you face a big question: how should you handle your users’ data? It can be used to support your business, or it can be used to give your end users a better experience.
With enough data and a roadmap for using that data effectively, you can accelerate your company’s growth. But using data effectively also requires data governance.
Why Data Governance?
Data is the new currency of the digital age, and the volume of data within organizations is growing at exponential rates: 90% of today’s data was created in just the last two years, and by 2025, 80% of data will be unstructured. This influx has multiplied the challenges organizations face.
To get real business value from data, an organization needs to know:
1. What data exists within the organization?
2. Who owns the data? Who can access it?
3. For what purposes can they use the data responsibly and ethically?
4. Data lineage (traceability of data flow and its usage in solutions)
5. Duplicate data
6. Quality of data and common taxonomy
7. Security and compliance for the data captured
8. Where and how the data is stored or archived (and the overall lifespan of the data)
A lack of understanding of any of the above can create operational inefficiencies, confusion as data and information are distributed internally and externally, and poor business decisions based on flawed or misunderstood data. And that’s only part of the problem: regulators are cracking down on companies for compliance failures around data privacy and data sovereignty (and I won’t be surprised if we soon start seeing regulations around the ethical use of data).
In short, to use data as an asset, companies must be able to track, manage, and report on their data assets across the whole lifecycle: inception, processing, storage, consumption, archival or retention, and disposition. To do this pragmatically, companies need to establish an enterprise data governance program that uses appropriate technology platforms and solutions to provide visibility into the organization’s data assets and their lifecycle.
What is Data Governance?
According to Gartner, “Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.”
Data governance helps ensure that data is usable, accessible, and protected. It also supports better-informed analytics, because the organization draws its conclusions from data it understands and trusts. Data governance also improves data consistency, removes redundancies, and helps make sense of low-quality (“garbage”) data, which can save an organization from costly decision-making mistakes.
Data governance also provides organizations with:
- Data consistency.
- Reduced data management costs.
- Increased data access for everyone involved for better data-driven decision-making.
- Improved employee experience (and thus higher engagement and productivity).
- Improved customer experience, by surfacing insights into customer behavior and patterns faster and enabling 360-degree customer views that drive personalized experiences at scale.
- Greater overall brand value.
What’s Microsoft Azure Purview?
Microsoft Azure Purview is a fully managed, unified data governance service that helps you manage and govern your on-premises, multi-cloud, and SaaS data. Purview creates a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. Purview empowers data consumers to find valuable, trustworthy data.
It’s built on Apache Atlas, an open-source project for metadata management and governance of data assets. Azure Purview also has a data share mechanism that securely shares data with external business partners without setting up extra FTP nodes or creating redundant large datasets. Azure Purview does not move or store customer data outside the region in which it is deployed.
Purview Is Available in Public Preview
There is currently no licensing cost associated with Purview; you pay for what you use. The pay-per-use model Microsoft offers during the public preview is exciting for Microsoft customers who want to move quickly without building a business case to secure additional budget. Azure Purview reduces costs on multiple fronts, including cutting down on manual and custom efforts to discover and classify data and eliminating the hidden and explicit costs of maintaining homegrown systems and Excel-based solutions.
Companies pay no extra cost to scan SQL Servers and Power BI tenants. Azure Purview provisions a storage account and an Azure Event Hubs account as managed resources. These may incur separate charges, which in most cases will not exceed 2% of the scanning charges. Customers using Azure Purview to manage Amazon AWS S3 data may face additional charges on their AWS bill for data transfers and API calls; these charges vary by region.
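To put the managed-resource cap in concrete terms, here is a worked example. The dollar amount is invented purely for illustration and is not a Microsoft price, and the 2% ceiling is the “most cases” figure quoted above rather than a guarantee.

```latex
C_{\text{managed}} \le 0.02 \times C_{\text{scan}}
\qquad\Longrightarrow\qquad
C_{\text{scan}} = \$500 \;\Rightarrow\; C_{\text{managed}} \le 0.02 \times \$500 = \$10
```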
Data Sources Supported by Azure Purview
It supports the following types of data sources at the time of writing:
1. SQL Server on-premises
2. Azure Data Lake Storage Gen1
3. Azure Data Lake Storage Gen2
4. Azure Blob Storage
5. Azure Data Explorer
6. Azure SQL DB
7. Azure SQL DB Managed Instance
8. Azure Synapse Analytics (formerly SQL DW)
9. Azure Cosmos DB
10. Power BI
11. Teradata
12. ERP sources such as SAP S/4HANA and SAP ECC
13. Oracle DB
14. Amazon S3: Azure Purview customers can now scan and classify data residing in Amazon AWS S3 with the help of automated scanning, AI-powered built-in and custom classifiers, and Microsoft Information Protection sensitivity labels.
Critical Capabilities of Azure Purview
Azure Purview consists of the following main features:
1. Azure Purview Data Map
Azure Purview Data Map provides the foundation for data discovery and effective data governance. It’s a cloud-native PaaS service that captures metadata about enterprise data in analytics and operational systems, both on-premises and in the cloud. The Data Map is kept up to date automatically by a built-in scanning and classification system. Business users can configure and use the Data Map through an intuitive UI, and developers can interact with it programmatically using the open-source Apache Atlas 2.0 APIs.
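Because the Data Map speaks the Apache Atlas v2 REST API, registering a custom asset boils down to posting a JSON entity. The sketch below only prepares the request and does not send it. It assumes the generic Atlas `DataSet` type, the bulk entity route `/catalog/api/atlas/v2/entity/bulk` under a hypothetical account named `contoso-purview`, and an Azure AD bearer token obtained elsewhere; check these details against the current Purview REST documentation before relying on them.

```python
import json
import urllib.request

# Hypothetical account name; replace with your own Purview account.
PURVIEW_ACCOUNT = "contoso-purview"


def build_dataset_entity(name: str, qualified_name: str, description: str = "") -> dict:
    """Return an Atlas v2 entity of the generic 'DataSet' type."""
    return {
        "typeName": "DataSet",
        "attributes": {
            "name": name,
            "qualifiedName": qualified_name,  # unique identifier within the type
            "description": description,
        },
    }


def build_bulk_request(entities: list, token: str,
                       account: str = PURVIEW_ACCOUNT) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the assumed Atlas v2 bulk entity route."""
    url = f"https://{account}.purview.azure.com/catalog/api/atlas/v2/entity/bulk"
    body = json.dumps({"entities": entities}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


entity = build_dataset_entity(
    "customer_orders",
    "custom://sales/customer_orders",  # hypothetical qualified name
    "Orders placed through the web shop",
)
req = build_bulk_request([entity], token="<access-token>")
print(req.get_method(), req.full_url)
```

Sending it would be a single `urllib.request.urlopen(req)` call once you have a valid token. The `qualifiedName` attribute is what Atlas uses to identify an entity uniquely within its type, so re-posting the same value updates the asset rather than duplicating it.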
Purview Data Map powers the Purview Data Catalog and Purview Data Insights as unified experiences within Purview Studio.
Azure Purview creates an automated system to manage your metadata from hybrid and heterogeneous sources, using built-in data classifiers and data protection to ensure sensitive data is not misused. It does this through Microsoft Information Protection sensitivity labels.
The Data Map extracts metadata, lineage, and classifications from existing data stores. It lets you enrich your understanding of that data by classifying it at cloud scale with 100+ built-in classifiers as well as your own custom classifiers. With Purview Data Map, organizations can centrally manage, publish, and inventory metadata at cloud scale, and extend it further using the open Apache Atlas APIs.
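Custom classifiers are essentially pattern rules applied to column names and sampled values. The following is a hypothetical, simplified model of such a rule in plain Python, not Purview’s scanner or API: a column is classified when its name matches a column pattern, or when at least a minimum share of its non-empty values match a data pattern (60% here, chosen for illustration). The classification name `CONTOSO.EMPLOYEE_ID` and the ID format are invented.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RegexClassifier:
    """Hypothetical custom classifier: regex over values, optional regex over column names."""
    name: str
    data_pattern: str
    column_pattern: Optional[str] = None
    min_match_ratio: float = 0.6  # share of non-empty values that must match

    def applies_to(self, column_name: str, values: Iterable[str]) -> bool:
        # Simplification for the sketch: a column-name match alone applies the label.
        if self.column_pattern and re.search(self.column_pattern, column_name, re.IGNORECASE):
            return True
        non_empty = [v for v in values if v]
        if not non_empty:
            return False
        matches = sum(1 for v in non_empty if re.fullmatch(self.data_pattern, v))
        return matches / len(non_empty) >= self.min_match_ratio


employee_id = RegexClassifier(
    name="CONTOSO.EMPLOYEE_ID",  # hypothetical classification name
    data_pattern=r"EMP-\d{6}",
    column_pattern=r"^emp(loyee)?_?id$",
)

print(employee_id.applies_to("owner", ["EMP-000123", "EMP-004567", "n/a"]))  # True: 2 of 3 match
print(employee_id.applies_to("notes", ["hello", "EMP-000001", "tbd"]))       # False: 1 of 3 match
```

The threshold keeps a handful of stray matches (such as an ID pasted into a free-text notes column) from labeling the whole column.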
Sensitivity labeling is supported consistently across database servers, Azure, Microsoft 365, and Power BI. The Data Map also lets you integrate all your data systems using the open-source Apache Atlas APIs.
2. Purview Data Catalog
With the Data Catalog, Purview enables rich data discovery: you can search by business and technical terms and understand data by browsing its associated technical, business, semantic, and operational metadata.
The Data Catalog lets you run semantic searches over your data and presents the results so they are quick to understand, while letting you verify that the data of interest comes from a trusted source and showing its sensitivity labels.
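To make the idea of searching across technical and business metadata concrete, here is a toy in-memory catalog. It uses plain keyword matching over asset names, descriptions, and glossary terms, which is far simpler than the semantic search Purview performs; the assets, sources, and classification values are made-up examples.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogAsset:
    name: str
    description: str
    glossary_terms: List[str] = field(default_factory=list)
    classifications: List[str] = field(default_factory=list)
    source: str = ""


def search(assets: List[CatalogAsset], query: str) -> List[CatalogAsset]:
    """Rank assets by how many query words appear in their technical or business metadata."""
    words = query.lower().split()
    scored = []
    for asset in assets:
        text = " ".join([asset.name, asset.description, *asset.glossary_terms]).lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, asset.name, asset))
    # Highest score first; ties broken alphabetically by asset name.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [asset for _, _, asset in scored]


catalog = [
    CatalogAsset("dbo.cust_orders", "Orders from the web shop",
                 ["Customer", "Sales Order"], ["MICROSOFT.PERSONAL.EMAIL"], "Azure SQL DB"),
    CatalogAsset("raw/clickstream.parquet", "Web shop click events",
                 ["Customer Journey"], [], "ADLS Gen2"),
    CatalogAsset("hr.payroll", "Monthly payroll run", ["Compensation"], [], "SQL Server"),
]

for hit in search(catalog, "customer orders"):
    print(hit.name, hit.source, hit.classifications)
```

Each hit carries its classifications, mirroring how the catalog shows sensitivity information next to a search result.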
The Data Catalog, along with information on each data source and interactive data lineage visualization, gives data scientists, engineers, and analysts the business context they need to drive BI, analytics, AI, and machine learning initiatives.
Purview helps companies understand their data supply chain, from raw data to business insights. From a data lineage perspective, Purview currently supports:
- Scan your Power BI environment and Azure Synapse Analytics workspaces with a few clicks and automatically publish all discovered assets and lineage to the Purview Data Map.
- Connect Azure Purview to Azure Data Factory instances to automatically collect data integration lineage. Quickly determine which analytics and reports already exist without reinventing the wheel.
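Lineage collected from sources such as Azure Data Factory and Synapse is stored as process entities that link input datasets to output datasets. Assuming that inputs/outputs shape, tracing a Power BI report back to its raw sources is a breadth-first walk over those edges. The asset and process names below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each process reads its inputs and writes its outputs,
# mirroring the inputs/outputs shape of Atlas Process entities.
PROCESSES: List[Dict[str, List[str]]] = [
    {"name": ["adf_copy_orders"], "inputs": ["sqlserver.sales.orders"], "outputs": ["adls.raw.orders"]},
    {"name": ["synapse_transform"], "inputs": ["adls.raw.orders", "adls.raw.customers"],
     "outputs": ["synapse.dw.fact_sales"]},
    {"name": ["powerbi_dataset_refresh"], "inputs": ["synapse.dw.fact_sales"],
     "outputs": ["powerbi.sales_report"]},
]


def upstream_sources(asset: str, processes: List[Dict[str, List[str]]] = PROCESSES) -> Set[str]:
    """Walk lineage backwards from `asset` and return every upstream dataset."""
    # Index: output dataset -> datasets that feed it.
    produced_by: Dict[str, List[str]] = {}
    for p in processes:
        for out in p["outputs"]:
            produced_by.setdefault(out, []).extend(p["inputs"])

    seen: Set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in produced_by.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream_sources("powerbi.sales_report")))
```

Running the walk in the other direction (inputs to outputs) answers the complementary impact-analysis question: which reports break if a source table changes.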
3. Purview Data Insights
Using Purview Data Insights, data officers and security officers get a bird’s-eye view and can understand at a glance what data is actively scanned, where sensitive data is, and how it moves.
This view of the organization’s data landscape also lets stakeholders quickly determine which analytics and reports already exist, so they can maintain and use the organization’s data efficiently instead of re-creating it. It provides crucial insights such as data distribution across environments, how data is being moved, and where sensitive data is stored.
4. Purview Studio
Purview Studio is the environment you work in after creating an Azure Purview account: a central control area where developers, administrators, and end users work with Purview services.
Challenges of Azure Purview
Azure Purview is in its early days and has a few gaps that need to be addressed. Here are a few of its limitations:
1. Purview supports a limited list of data sources; even most Azure data services cannot be scanned yet, not to mention other data management systems and BI tools.
2. The user interface is missing basic data management capabilities in the data catalog. For example, once classified, assets cannot be deleted through the UI.
3. No support for the classification of zip file content.
4. No support for Data Marketplace
5. No support for automation and alerting
6. Relations between assets are set manually, and it’s not possible to specify the type or nature of the relationship.
7. The maximum length of an asset name and classification name is just 4 KB
8. Currently, Azure Purview provides only 10 GB of storage on the 4-capacity-unit platform and 40 GB on the 16-capacity-unit platform.
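The two platform sizes quoted above both work out to 2.5 GB per capacity unit (10 GB / 4 and 40 GB / 16). A hypothetical sizing helper, assuming only those two preview-era sizes and your own estimate of the storage you need, could pick the smaller one that fits:

```python
from typing import Optional

# Storage per platform size as stated in the limitations list (preview-era figures).
PLATFORM_STORAGE_GB = {4: 10, 16: 40}


def smallest_platform(required_gb: float) -> Optional[int]:
    """Return the smallest capacity-unit platform whose storage fits, or None."""
    for capacity_units in sorted(PLATFORM_STORAGE_GB):
        if required_gb <= PLATFORM_STORAGE_GB[capacity_units]:
            return capacity_units
    return None  # larger than any platform size listed


print(smallest_platform(7.5))  # 4
print(smallest_platform(25))   # 16
print(smallest_platform(55))   # None
```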
Azure Purview is not yet a one-stop shop for enterprise-level data governance, but based on the roadmap Microsoft has shared, it won’t be long before the Purview team covers enough ground to make Azure Purview an enterprise-grade data governance suite.
How Azure Purview Helps With Data as an Asset
Azure Purview helps you manage your data better. Here’s how it helps you process your data and turn it into an asset:
a) Inventory
Azure Purview lets you catalog your data and apply custom tags to it, so you, the end user, can locate and understand it more easily.
b) Quality Control
It also helps you maintain data quality where your data must be complete, unique, valid, accurate, consistent, relevant, reliable, and accessible. Governance tools such as the data catalog help with this.
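Several of these quality dimensions are straightforward to measure. As a generic illustration (not a Purview feature), this sketch scores a single column for completeness (share of non-empty values), uniqueness (distinct share among non-empty values), and validity (share matching an expected pattern). The sample email values and the deliberately loose email regex are assumptions for the example.

```python
import re
from typing import Dict, List, Optional


def quality_report(values: List[Optional[str]], valid_pattern: str) -> Dict[str, float]:
    """Score one column on three common quality dimensions (0.0 to 1.0)."""
    total = len(values)
    if total == 0:
        return {"completeness": 0.0, "uniqueness": 0.0, "validity": 0.0}
    present = [v for v in values if v not in (None, "")]
    completeness = len(present) / total
    uniqueness = len(set(present)) / len(present) if present else 0.0
    validity = (
        sum(1 for v in present if re.fullmatch(valid_pattern, v)) / len(present)
        if present else 0.0
    )
    return {"completeness": completeness, "uniqueness": uniqueness, "validity": validity}


emails = ["a@contoso.com", "b@contoso.com", "a@contoso.com", None, "not-an-email"]
print(quality_report(emails, r"[^@\s]+@[^@\s]+\.[a-z]+"))
```

Tracking such scores per asset over time is what turns “quality” from a slogan into something a data steward can act on.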
c) Security Compliance
As an organization, you are responsible for providing the utmost security for end-user data. Under government laws and data mandates, end users can demand that companies remove their data from company servers, or change its content, at any time. Azure Purview lets you create an automated process that streamlines these service requests and produces the documentation required by law.
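One way a catalog supports such requests is by scoping them: the classifications recorded during scanning tell you which assets may hold a person’s data. The sketch below is hypothetical and is not a Purview workflow or API. It filters an in-memory inventory by a set of personal-data classifications and returns a record that could feed the required documentation; `MICROSOFT.PERSONAL.EMAIL` is modeled on Purview’s system classification names, while `CONTOSO.CUSTOMER_ID` and the asset names are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set

# Classifications treated as personal data in this sketch (illustrative subset).
PERSONAL_DATA_CLASSIFICATIONS: Set[str] = {
    "MICROSOFT.PERSONAL.EMAIL",
    "CONTOSO.CUSTOMER_ID",  # hypothetical custom classification
}


@dataclass
class DeletionRequestRecord:
    request_id: str
    subject: str
    assets_to_review: List[str]
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def scope_request(request_id: str, subject: str,
                  inventory: Dict[str, Set[str]]) -> DeletionRequestRecord:
    """List every asset whose classifications indicate personal data."""
    hits = sorted(
        asset for asset, labels in inventory.items()
        if labels & PERSONAL_DATA_CLASSIFICATIONS
    )
    return DeletionRequestRecord(request_id, subject, hits)


inventory = {
    "sql.crm.contacts": {"MICROSOFT.PERSONAL.EMAIL"},
    "adls.raw.orders": {"CONTOSO.CUSTOMER_ID"},
    "adls.raw.telemetry": set(),
}
record = scope_request("DSR-0001", "jane@contoso.com", inventory)
print(record.assets_to_review)
```

The actual deletion or correction still happens in each source system; the catalog’s job in this sketch is only to make sure no classified asset is overlooked.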
d) Unified Data Map
It provides a unified map of your data assets. This helps in forming an effective data governance system.
e) Provides Semantic Search Options
You can run searches based on technical, business, and operational terms. You can also identify the sensitivity level of the data and explore its lineage interactively.
f) Constant Update of Data Running Through the System
Get continuous updates about the location of your data and continuous insight into its movement through your multi-layer data landscape. Along with this, Azure Purview provides services such as a data catalog and a business glossary.
g) Data Catalog
The data catalog is a core element of any data governance software: it scans the registered data sources and identifies, indexes, connects, and classifies their data sets.
h) Business Glossary
A business glossary is a collection of terms with brief definitions that link to other related terms. With the Business Glossary, you can automate classifying data sets and annotating them with the correct business terms, so end users can understand them more easily. A business glossary is the foundation of the semantic layer that an organization uses as a shared vocabulary for its business.
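Automated annotation can be as simple as matching column names against glossary terms and their synonyms. The toy glossary and `suggest_terms` helper below are hypothetical and only illustrate the idea; they are not Purview’s API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    synonyms: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)  # links to other glossary terms


GLOSSARY = [
    GlossaryTerm("Customer", "A person or organization that buys from us.",
                 ["cust", "client"], ["Sales Order"]),
    GlossaryTerm("Sales Order", "A confirmed request to buy products.",
                 ["order"], ["Customer"]),
]


def suggest_terms(column_name: str, glossary: List[GlossaryTerm] = GLOSSARY) -> List[str]:
    """Suggest glossary terms whose name or synonyms appear as tokens of a column name."""
    tokens = set(column_name.lower().replace("-", "_").split("_"))
    suggestions = []
    for term in glossary:
        candidates = {term.name.lower().replace(" ", "")} | {s.lower() for s in term.synonyms}
        if tokens & candidates:
            suggestions.append(term.name)
    return suggestions


for col in ["cust_id", "order_total", "client_order_date", "sku"]:
    print(col, "->", suggest_terms(col))
```

A steward would still review the suggestions; the `related` links are what let a user jump from “Customer” to “Sales Order” when browsing.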
With features like these, Microsoft Azure Purview allows your data to become a crucial asset.
Summary
Data governance is a must-have strategy for any enterprise that wants to use data as an asset. It is complex, yet it is a foundational pillar of any enterprise’s data journey. Data governance helps democratize data responsibly by making enterprise data accessible, trusted, and connected at scale.
Microsoft Azure Purview provides a good starting point for a cloud-native data governance solution. Azure Purview helps answer the who, what, when, how, where, and why of data. Judging by its feature set, I would say it has the potential to be a game-changer, with features like the Data Catalog, Data Insights, the Data Map, the Business Glossary, and pipelines to manage your data sources and destinations.
Azure Purview has solid potential to shape a new data governance as a service (DGaaS) industry and open up new opportunities for businesses to explore.
Opinions expressed by DZone contributors are their own.