Hiding Data in DB2
Fine-grained access control is usually done in the database, but a programmable proxy can sometimes be used as an alternative. Learn more in this article.
Join the DZone community and get the full member experience.
Join For FreeIn this article, we'll talk about fine-grained access control in DB2 - hiding data that is, in fact, present in the database, but should not be accessible to certain users.
Fine-grained access control is usually done in the database itself as the data is being accessed, but it can also be done between the database server and the database client using a programmable proxy.
Each of these approaches has its advantages and disadvantages. Let's look at each one and contrast them.
DB2’s Fine-Grained Access Control Capabilities
DB2 has excellent fine-grained access control capabilities - among the best in the RDBMS world.
They come in two flavors:
- Row and column access control (RCAC), which includes row permissions and column masks
- Label-based access control (LBAC), which relies on security labels and security policies
There is some overlap between these two approaches. The biggest difference is that RCAC is code-based (it involves the execution of SQL code to determine whether a piece of data is accessible and how) whereas LBAC is data-based (it relies on declarations and labels, without any code).
Row and Column Access Control (RCAC)
RCAC, as its name suggests, consists of two parts:
- Row permissions, which allow you to define who has access to which rows
- Column masks, which allow you to define how certain columns are presented to the database client
By combining these, you can achieve the most common access control requirements.
Row Permissions
Row permissions are expressed as pieces of code that define which rows are visible. For instance:
CREATE PERMISSION CustomersEuropeOnly ON DEMO.CUSTOMERS
FOR ROWS WHERE
VERIFY_GROUP_FOR_USER(SESSION_USER, 'SALES_EUROPE') AND
COUNTRY IN ('BE', 'DK', 'ES', 'FR', 'GR', 'HN', 'IT')
ENFORCED FOR ALL ACCESS
ENABLE;
This specifies that database users who belong to group SALES_EUROPE
can see the rows in table DEMO.CUSTOMERS
where the COUNTRY
column has one of the specified values. These values could be looked up or computed from the database, and permissions can call SQL functions including secure user-defined functions, so there is quite a bit of flexibility.
Additional permissions can be added to the same table, and DB2 will automatically apply all of them and show only the rows that satisfy at least one permission.
This type of permission covers all access to the table: select, insert, update, and delete, directly or through views. For instance, if the user tries to insert a row that does not satisfy at least one permission, the insert will fail. Same, obviously, for update and delete.
Column Masks
Column masks are also expressed as pieces of code that specify how a column should be presented to the client. For instance:
CREATE MASK CUSTOMERS_LAST_NAME_MASK ON DEMO.CUSTOMERS
FOR COLUMN LAST_NAME RETURN
CASE WHEN (VERIFY_ROLE_FOR_USER(SESSION_USER, 'ENGINEER') AND COUNTRY IN ('BE', 'ES'))
THEN SUBSTR(LAST_NAME, 1, 1) || 'XXXX'
ELSE LAST_NAME
END
ENABLE;
This specifies that, for database users who have the role ENGINEER
, the column LAST_NAME
in table DEMO.CUSTOMERS
should be masked, with just the first character showing, followed by XXXX
, but only for customers in certain countries.
You can only have one mask per column, so things can get painful to manage if you have a lot of logic.
Label-Based Access Control (LBAC)
Unlike permissions and masks in RCAC, which are defined by code, label-based access control is declarative.
It works by creating a system of label components, which can be organized either as unordered sets, e.g.:
CREATE SECURITY LABEL COMPONENT COMPARTMENT
SET {'RESEARCH', 'MANAGEMENT', 'FINANCE'};
As ordered arrays, where higher levels subsume lower levels, e.g.:
CREATE SECURITY LABEL COMPONENT CLASSIFICATION_LEVEL
ARRAY [ 'TOP SECRET', 'SECRET', 'CONFIDENTIAL', 'PUBLIC' ];
Or in trees, where again higher levels subsume lower levels:
CREATE SECURITY LABEL COMPONENT REGION
TREE (
'WORLD' ROOT,
'AMERICAS' UNDER 'WORLD',
'CANADA' UNDER 'AMERICAS',
'USA' UNDER 'AMERICAS',
'CALIFORNIA' UNDER 'USA',
'TEXAS' UNDER 'USA',
etc...
This is quite powerful, though managing all these can quickly become a challenge.
Label components can then be assembled into labels, e.g.:
CREATE SECURITY LABEL POLICY2024.RESEARCH_TOPSECRET_USA
COMPONENT COMPARTMENT 'RESEARCH',
COMPONENT CLASSIFICATION_LEVEL 'TOP SECRET',
COMPONENT REGION 'USA';
These labels can then be used to protect individual columns:
CREATE TABLE DEMO.CUSTOMERS (
ID INT NOT NULL,
FIRST_NAME NVARCHAR(100) NOT NULL,
LAST_NAME NVARCHAR(100) NOT NULL,
COUNTRY CHAR(2) NOT NULL,
BALANCE NUMERIC(12,2) NOT NULL SECURED WITH FINANCE_CONFIDENTIAL_WORLD,
SECLABEL DB2SECURITYLABEL NOT NULL,
PRIMARY KEY (ID)
)
SECURITY POLICY POLICY2024;
As well as individual rows:
INSERT INTO DEMO.CUSTOMERS(ID, FIRST_NAME, LAST_NAME, COUNTRY, SECLABEL) VALUES
(21, 'Wernher', 'von Braun', 'US', SECLABEL_BY_NAME('POLICY2024', 'RESEARCH_TOPSECRET_USA'));
Finally, users, roles, and groups can be given access to the labels for reading, writing, or both:
GRANT SECURITY LABEL POLICY2024.RESEARCH_TOPSECRET_USA
TO USER TJONES FOR READ ACCESS;
There is even more to LBAC, such as an exemption mechanism.
This is an elaborate system, which has complex rules governing things like security upgrades, default behaviors, backups and data movements, and everything else security auditors like to poke around. It is powerful enough to implement the vast majority of fine-grained access control needs, though scaling this system to cover large, complex requirements with many dimensions can be a significant challenge to design, implement, and manage.
Pros and Cons of DB2's Fine-Grained Access Control
Pros
- Built into the database: No need for any other software, other than perhaps something to help manage all that security metadata
- Rock-solid and certified: This is in use by many of the most security-conscious organizations around the world.
- Automatically governs all relevant queries and updates
Cons
- Adds an additional burden on the database: Most of these mechanisms are actually implemented by merging them into the query, which can result in some very complex (and expensive) queries if you have a lot of masks, permissions, and labels.
- Does not cover certain data types, such as XML and LOBs
- If a permission becomes invalid (which can easily happen in the case of complex permissions), it will block all access to the table for which it was defined. This can obviously be disruptive, and non-trivial to debug.
- All these labels and permissions must be managed by a database user with the SECADM authority.
Many of the advantages and disadvantages of this solution come from the fact that it's a centralized solution, which is great for some scenarios, and less great for others.
Proxy-Based Fine-Grained Access Control
A very different approach consists of restricting data access between the database server and the database client, using a programmable proxy.
This is a more decentralized approach: some trusted clients may access the database directly, whereas some less trusted clients may have to go through the proxy. And of course, you may have different proxies for different types of clients.
Because this kind of access control is done outside of the database, it is a less powerful approach in some ways, but it does have certain advantages.
Restricting or Rewriting Queries
A proxy can intercept any SQL command on its way to the database and potentially change it or reject it outright.
For instance, the proxy can simply reject certain queries based on the application user, the data being accessed, the time of day, etc.
Going further, the proxy can rewrite queries so that they satisfy our data-hiding requirements. This rewriting can of course vary depending on who the user is, where the call is coming from, or any other useful conditions.
For instance, a proxy can trivially intercept a query like:
SELECT ID, FIRST_NAME, LAST_NAME, BALANCE
FROM DEMO.CUSTOMERS
WHERE COUNTRY = 'FR'
And rewrite it to:
SELECT ID, FIRST_NAME, SUBSTR(LAST_NAME, 1, 1) || 'XXXX' AS LAST_NAME, 0 AS BALANCE
FROM DEMO.CUSTOMERS
WHERE COUNTRY = 'FR' AND REGION <> 'Z'
This all happens before it gets sent to the database. This approach works best for known queries, although it is often possible to analyze the query dynamically and modify it as needed.
Modifying Result Sets
When the database responds to a query, the result sets or result values go through the proxy, which can manipulate them as needed.
For example, using Gallium Data as a proxy, a trivial result set filter might look like:
if (row.COUNTRY === 'FR') {
row.LAST_NAME = row.LAST_NAME.substring(0, 1) + "XXXX";
row.BALANCE = 0;
}
This extra processing will add a small amount of latency, but that work needs to be done somewhere, and by doing it in the proxy, you're saving the database from having to do it.
The proxy has complete control over result sets and result values: it can modify rows, remove entire rows, or even insert new rows into a result set.
The Pros and Cons of Proxy-Based Data Hiding
Pros
- Requires no changes to the database, the database server, or the clients. This is clearly the most attractive feature of this approach: nothing has to change in your existing system.
- Puts no extra load on the database
- Can cover stored procedure execution, including parameter values and returned result sets
- Can be modulated based on the database user, the application user, the application's behavior, or any other factors
- Different proxies can enforce different sets of constraints.
- The constraints are administered outside of DB2 and require no special permissions in the database.
Cons
- Slight performance cost (typically on the order of 5%-10% increased latency)
- Limited in the case of free-form queries, since the proxy will never understand the queries as deeply as the database can
- Cannot entirely hide some data (e.g., aggregate data can reveal the existence and value of some hidden data).
Conclusion
The easiest way to look at this is to ask yourself what will best serve your requirements: a centralized solution, or a decentralized solution?
- If you're ok managing all your access control in DB2, then you should clearly go that way: it's extremely solid and flexible, and it's all built-in.
- If you'd rather not change your database, or DB2's access control does not fulfill your requirements, then using a proxy may be the easiest route - sometimes the only route.
Published at DZone with permission of Max Tardiveau. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments