ShardingSphere's Built-In Metadata Handling Function for Sharded Database Environments
This article discusses using ShardingSphere's metadata management feature to handle metadata in sharded database environments.
Join the DZone community and get the full member experience.
Join For Free
Apache ShardingSphere is a widely recognized and trusted open-source data management platform that provides robust support for key functionalities such as sharding, encryption, read/write splitting, transactions, and high availability. The metadata of ShardingSphere encompasses essential components such as rules, data sources, and table structures, which are fundamental for the smooth operation of the platform. ShardingSphere leverages the advanced capabilities of governance centers like ZooKeeper and etc., for efficient sharing and modification of cluster configurations, enabling seamless horizontal expansion of computing nodes.
In this informative blog post, our emphasis will be on gaining a comprehensive understanding of the metadata structure employed by Apache ShardingSphere. We will delve into the intricacies of the three-layer metadata structure within ZooKeeper, which encompasses crucial components such as metadata information, built-in metadata database, and simulated MySQL database.
Metadata Structure
For a comprehensive grasp of the metadata structure utilized in Apache ShardingSphere, a closer examination of the cluster mode of ShardingSphere-Proxy can be beneficial. The metadata structure in ZooKeeper adopts a three-layer hierarchy, with the first layer being thegovernance_ds
. This layer encompasses critical components such as metadata information, built-in metadata database, and simulated MySQL database.
governance_ds
--metadata (metadata information)
----sharding_db (logical database name)
------active_version (currently active version)
------versions
--------0
----------data_sources (underlying database information)
----------rules (rules of logical database, such as sharding, encryption, etc.)
------schemas (table and view information)
--------sharding_db
----------tables
------------t_order
------------t_single
----------views
----shardingsphere (built-in metadata database)
------schemas
--------shardingsphere
----------tables
------------sharding_table_statics (sharding statistics table)
------------cluster_information (version information)
----performance_schema (simulated MySQL database)
------schemas
--------performance_schema
----------tables
------------accounts
----information_schema (simulated MySQL database)
------schemas
--------information_schema
----------tables
------------tables
------------schemata
------------columns
------------engines
------------routines
------------parameters
------------views
----mysql
----sys
--sys_data (specific row information of built-in metadata database)
----shardingsphere
------schemas
--------shardingsphere
----------tables
------------sharding_table_statistics
--------------79ff60bc40ab09395bed54cfecd08f94
--------------e832393209c9a4e7e117664c5ff8fc61
------------cluster_information
--------------d387c4f7de791e34d206f7dd59e24c1c
The metadata directory serves as a repository for storing essential rules and data source information, including the currently active metadata version, which is stored under the active_version
node. Meanwhile, the versions stored within the metadata directory house different iterations of rules and database connection details.
On the other hand, the
schemas
directory is designated for storing comprehensive tables and viewing information from the logical database. ShardingSphere meticulously preserves the decorated table structure information after applying the rules. For instance, in the case of sharding tables, it retrieves the structure from one of the actual tables, replaces the table name, and omits the real encrypted column information in the table structure, allowing users to conveniently operate on the logical database directly. The built-in metadata database, located within the metadata directory, boasts a structure that resembles that of the logical database. However, it is specifically designed to house certain built-in table structures such as sharding_table_statics
and cluster_information
, which will be elaborated on in subsequent discussions. In addition, the metadata directory also includes other nodes such as performance_schema
, information_schema
, mysql
, sys
, and more, which emulate the data dictionary of MySQL. These nodes serve the purpose of supporting various client tools to connect to the proxy, and future plans involve expanding data collection to facilitate queries on these data dictionaries.
The three-layer metadata structure of ShardingSphere consists of
governance_ds
, metadata
, and a built-in metadata database, is designed to provide compatibility with different database formats. For instance, PostgreSQL has a three-layer structure consisting of instance, database, and schema, whereas MySQL has a two-layer structure of database and table. Therefore, ShardingSphere adds an identical logical schema layer for MySQL to ensure logical uniformity. The meticulously designed three-layer metadata structure of ShardingSphere, encompassing governance_ds
, metadata, and a built-in metadata database, has been formulated to ensure seamless compatibility with diverse database formats. For instance, while PostgreSQL follows a three-layer structure comprising instance, database, and schema, MySQL adopts a two-layer structure of database and table.
To ensure logical uniformity, ShardingSphere introduces an additional logical schema layer for MySQL.Gaining a comprehensive understanding of the metadata structure employed in Apache ShardingSphere is of utmost significance for developers seeking to utilize the platform optimally. By thoroughly examining the metadata structure of ShardingSphere, developers can gain valuable insights into how the platform effectively stores and manages data sources and table structures.
In the preceding section, we examined ShardingSphere's integrated metadata database, encompassing two tables:
sharding_table_statistics
(a table for collecting sharding information) and cluster_information
(a table for storing version information). We also explored the potential of the metadata database to house both internal collection data and user-defined information (yet to be implemented).In this section, we will delve into the inner workings of the built-in metadata database, including its data collection and query implementation mechanisms.
Data Collection
The ShardingSphere's integrated metadata database relies on data collection to aggregate information into memory and synchronizes it with the governance center to ensure consistency across clusters. To illustrate the process of data collection into memory, let's use thesharding_table_statistics
table as an example. The ShardingSphereDataCollector
interface outlines a method for data collection:
Java
public interface ShardingSphereDataCollector extends TypedSPI {
Optional<ShardingSphereTableData> collect(String databaseName, ShardingSphereTable table, Map<String, ShardingSphereDatabase> shardingSphereDatabases) throws SQLException;
}
ShardingSphereDataCollectorRunnable
scheduled task. The current implementation initiates a scheduled task on the Proxy for data collection, utilizing the built-in metadata table to differentiate data collectors for specific data collection tasks. It is worth noting that based on feedback from the community, this approach may evolve into an e-job trigger method for collection in the future. The logic for collecting information is encapsulated in the ShardingStatisticsTableCollector
class. This class employs the underlying data source and sharding rules to query relevant database information and extract statistical data.
Query Implementation
Upon completion of the data collection process, theShardingSphereDataScheduleCollector
class compares the collected information and the data stored in memory. In the event of any inconsistencies, it triggers an event EVENTBUS
to notify the governance center. Subsequently, upon receiving the event, the governance center updates the information of other nodes and executes memory synchronization accordingly. The code for the event listening class is depicted below:
Java
public final class ShardingSphereSchemaDataRegistrySubscriber {
private final ShardingSphereDataPersistService persistService;
private final GlobalLockPersistService lockPersistService;
public ShardingSphereSchemaDataRegistrySubscriber(final ClusterPersistRepository repository, final GlobalLockPersistService globalLockPersistService, final EventBusContext eventBusContext) {
persistService = new ShardingSphereDataPersistService(repository);
lockPersistService = globalLockPersistService;
eventBusContext.register(this);
}
@Subscribe
public void update(final ShardingSphereSchemaDataAlteredEvent event) {
String databaseName = event.getDatabaseName();
String schemaName = event.getSchemaName();
GlobalLockDefinition lockDefinition = new GlobalLockDefinition("sys_data_" + event.getDatabaseName() + event.getSchemaName() + event.getTableName());
if (lockPersistService.tryLock(lockDefinition, 10_000)) {
try {
persistService.getTableRowDataPersistService().persist(databaseName, schemaName, event.getTableName(), event...
sys_data
, direct querying of the built-in metadata database table information via SQL becomes possible. In the upcoming section, we will further discuss the advantages of utilizing the ShardingSphere built-in metadata database and how it can enhance the performance and scalability of your system.
Benefits of Using the ShardingSphere Built-In Metadata Database
The ShardingSphere built-in metadata function provides a powerful tool for managing metadata in a sharded database environment. With this function, users can easily retrieve information on sharded tables and other database objects and further extend the capabilities of their database management systems. By having a comprehensive metadata structure, businesses can gain a deeper understanding of how their databases are functioning and how they can optimize their performance. The built-in metadata database also supports PostgreSQL \d query, which is one of the most commonly used commands in the PG client. To implement the query of \d, it is necessary to implement the corresponding SQL statements and to decorate the data in a certain way, such as replacing sharded tables with logical tables. The actual execution statement of \d is as follows:
SQL
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind
WHEN 'r' THEN 'table'
WHEN 'v' THEN 'view'
WHEN 'i' THEN 'index'
WHEN 'I' THEN 'global partition index'
WHEN 'S' THEN 'sequence'
WHEN 'L' THEN 'large sequence'
WHEN 'f' THEN 'foreign table'
WHEN 'm' THEN 'materialized view'
WHEN 'e' THEN 'stream'
WHEN 'o' THEN 'contview'
END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
c.reloptions as "Storage"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','v','m','S','L','f','e','o','')
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'db4ai'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND c.relname not like 'matviewmap\_%'
AND c.relname not like 'mlog\_%'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
pg_catalog.pg_class
and pg_catalog.pg_namespace
. In addition, we also need to simulate the return results of the following two functions: pg_catalog.pg_get_userbyid(c.relowner)
and pg_catalog.pg_table_is_visible(c.oid)
.The logic of the collection of tables is similar to the sharding_table_statistics
table mentioned above, so we will not elaborate on it here. Because there is a lot of content in pg_class
, we only collect some of the information related to \d. In addition, during the data collection stage, due to the existence of sharding rules, we need to display logical table names, so further decoration of the collected information is required, such as table name replacement. During the query process, it is necessary to simulate the return results of functions. Fortunately, Calcite offers the capability to register functions, although currently, it is a basic mock and can potentially be expanded into real data in the future.
Java
/**
/**
* Create catalog reader.
*
* @param schemaName schema name
* @param schema schema
* @param relDataTypeFactory rel data type factory
* @param connectionConfig connection config
* @return calcite catalog reader
*/
public static CalciteCatalogReader createCatalogReader(
final String schemaName,
final Schema schema,
final RelDataTypeFactory relDataTypeFactory,
final CalciteConnectionConfig connectionConfig
) {
CalciteSchema rootSchema = CalciteSchema.createRootSchema(true);
rootSchema.add(schemaName, schema);
registryUserDefinedFunction(schemaName, rootSchema.plus());
return new CalciteCatalogReader(
rootSchema,
Database
MySQL
Metadata
Opinions expressed by DZone contributors are their own.
Comments