ShardingSphere's Built-In Metadata Handling Function for Sharded Database Environments

This article discusses using ShardingSphere's metadata management feature to handle metadata in sharded database environments.

May. 08, 23 · Tutorial

Likes (3)

Comment

Save

2.0K Views

Apache ShardingSphere is a widely recognized and trusted open-source data management platform that provides robust support for key functionalities such as sharding, encryption, read/write splitting, transactions, and high availability. The metadata of ShardingSphere encompasses essential components such as rules, data sources, and table structures, which are fundamental for the smooth operation of the platform. ShardingSphere leverages the advanced capabilities of governance centers like ZooKeeper and etc., for efficient sharing and modification of cluster configurations, enabling seamless horizontal expansion of computing nodes.

In this informative blog post, our emphasis will be on gaining a comprehensive understanding of the metadata structure employed by Apache ShardingSphere. We will delve into the intricacies of the three-layer metadata structure within ZooKeeper, which encompasses crucial components such as metadata information, built-in metadata database, and simulated MySQL database.

Metadata Structure

For a comprehensive grasp of the metadata structure utilized in Apache ShardingSphere, a closer examination of the cluster mode of ShardingSphere-Proxy can be beneficial. The metadata structure in ZooKeeper adopts a three-layer hierarchy, with the first layer being the governance_ds. This layer encompasses critical components such as metadata information, built-in metadata database, and simulated MySQL database.

governance_ds
--metadata (metadata information)
----sharding_db (logical database name)
------active_version (currently active version)
------versions
--------0
----------data_sources (underlying database information)
----------rules (rules of logical database, such as sharding, encryption, etc.)
------schemas (table and view information)
--------sharding_db
----------tables
------------t_order
------------t_single
----------views
----shardingsphere (built-in metadata database)
------schemas
--------shardingsphere
----------tables
------------sharding_table_statics (sharding statistics table)
------------cluster_information （version information）
----performance_schema (simulated MySQL database)
------schemas
--------performance_schema
----------tables
------------accounts
----information_schema (simulated MySQL database)
------schemas
--------information_schema
----------tables
------------tables
------------schemata
------------columns
------------engines
------------routines
------------parameters
------------views
----mysql
----sys
--sys_data (specific row information of built-in metadata database)
----shardingsphere
------schemas
--------shardingsphere
----------tables
------------sharding_table_statistics
--------------79ff60bc40ab09395bed54cfecd08f94
--------------e832393209c9a4e7e117664c5ff8fc61
------------cluster_information
--------------d387c4f7de791e34d206f7dd59e24c1c

The metadata directory serves as a repository for storing essential rules and data source information, including the currently active metadata version, which is stored under the active_version node. Meanwhile, the versions stored within the metadata directory house different iterations of rules and database connection details.

On the other hand, the schemas directory is designated for storing comprehensive tables and viewing information from the logical database. ShardingSphere meticulously preserves the decorated table structure information after applying the rules. For instance, in the case of sharding tables, it retrieves the structure from one of the actual tables, replaces the table name, and omits the real encrypted column information in the table structure, allowing users to conveniently operate on the logical database directly. The built-in metadata database, located within the metadata directory, boasts a structure that resembles that of the logical database. However, it is specifically designed to house certain built-in table structures such as sharding_table_statics and cluster_information, which will be elaborated on in subsequent discussions. In addition, the metadata directory also includes other nodes such as performance_schema, information_schema, mysql, sys, and more, which emulate the data dictionary of MySQL. These nodes serve the purpose of supporting various client tools to connect to the proxy, and future plans involve expanding data collection to facilitate queries on these data dictionaries.

The three-layer metadata structure of ShardingSphere consists of governance_ds, metadata, and a built-in metadata database, is designed to provide compatibility with different database formats. For instance, PostgreSQL has a three-layer structure consisting of instance, database, and schema, whereas MySQL has a two-layer structure of database and table. Therefore, ShardingSphere adds an identical logical schema layer for MySQL to ensure logical uniformity. The meticulously designed three-layer metadata structure of ShardingSphere, encompassing governance_ds, metadata, and a built-in metadata database, has been formulated to ensure seamless compatibility with diverse database formats. For instance, while PostgreSQL follows a three-layer structure comprising instance, database, and schema, MySQL adopts a two-layer structure of database and table.

To ensure logical uniformity, ShardingSphere introduces an additional logical schema layer for MySQL.Gaining a comprehensive understanding of the metadata structure employed in Apache ShardingSphere is of utmost significance for developers seeking to utilize the platform optimally. By thoroughly examining the metadata structure of ShardingSphere, developers can gain valuable insights into how the platform effectively stores and manages data sources and table structures.

In the preceding section, we examined ShardingSphere's integrated metadata database, encompassing two tables: sharding_table_statistics (a table for collecting sharding information) and cluster_information(a table for storing version information). We also explored the potential of the metadata database to house both internal collection data and user-defined information (yet to be implemented).In this section, we will delve into the inner workings of the built-in metadata database, including its data collection and query implementation mechanisms.

Data Collection

The ShardingSphere's integrated metadata database relies on data collection to aggregate information into memory and synchronizes it with the governance center to ensure consistency across clusters. To illustrate the process of data collection into memory, let's use the sharding_table_statistics table as an example. The ShardingSphereDataCollectorinterface outlines a method for data collection:

     Java
    
    public interface ShardingSphereDataCollector extends TypedSPI {
    Optional<ShardingSphereTableData> collect(String databaseName, ShardingSphereTable table, Map<String, ShardingSphereDatabase> shardingSphereDatabases) throws SQLException;
}

The aforementioned method is invoked by the ShardingSphereDataCollectorRunnable scheduled task. The current implementation initiates a scheduled task on the Proxy for data collection, utilizing the built-in metadata table to differentiate data collectors for specific data collection tasks. It is worth noting that based on feedback from the community, this approach may evolve into an e-job trigger method for collection in the future. The logic for collecting information is encapsulated in the ShardingStatisticsTableCollectorclass. This class employs the underlying data source and sharding rules to query relevant database information and extract statistical data.

Query Implementation

Upon completion of the data collection process, the ShardingSphereDataScheduleCollector class compares the collected information and the data stored in memory. In the event of any inconsistencies, it triggers an event EVENTBUSto notify the governance center. Subsequently, upon receiving the event, the governance center updates the information of other nodes and executes memory synchronization accordingly. The code for the event listening class is depicted below:

     Java
    
 

    public final class ShardingSphereSchemaDataRegistrySubscriber {
    
    private final ShardingSphereDataPersistService persistService;
    
    private final GlobalLockPersistService lockPersistService;
    
    public ShardingSphereSchemaDataRegistrySubscriber(final ClusterPersistRepository repository, final GlobalLockPersistService globalLockPersistService, final EventBusContext eventBusContext) {
        persistService = new ShardingSphereDataPersistService(repository);
        lockPersistService = globalLockPersistService;
        eventBusContext.register(this);
    }
    
    @Subscribe
    public void update(final ShardingSphereSchemaDataAlteredEvent event) {
        String databaseName = event.getDatabaseName();
        String schemaName = event.getSchemaName();
        GlobalLockDefinition lockDefinition = new GlobalLockDefinition("sys_data_" + event.getDatabaseName() + event.getSchemaName() + event.getTableName());
        if (lockPersistService.tryLock(lockDefinition, 10_000)) {
            try {
                persistService.getTableRowDataPersistService().persist(databaseName, schemaName, event.getTableName(), event...
   

In the preceding section, we have delved into the inner workings of ShardingSphere's integrated metadata database, encompassing data collection and query implementation. By storing table structures in metadata and table content in sys_data, direct querying of the built-in metadata database table information via SQL becomes possible. In the upcoming section, we will further discuss the advantages of utilizing the ShardingSphere built-in metadata database and how it can enhance the performance and scalability of your system.

Benefits of Using the ShardingSphere Built-In Metadata Database

The ShardingSphere built-in metadata function provides a powerful tool for managing metadata in a sharded database environment. With this function, users can easily retrieve information on sharded tables and other database objects and further extend the capabilities of their database management systems. By having a comprehensive metadata structure, businesses can gain a deeper understanding of how their databases are functioning and how they can optimize their performance. The built-in metadata database also supports PostgreSQL \d query, which is one of the most commonly used commands in the PG client. To implement the query of \d, it is necessary to implement the corresponding SQL statements and to decorate the data in a certain way, such as replacing sharded tables with logical tables. The actual execution statement of \d is as follows:

     SQL
    
 

    SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind 
    WHEN 'r' THEN 'table' 
    WHEN 'v' THEN 'view' 
    WHEN 'i' THEN 'index' 
    WHEN 'I' THEN 'global partition index' 
    WHEN 'S' THEN 'sequence' 
    WHEN 'L' THEN 'large sequence' 
    WHEN 'f' THEN 'foreign table' 
    WHEN 'm' THEN 'materialized view'  
    WHEN 'e' THEN 'stream' 
    WHEN 'o' THEN 'contview' 
  END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  c.reloptions as "Storage"
FROM pg_catalog.pg_class c
  LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','v','m','S','L','f','e','o','')
  AND n.nspname <> 'pg_catalog' 
  AND n.nspname <> 'db4ai' 
  AND n.nspname <> 'information_schema'
  AND n.nspname !~ '^pg_toast'
  AND c.relname not like 'matviewmap\_%'
  AND c.relname not like 'mlog\_%'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
   

To implement the query of this statement, we need to collect information from the two tables pg_catalog.pg_class and pg_catalog.pg_namespace. In addition, we also need to simulate the return results of the following two functions: pg_catalog.pg_get_userbyid(c.relowner) and pg_catalog.pg_table_is_visible(c.oid).The logic of the collection of tables is similar to the sharding_table_statistics table mentioned above, so we will not elaborate on it here. Because there is a lot of content in pg_class, we only collect some of the information related to \d. In addition, during the data collection stage, due to the existence of sharding rules, we need to display logical table names, so further decoration of the collected information is required, such as table name replacement. During the query process, it is necessary to simulate the return results of functions. Fortunately, Calcite offers the capability to register functions, although currently, it is a basic mock and can potentially be expanded into real data in the future.

     Java
    
 

    /**
    /**
 * Create catalog reader.
 *
 * @param schemaName schema name
 * @param schema schema
 * @param relDataTypeFactory rel data type factory
 * @param connectionConfig connection config
 * @return calcite catalog reader
 */
public static CalciteCatalogReader createCatalogReader(
  final String schemaName, 
  final Schema schema, 
  final RelDataTypeFactory relDataTypeFactory, 
  final CalciteConnectionConfig connectionConfig
) {
  CalciteSchema rootSchema = CalciteSchema.createRootSchema(true);
  rootSchema.add(schemaName, schema);
  registryUserDefinedFunction(schemaName, rootSchema.plus());
  return new CalciteCatalogReader(
    rootSchema, 
   

In summary, the built-in metadata function of ShardingSphere offers a powerful tool for effective metadata management in a sharded database environment. This functionality allows users to effortlessly retrieve information on sharded tables and other database objects and further enhance the capabilities of their database management systems. While this feature is currently in the experimental stage, it demonstrates significant potential for future development and improvement. We encourage users to actively explore and contribute to the ShardingSphere community, as together, we can continue to advance the capabilities of metadata management in sharded database environments.

Database MySQL Metadata

Opinions expressed by DZone contributors are their own.

Related

Trending