The Power of ShardingSphere With Spring Boot
Learn how to use Apache ShardingSphere with Spring Boot to enable sharding, read-write splitting, and data masking.
I have developed a small-scale application that concentrates on a straightforward business scenario of customer reviews for the purpose of illustrating various use cases. This application is implemented using the Spring Boot framework and communicates with a MySQL database via Spring Data JPA, with the code being written in Kotlin. It exposes a simple REST API featuring CRUD operations on reviews.
Spoiler alert: The use cases illustrated in this article are intentionally simplistic, intended solely to showcase the integration with ShardingSphere functionalities; the problems discussed here can be solved in various, maybe even better, ways, so don't spend too much time thinking "why." So, without further ado, let's dive into the code.
Here is the main entity:
@Entity
@Table(name = "reviews")
data class Review(
var text: String,
var author: String,
@Column(name = "author_telephone")
var authorTelephone: String? = null,
@Column(name = "author_email")
var authorEmail: String? = null,
@Column(name = "invoice_code")
var invoiceCode: String? = null,
@Column(name = "course_id")
var courseId: Int? = null
) : AbstractEntity()
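The AbstractEntity base class isn't listed in the article. Judging by the columns that appear in the SQL logs later (id, created_at, last_modified_at), a plausible sketch looks like this; it is my assumption, not the actual class from the repo, and on Spring Boot 3 the imports would be jakarta.persistence instead of javax.persistence:

```kotlin
import java.time.LocalDateTime
import javax.persistence.Column
import javax.persistence.GeneratedValue
import javax.persistence.Id
import javax.persistence.MappedSuperclass

// Hypothetical base class holding the fields seen in the generated SQL:
// id, created_at, and last_modified_at.
@MappedSuperclass
abstract class AbstractEntity(
    @Id
    @GeneratedValue
    var id: Int? = null,

    @Column(name = "created_at")
    var createdAt: LocalDateTime = LocalDateTime.now(),

    @Column(name = "last_modified_at")
    var lastModifiedAt: LocalDateTime = LocalDateTime.now()
)
```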
And we have the following REST API:
@RestController
@RequestMapping("/api/v1/reviews")
class ReviewController(val reviewService: ReviewService) {
@GetMapping("/filter", params = ["author"])
fun getReviewsByAuthor(@RequestParam("author") author: String): ResponseEntity<List<Review>> {
val reviews = reviewService.findAllByAuthor(author)
return ResponseEntity.ok(reviews)
}
@GetMapping("/filter", params = ["courseId"])
fun getReviewsByCourseId(@RequestParam("courseId") courseId: Int): ResponseEntity<List<Review>> {
val reviews = reviewService.findAllByCourseId(courseId)
return ResponseEntity.ok(reviews)
}
@GetMapping("/{id}")
fun getReviewById(@PathVariable("id") id: Int): ResponseEntity<Review> {
val review = reviewService.findById(id)
return ResponseEntity.ok(review)
}
@PostMapping
fun createReview(@RequestBody review: Review): ResponseEntity<Review> {
val savedReview = reviewService.save(review)
return ResponseEntity.status(HttpStatus.CREATED).body(savedReview)
}
@DeleteMapping("/{id}")
fun deleteReviewById(@PathVariable("id") id: Int): ResponseEntity<Unit> {
reviewService.deleteById(id)
return ResponseEntity.noContent().build()
}
}
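The ReviewService itself isn't shown in the article either; with Spring Data JPA it is presumably a thin wrapper around a repository with derived query methods. A sketch consistent with the controller's calls (the repository name and method bodies are my assumptions) could be:

```kotlin
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.stereotype.Service

// Hypothetical repository: Spring Data derives the SQL from the method names.
interface ReviewRepository : JpaRepository<Review, Int> {
    fun findAllByAuthor(author: String): List<Review>
    fun findAllByCourseId(courseId: Int): List<Review>
}

// Hypothetical service delegating to the repository.
@Service
class ReviewService(val reviewRepository: ReviewRepository) {
    fun findAllByAuthor(author: String) = reviewRepository.findAllByAuthor(author)
    fun findAllByCourseId(courseId: Int) = reviewRepository.findAllByCourseId(courseId)
    fun findById(id: Int): Review = reviewRepository.findById(id).orElseThrow()
    fun save(review: Review): Review = reviewRepository.save(review)
    fun deleteById(id: Int) = reviewRepository.deleteById(id)
}
```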
Here is the MySQL container from docker-compose.yml:
mysql-master:
image: 'bitnami/mysql:latest'
ports:
- '3306:3306'
volumes:
- 'mysql_master_data:/bitnami/mysql/data'
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
environment:
- MYSQL_USER=my_user
- MYSQL_PASSWORD=my_password
- MYSQL_DATABASE=reviews-db
Note: The init.sql file contains the DDL for the reviews table.
Now let's execute some HTTP requests, such as creating a review (all the requests mentioned here are in requests.http):
POST http://localhost:8070/api/v1/reviews/
Content-Type: application/json
{
"text": "This is a great course!",
"author": "John Doe",
"authorTelephone": "555-1234",
"authorEmail": "johndoe@example.com",
"invoiceCode": "ABC123",
"courseId": 123
}
We'll observe the following query (p6spy enabled):
INFO 16784 --- [nio-8070-exec-1] p6spy : #1681730984533 | took 4ms | statement | connection 2| url jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
insert into reviews (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values ('2023-04-17T14:29:44.450+0300', '2023-04-17T14:29:44.450+0300', 'John Doe', 'johndoe@example.com', '555-1234', 123, 'ABC123', 'This is a great course!', 2);
Data Sharding
The reviews application discussed above involves a database storing a large number of reviews for courses. As the number of courses and reviews grows, the reviews table in the database can become very large, making it difficult to manage and slowing down performance. To address this issue, we can implement data sharding, which involves breaking up the reviews table into smaller, more manageable pieces called shards. Each shard contains a subset of the data in the reviews table and is responsible for a specific range of data based on a shard key, such as the course id.
Table sharding can help the reviews application manage its growing reviews table more effectively, improving performance and scalability while also making backups and maintenance tasks easier to manage.
But, there is always a but: implementing table sharding manually can be a complex and challenging task. It requires a deep understanding of database design and architecture, as well as knowledge of the specific sharding implementation being used, and many challenges can arise when implementing table sharding in a Spring Boot application.
It is time we met ShardingSphere:
Apache ShardingSphere is an ecosystem to transform any database into a distributed database system, and enhance it with sharding, elastic scaling, encryption features and more.
Apache ShardingSphere comes in two flavors:
- ShardingSphere-JDBC is a lightweight Java framework that provides additional services at Java's JDBC layer.
- ShardingSphere-Proxy is a transparent database proxy, providing a database server that encapsulates database binary protocol to support heterogeneous languages.
ShardingSphere provides a range of features, including data sharding, distributed transaction, read/write splitting, high availability, data migration, query federation, data encryption, and shadow database for full-link online load testing scenarios, to help manage large volumes of data and ensure data integrity and security.
For now, we'll focus on ShardingSphere-JDBC's data sharding, and we'll use the following dependencies:
implementation("org.apache.shardingsphere:shardingsphere-jdbc-core:5.3.2")
implementation("org.apache.shardingsphere:shardingsphere-cluster-mode-core:5.3.2")
implementation("org.apache.shardingsphere:shardingsphere-cluster-mode-repository-zookeeper:5.3.2")
implementation("org.apache.shardingsphere:shardingsphere-cluster-mode-repository-api:5.3.2")
Note: The ShardingSphere team shipped Spring Boot starters in the 5.1.x versions, but they moved away from starters for the sake of consistency across the project and now recommend the latest (non-starter) version, which is configured a bit differently but is still fairly simple. In my repo, you can find examples of the Spring Boot starter configuration in earlier commits, too.
ShardingSphere-JDBC can be configured mainly in two ways: YAML configuration and Java configuration. I picked the YAML configuration for this article.
So at the moment, my datasource configuration from application.yaml looks like this:
spring:
datasource:
username: my_user
password: my_password
url: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
tomcat:
validation-query: "SELECT 1"
test-while-idle: true
jpa:
properties:
hibernate:
dialect: org.hibernate.dialect.MySQL8Dialect
open-in-view: false
hibernate:
ddl-auto: none
To enable ShardingSphere JDBC, we'll have to make it look like this:
spring:
datasource:
driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
url: jdbc:shardingsphere:classpath:sharding.yaml
jpa:
properties:
hibernate:
dialect: org.hibernate.dialect.MySQL8Dialect
open-in-view: false
hibernate:
ddl-auto: none
We specified ShardingSphereDriver as the driver for the data source, and the url tells it to load its configuration from sharding.yaml on the classpath. Okay, pretty simple, right? Let's continue and create the sharding.yaml:
dataSources:
master:
dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
username: my_user
password: my_password
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
minPoolSize: 1
mode:
type: Standalone
repository:
type: JDBC
rules:
- !SHARDING
tables:
reviews:
actualDataNodes: master.reviews_$->{0..1}
tableStrategy:
standard:
shardingColumn: course_id
shardingAlgorithmName: inline
shardingAlgorithms:
inline:
type: INLINE
props:
algorithm-expression: reviews_$->{course_id % 2}
allow-range-query-with-inline-sharding: true
props:
proxy-hint-enabled: true
sql-show: true
Now let's analyze the most important properties:
- dataSources.master – here lies the definition of our master data source.
- mode – which can be either Standalone with the JDBC repository type or Cluster with the ZooKeeper type (recommended for production); this is used for configuration information persistence.
- rules – here we can enable various ShardingSphere features, like !SHARDING.
- tables.reviews – here we describe the actual tables based on the inline syntax rules, meaning that we'll have two tables, reviews_0 and reviews_1, sharded by the course_id column.
- shardingAlgorithms – here we describe the manual inline sharding algorithm via a Groovy expression, telling ShardingSphere that the reviews table is divided into two tables based on the course_id column.
- props – here we enabled logging the actual SQL queries (p6spy can be disabled/commented out).
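To make the inline expression concrete, here is a tiny Kotlin sketch (an illustration, not ShardingSphere code) of what reviews_$->{course_id % 2} evaluates to for a given course id:

```kotlin
// Illustrative only: mimics the Groovy inline expression
// reviews_$->{course_id % 2} used by the INLINE sharding algorithm.
fun targetTable(courseId: Int): String = "reviews_${courseId % 2}"

fun main() {
    println(targetTable(123)) // odd course id -> reviews_1
    println(targetTable(124)) // even course id -> reviews_0
}
```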
Important: Before starting our application, we need to make sure that our defined shards exist, so I created two tables in my database, reviews_0 and reviews_1 (see init.sql).
Now we are ready to start our application and do some requests:
### POST a new review
POST http://localhost:8070/api/v1/reviews/
Content-Type: application/json
{
"text": "This is a great course!",
"author": "John Doe",
"authorTelephone": "555-1234",
"authorEmail": "johndoe@example.com",
"invoiceCode": "ABC123",
"courseId": 123
}
We can see the following log:
INFO 35412 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:42:01.8069745, 2023-04-17 15:42:01.8069745, John Doe, johndoe@example.com, 555-1234, 123, ABC123, This is a great course!, 4]
If we are to execute one more request with a different payload:
INFO 35412 --- [nio-8070-exec-8] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_1 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:43:47.3267788, 2023-04-17 15:43:47.3267788, Mike Scott, mikescott@example.com, 555-1234, 123, ABC123, This is an amazing course!, 5]
We can notice that both our reviews, placed by Mike and John, went into the reviews_1 table. What if we change the course_id to 124 and execute the same POST request again?
INFO 35412 --- [nio-8070-exec-4] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 15:44:42.7133688, 2023-04-17 15:44:42.7133688, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 6]
We can see that our new review got saved in the reviews_0 table.
Now we can execute two GET requests based on the course_id:
### GET reviews by course ID
GET http://localhost:8070/api/v1/reviews/filter?courseId=123
GET http://localhost:8070/api/v1/reviews/filter?courseId=124
And observe in the logs how routing between our two tables took place.
INFO 35412 --- [nio-8070-exec-9] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_1 review0_ where review0_.course_id=? ::: [123]
INFO 35412 --- [nio-8070-exec-5] ShardingSphere-SQL: Actual SQL: master ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
The first select was directed to the reviews_1 table and the second one to reviews_0. Sharding in action!
Read-Write Splitting
Now let's imagine another problem: the reviews application may experience high load during peak hours, leading to slow response times and a degraded user experience. To address this issue, we can implement read/write splitting to balance the load and improve performance.
And how lucky we are that ShardingSphere offers us a read/write splitting solution. Read-write splitting involves directing read queries to replica databases and write queries to a master database, ensuring that read requests do not interfere with write requests and that database performance is optimized.
Before configuring read-write splitting, we'll have to make some changes to our docker-compose in order to have some replicas for our master DB (credits to Bitnami for providing this):
mysql-master:
image: 'bitnami/mysql:latest'
ports:
- '3306:3306'
volumes:
- 'mysql_master_data:/bitnami/mysql/data'
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
environment:
- MYSQL_REPLICATION_MODE=master
- MYSQL_REPLICATION_USER=repl_user
- MYSQL_REPLICATION_PASSWORD=repl_password
- MYSQL_ROOT_PASSWORD=master_root_password
- MYSQL_USER=my_user
- MYSQL_PASSWORD=my_password
- MYSQL_DATABASE=reviews-db
mysql-slave:
image: 'bitnami/mysql:latest'
ports:
- '3306'
depends_on:
- mysql-master
environment:
- MYSQL_USER=my_user
- MYSQL_PASSWORD=my_password
- MYSQL_REPLICATION_MODE=slave
- MYSQL_REPLICATION_USER=repl_user
- MYSQL_REPLICATION_PASSWORD=repl_password
- MYSQL_MASTER_HOST=mysql-master
- MYSQL_MASTER_PORT_NUMBER=3306
- MYSQL_MASTER_ROOT_PASSWORD=master_root_password
Let's start our containers like this (one master and two replicas):
docker-compose up --detach --scale mysql-master=1 --scale mysql-slave=2
Now we need the mapped ports for our slaves.
$ docker port infra-mysql-slave-1
3306/tcp -> 0.0.0.0:49923
$ docker port infra-mysql-slave-2
3306/tcp -> 0.0.0.0:49922
Okay, now that we have our master and its replicas in place, we are ready to configure two new dataSources like this:
dataSources:
master:
dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
jdbcUrl: jdbc:mysql://localhost:3306/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
username: my_user
password: my_password
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
minPoolSize: 1
slave0:
dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
jdbcUrl: jdbc:mysql://localhost:49922/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
username: my_user
password: my_password
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
minPoolSize: 1
slave1:
dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
jdbcUrl: jdbc:mysql://localhost:49923/reviews-db?allowPublicKeyRetrieval=true&useSSL=false
username: my_user
password: my_password
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
minPoolSize: 1
And then, we can add the read-write splitting rule to the rules:
- !READWRITE_SPLITTING
dataSources:
readwrite_ds:
staticStrategy:
writeDataSourceName: master
readDataSourceNames:
- slave0
- slave1
loadBalancerName: readwrite-load-balancer
loadBalancers:
readwrite-load-balancer:
type: ROUND_ROBIN
Here I think everything is self-explanatory: we specified the write data source name to be master and the read data source names to point to our slaves, slave0 and slave1; and we picked the ROUND_ROBIN load-balancing algorithm.
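For intuition, a ROUND_ROBIN load balancer simply cycles through the configured read replicas. A minimal Kotlin sketch of the idea (my own illustration, not ShardingSphere's implementation) might look like this:

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Illustrative round-robin selection over the configured read data sources.
class RoundRobinBalancer(private val readDataSources: List<String>) {
    private val counter = AtomicLong(0)

    fun next(): String {
        // Each call advances the counter and wraps around the replica list.
        val index = (counter.getAndIncrement() % readDataSources.size).toInt()
        return readDataSources[index]
    }
}

fun main() {
    val balancer = RoundRobinBalancer(listOf("slave0", "slave1"))
    println(balancer.next()) // slave0
    println(balancer.next()) // slave1
    println(balancer.next()) // slave0 again
}
```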
Important: One last change to be made is regarding the sharding rule, which knows nothing about the newly configured read-write splitting rule and points directly to the master:
- !SHARDING
tables:
reviews:
actualDataNodes: readwrite_ds.reviews_$->{0..1}
Now our sharding rule will be wrapped by the read-write splitting rule, too, and the data source decision will be made before picking the correct table (pay attention to readwrite_ds.reviews_$->{0..1}).
Okay, we can start our application, run the same POST request and observe the logs:
INFO 22860 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:12:07.25473, 2023-04-17 16:12:07.25473, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 7]
Nothing surprising here; the sharding still works, and the query took place in the master data source (the write data source). But if we run a couple of GET requests, we'll observe the following:
INFO 22860 --- [nio-8070-exec-2] ShardingSphere-SQL: Actual SQL: slave0 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
INFO 22860 --- [nio-8070-exec-4] ShardingSphere-SQL: Actual SQL: slave1 ::: select review0_.id as id1_0_, review0_.created_at as created_2_0_, review0_.last_modified_at as last_mod3_0_, review0_.author as author4_0_, review0_.author_email as author_e5_0_, review0_.author_telephone as author_t6_0_, review0_.course_id as course_i7_0_, review0_.invoice_code as invoice_8_0_, review0_.text as text9_0_ from reviews_0 review0_ where review0_.course_id=? ::: [124]
You can observe read-write splitting in action: our write queries take place in the master data source, but our read queries take place in the master's replicas (slave0 and slave1), all while maintaining the correct sharding rule.
Data Masking
Off to another imaginary problem regarding our application. Imagine that sensitive information such as customer email, phone number, and invoice code may need to be accessed by certain users or applications while remaining hidden from others due to data privacy regulations.
To address this issue, we could implement a data masking solution to mask sensitive data when mapping the result or at the SQL level. But as you guessed, why bother? ShardingSphere is here to save the day with another easy-to-enable feature – data masking.
So let's update our configuration with this new rule:
- !MASK
tables:
reviews:
columns:
invoice_code:
maskAlgorithm: md5_mask
author_email:
maskAlgorithm: mask_before_special_chars_mask
author_telephone:
maskAlgorithm: keep_first_n_last_m_mask
maskAlgorithms:
md5_mask:
type: MD5
mask_before_special_chars_mask:
type: MASK_BEFORE_SPECIAL_CHARS
props:
special-chars: '@'
replace-char: '*'
keep_first_n_last_m_mask:
type: KEEP_FIRST_N_LAST_M
props:
first-n: 3
last-m: 2
replace-char: '*'
Let's see what we have here:
- tables.reviews – we defined a masking algorithm for each of the three columns mentioned before.
- maskAlgorithms.md5_mask – we specified the MD5 algorithm type for the invoice_code column.
- maskAlgorithms.mask_before_special_chars_mask – we configured the MASK_BEFORE_SPECIAL_CHARS algorithm for the author_email column, meaning that all the characters before the @ symbol will be replaced with the * symbol.
- maskAlgorithms.keep_first_n_last_m_mask – we configured the KEEP_FIRST_N_LAST_M algorithm for the author_telephone column, meaning that only the first 3 and last 2 characters of a telephone number stay unchanged; everything in between is masked with the * symbol.
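To build intuition for the last two algorithms, here is a standalone Kotlin sketch of the masking logic; it is my own approximation of the documented behavior, not ShardingSphere's code, and edge cases (such as values shorter than first-n + last-m) may be handled differently by the real implementation:

```kotlin
// Approximation of MASK_BEFORE_SPECIAL_CHARS: replace every character
// before the first occurrence of specialChars with replaceChar.
fun maskBeforeSpecialChars(value: String, specialChars: String, replaceChar: Char): String {
    val index = value.indexOf(specialChars)
    if (index < 0) return value
    return replaceChar.toString().repeat(index) + value.substring(index)
}

// Approximation of KEEP_FIRST_N_LAST_M: keep the first n and last m
// characters and mask everything in between with replaceChar.
fun keepFirstNLastM(value: String, firstN: Int, lastM: Int, replaceChar: Char): String {
    if (value.length <= firstN + lastM) return value
    val masked = replaceChar.toString().repeat(value.length - firstN - lastM)
    return value.take(firstN) + masked + value.takeLast(lastM)
}

fun main() {
    println(maskBeforeSpecialChars("mikescott@example.com", "@", '*')) // *********@example.com
    println(keepFirstNLastM("555-1234", 3, 2, '*'))                    // 555***34
}
```

Note how the two expected outputs match the masked fields in the GET response shown below.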
Note: You can find a lot of other masking algorithms here.
All right, let's start our application and do the same POST request.
INFO 35296 --- [nio-8070-exec-1] ShardingSphere-SQL: Actual SQL: master ::: insert into reviews_0 (created_at, last_modified_at, author, author_email, author_telephone, course_id, invoice_code, text, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?) ::: [2023-04-17 16:26:51.8188306, 2023-04-17 16:26:51.8188306, Mike Scott, mikescott@example.com, 555-1234, 124, ABC123, This is an amazing course!, 9]
We'll see nothing new here; the exact values that we provided in the body are the ones that went into the database. You can check the master/slave DBs, too, to confirm that.
But the magic comes in if we want to execute our GET request, which gives us the following body.
[
{
"text": "This is an amazing course!",
"author": "Mike Scott",
"authorTelephone": "555***34",
"authorEmail": "*********@example.com",
"invoiceCode": "bbf2dead374654cbb32a917afd236656",
"courseId": 124,
"id": 9,
"lastModifiedAt": "2023-04-17T15:44:43"
}
]
As you can see, the data stays unchanged in the database, but when queried and delivered, the telephone, email, and invoice codes got masked according to our defined algorithm in the data masking rule.
Conclusion
That's it for today, folks! I hope this article gave you a good understanding of how ShardingSphere can be used for database sharding, read-write splitting, and data masking with Spring Boot. We covered the basics of ShardingSphere, how to configure it for sharding, how to use read-write splitting, and how to apply data masking to sensitive data. Make sure to check out all the features that ShardingSphere provides and its amazing documentation here.
All the code mentioned in this article can be found here.
Happy coding!