An Advanced Complex Data Migration Solution
In this article, learn about a data migration solution for cases where the records to migrate need Java processing.
In production systems, new features sometimes need a data migration to be implemented. Such a migration can be done with different tools. For simple migrations, SQL can be used: it is fast and easily integrated into Liquibase or other tools that manage database migrations. The solution presented here is for use cases that cannot be done in SQL scripts.
The Use Case
The MovieManager project stores the keys to access TheMovieDB in the database. To improve the project, the keys should now be stored encrypted with Tink. The existing keys need to be encrypted during the data migration, and new keys need to be encrypted during the sign-in process. The movie import service needs to decrypt the keys to use them during the import.
The Data Migration
Update the Database Table
To mark migrated rows in the "user1" table, a "migration" column is added in this Liquibase script:

<changeSet id="41" author="angular2guy">
    <addColumn tableName="user1">
        <column defaultValue="0" type="bigint" name="migration"/>
    </addColumn>
</changeSet>

The changeSet adds the "migration" column to the "user1" table and sets the default value to "0".
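For reference, the matching mapping in the User entity could look like the following sketch. The @Table mapping and the field initialization are assumptions; only the getMigration()/setMigration(...) accessors are visible later in the article:

@Entity
@Table(name = "user1") // assumed mapping of the User entity to the "user1" table
public class User {
    ...
    // matches the "migration" column added by the changeSet, default 0
    private Long migration = 0L;
    ...
    public Long getMigration() {
        return migration;
    }

    public void setMigration(Long migration) {
        this.migration = migration;
    }
}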
Executing the Data Migration
The data migration is started with the startMigrations() method in the CronJobs class:
...
private static volatile boolean migrationsDone = false;
...
@Scheduled(initialDelay = 2000, fixedRate = 36000000)
@SchedulerLock(name = "Migrations_scheduledTask", lockAtLeastFor = "PT2H",
    lockAtMostFor = "PT3H")
public void startMigrations() {
    LOG.info("Start migrations.");
    if (!migrationsDone) {
        this.dataMigrationService.encryptUserKeys().thenApplyAsync(result -> {
            LOG.info("Users migrated: {}", result);
            return result;
        });
    }
    migrationsDone = true;
}
The method startMigrations() is annotated with @Scheduled because that enables the use of @SchedulerLock. The @SchedulerLock annotation sets a database lock that limits the execution to one instance, which enables horizontal scalability. The @Scheduled annotation calls the startMigrations() method 2 seconds after startup and then every ten hours (36,000,000 ms). The encryptUserKeys() method returns a CompletableFuture, which enables the use of thenApplyAsync(...) to log the number of migrated users without blocking. The static variable migrationsDone makes sure that each application instance calls the dataMigrationService only once; all later scheduled calls are essentially free.
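The @SchedulerLock annotation needs a configured LockProvider to store its locks. A minimal sketch of such a configuration, assuming ShedLock's JDBC provider and its default "shedlock" table (the project may configure this differently):

import javax.sql.DataSource;
import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider;
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;

@Configuration
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "PT1H")
public class SchedulerConfig {
    // stores the locks in the "shedlock" database table, so only one
    // application instance at a time executes the annotated methods
    @Bean
    public LockProvider lockProvider(DataSource dataSource) {
        return new JdbcTemplateLockProvider(new JdbcTemplate(dataSource));
    }
}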
Migrating the Data
To query the Users, the JpaUserRepository has the method findOpenMigrations:
public interface JpaUserRepository extends CrudRepository<User, Long> {
    ...
    @Query("select u from User u where u.migration < :migrationId")
    List<User> findOpenMigrations(@Param(value = "migrationId") Long migrationId);
}
The method searches for entities whose migration property has not yet been increased to the migrationId value that marks them as migrated.
The DataMigrationService contains the encryptUserKeys() method to do the migration:
@Service
@Transactional(propagation = Propagation.REQUIRES_NEW)
public class DataMigrationService {
    ...
    @Async
    public CompletableFuture<Long> encryptUserKeys() {
        List<User> migratedUsers = this.userRepository.findOpenMigrations(1L)
            .stream().map(myUser -> {
                myUser.setUuid(Optional.ofNullable(myUser.getUuid())
                    .filter(myStr -> !myStr.isBlank())
                    .orElse(UUID.randomUUID().toString()));
                myUser.setMoviedbkey(this.userDetailService
                    .encrypt(myUser.getMoviedbkey(), myUser.getUuid()));
                myUser.setMigration(myUser.getMigration() + 1);
                return myUser;
            }).collect(Collectors.toList());
        this.userRepository.saveAll(migratedUsers);
        return CompletableFuture.completedFuture(
            Integer.valueOf(migratedUsers.size()).longValue());
    }
}
The service is annotated with Propagation.REQUIRES_NEW to make sure that each method gets wrapped in its own transaction. The encryptUserKeys() method has the @Async annotation to avoid timeouts on the calling side. The findOpenMigrations(...) method of the repository returns the entities that have not been migrated yet, and map is used for the migration. In the map, the user's UUID is checked first and, if missing, created and set. Then the encrypt(...) method of the UserDetailService encrypts the user key, and the migration property is increased to mark the entity as migrated. The migrated entities are collected in a list and saved with the repository. Finally, the result CompletableFuture is created to return the number of migrated users. If the migrations are already done, findOpenMigrations(...) returns an empty collection and nothing is mapped or saved.
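The @Async annotation only takes effect if asynchronous processing is enabled. A minimal sketch of that configuration; the executor settings here are illustrative, and the project may enable async processing elsewhere:

import java.util.concurrent.Executor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
public class AsyncConfig {
    // dedicated executor that runs the @Async migration methods
    @Bean
    public Executor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);
        executor.setThreadNamePrefix("migration-");
        executor.initialize();
        return executor;
    }
}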
The UserDetailServiceBase does the encryption in its encrypt() method:
...
@Value("${tink.json.key}")
private String tinkJsonKey;
private DeterministicAead daead;
...
@PostConstruct
public void init() throws GeneralSecurityException {
    DeterministicAeadConfig.register();
    KeysetHandle handle = TinkJsonProtoKeysetFormat.parseKeyset(
        this.tinkJsonKey, InsecureSecretKeyAccess.get());
    this.daead = handle.getPrimitive(DeterministicAead.class);
}
...
public String encrypt(String movieDbKey, String uuid) {
    byte[] cipherBytes;
    try {
        cipherBytes = daead.encryptDeterministically(
            movieDbKey.getBytes(Charset.defaultCharset()),
            uuid.getBytes(Charset.defaultCharset()));
    } catch (GeneralSecurityException e) {
        throw new RuntimeException(e);
    }
    String cipherText = new String(Base64.getEncoder().encode(cipherBytes),
        Charset.defaultCharset());
    return cipherText;
}
- The tinkJsonKey is a secret and must be injected into the application as an environment variable or Helm chart value for security reasons.
- The init() method is annotated with @PostConstruct to run at initialization. It registers the config and creates the KeysetHandle with the tinkJsonKey. Then the primitive is initialized.
- The encrypt(...) method creates the cipherBytes with encryptDeterministically(...) and the parameters of the method. The UUID is used as associated data to create unique cipherBytes for each user. The result is Base64 encoded and returned as a String.
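The tinkJsonKey itself can be generated with the same Tink APIs. A sketch of what a method like createKeySet() could do; the actual EncryptionTest implementation may differ:

public static String createKeySet() throws GeneralSecurityException {
    DeterministicAeadConfig.register();
    // generate a new AES256-SIV keyset and serialize it to the JSON
    // format that TinkJsonProtoKeysetFormat.parseKeyset(...) expects
    KeysetHandle handle = KeysetHandle.generateNew(KeyTemplates.get("AES256_SIV"));
    return TinkJsonProtoKeysetFormat.serializeKeyset(handle,
        InsecureSecretKeyAccess.get());
}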
Conclusion: Data Migration
This migration needs to run as an application and not as a script. The trade-off is that the migration code is now in the application, and after the migration has run, it is dead code. That code should be removed then, but in the real world, the time to do this is limited, and after some time it is forgotten. The alternative is to use something like Spring Batch, but that takes more effort and time because the JPA entities/repositories cannot be reused as easily. A TODO to clean up the method in the DataMigrationService should do the trick sooner or later.
One operational constraint has to be considered: during the migration, the database is in an inconsistent state, and user access to the application should be stopped.
Finally Using the Keys
The MovieService contains the decrypt(...) method:
@Value("${tink.json.key}")
private String tinkJsonKey;
private DeterministicAead daead;
...
@PostConstruct
public void init() throws GeneralSecurityException {
    DeterministicAeadConfig.register();
    KeysetHandle handle = TinkJsonProtoKeysetFormat
        .parseKeyset(this.tinkJsonKey, InsecureSecretKeyAccess.get());
    this.daead = handle.getPrimitive(DeterministicAead.class);
}
...
private String decrypt(String cipherText, String uuid)
        throws GeneralSecurityException {
    String result = new String(daead.decryptDeterministically(
        Base64.getDecoder().decode(cipherText),
        uuid.getBytes(Charset.defaultCharset())));
    return result;
}
The properties and the init() method are the same as for the encryption. The decrypt(...) method first Base64 decodes the cipherText and then uses the result and the UUID to decrypt the key and return it as a String. That key string is used with the movieDbRestClient methods to import movie data into the database.
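A short round trip shows the contract between the two methods (a hypothetical usage example of the encrypt(...) and decrypt(...) methods shown above):

String uuid = UUID.randomUUID().toString();
// encrypt the user key; the uuid is the associated data
String cipherText = encrypt("the-moviedb-key", uuid);
// decrypt only succeeds with the same uuid; a different uuid makes
// decryptDeterministically(...) throw a GeneralSecurityException
String movieDbKey = decrypt(cipherText, uuid);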
Conclusion
The Tink library makes using encryption easy enough. The tinkJsonKey has to be injected at runtime and should not be stored in a repository file or in the application jar. A tinkJsonKey can be created with the createKeySet() method of the EncryptionTest class. The ShedLock library enables horizontal scalability, and Spring provides the toolbox that is used. The solution tries to balance the trade-offs for a horizontally scalable data migration that cannot be done in a script.