An Advanced Complex Data Migration Solution
In this article, learn about a data migration solution for cases where the records to migrate need Java processing.
In production systems, new features sometimes need a data migration to be implemented. Such a migration can be done with different tools. For simple migrations, SQL can be used: it is fast and easily integrated into Liquibase or other tools that manage database migrations. The solution presented here is for use cases that cannot be done in SQL scripts.
The Use Case
The MovieManager project stores the keys to access TheMovieDB in the database. To improve the project, the keys should now be stored encrypted with Tink. The existing keys need to be encrypted during the data migration, and new keys need to be encrypted during the sign-in process. The movie import service needs to decrypt the keys to use them during the import.
The Data Migration
Update the Database Table
To mark migrated rows in the "user1" table, a "migration" column is added in this Liquibase script:

<changeSet id="41" author="angular2guy">
    <addColumn tableName="user1">
        <column defaultValue="0" type="bigint" name="migration"/>
    </addColumn>
</changeSet>

The changeSet adds the "migration" column to the "user1" table and sets the default value to "0".
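For reference, the matching mapping in the User entity could look like the following sketch. The @Table mapping and the field initialization are assumptions; only the getMigration()/setMigration(...) accessors are visible later in the article:

@Entity
@Table(name = "user1") // assumed mapping of the User entity to the "user1" table
public class User {
    ...
    // matches the "migration" column added by the changeSet, default 0
    private Long migration = 0L;
    ...
    public Long getMigration() {
        return migration;
    }

    public void setMigration(Long migration) {
        this.migration = migration;
    }
}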
Executing the Data Migration
The data migration is started with the startMigrations() method in the CronJobs class:
...
private static volatile boolean migrationsDone = false;
...
@Scheduled(initialDelay = 2000, fixedRate = 36000000)
@SchedulerLock(name = "Migrations_scheduledTask", lockAtLeastFor = "PT2H",
    lockAtMostFor = "PT3H")
public void startMigrations() {
    LOG.info("Start migrations.");
    if (!migrationsDone) {
        this.dataMigrationService.encryptUserKeys().thenApplyAsync(result -> {
            LOG.info("Users migrated: {}", result);
            return result;
        });
    }
    migrationsDone = true;
}
The method startMigrations() is annotated with @Scheduled because that enables the use of @SchedulerLock. The @SchedulerLock annotation sets a database lock that limits the execution to one instance, which enables horizontal scalability. The @Scheduled annotation calls the startMigrations() method 2 seconds after startup and then every ten hours (36,000,000 ms). The encryptUserKeys() method returns a CompletableFuture, which enables the use of thenApplyAsync(...) to log the number of migrated users without blocking. The static variable migrationsDone makes sure that each application instance calls the dataMigrationService only once; all later scheduled calls are essentially free.
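The @SchedulerLock annotation needs a configured LockProvider to store its locks. A minimal sketch of such a configuration, assuming ShedLock's JDBC provider and its default "shedlock" table (the project may configure this differently):

import javax.sql.DataSource;
import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider;
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;

@Configuration
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "PT1H")
public class SchedulerConfig {
    // stores the locks in the "shedlock" database table, so only one
    // application instance at a time executes the annotated methods
    @Bean
    public LockProvider lockProvider(DataSource dataSource) {
        return new JdbcTemplateLockProvider(new JdbcTemplate(dataSource));
    }
}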
Migrating the Data
To query the Users, the JpaUserRepository has the method findOpenMigrations:
public interface JpaUserRepository extends CrudRepository<User, Long> {
    ...
    @Query("select u from User u where u.migration < :migrationId")
    List<User> findOpenMigrations(@Param(value = "migrationId") Long migrationId);
}
The method searches for entities whose migration property has not yet been increased to the migrationId value that marks them as migrated.
The DataMigrationService contains the encryptUserKeys() method to do the migration:
@Service
@Transactional(propagation = Propagation.REQUIRES_NEW)
public class DataMigrationService {
    ...
    @Async
    public CompletableFuture<Long> encryptUserKeys() {
        List<User> migratedUsers = this.userRepository.findOpenMigrations(1L)
            .stream().map(myUser -> {
                myUser.setUuid(Optional.ofNullable(myUser.getUuid())
                    .filter(myStr -> !myStr.isBlank())
                    .orElse(UUID.randomUUID().toString()));
                myUser.setMoviedbkey(this.userDetailService
                    .encrypt(myUser.getMoviedbkey(), myUser.getUuid()));
                myUser.setMigration(myUser.getMigration() + 1);
                return myUser;
            }).collect(Collectors.toList());
        this.userRepository.saveAll(migratedUsers);
        return CompletableFuture.completedFuture(
            Integer.valueOf(migratedUsers.size()).longValue());
    }
}
The service is annotated with Propagation.REQUIRES_NEW to make sure that each method gets wrapped in its own transaction. The encryptUserKeys() method has the @Async annotation to avoid timeouts on the calling side. The findOpenMigrations(...) method of the repository returns the entities that have not been migrated yet, and map is used for the migration. In the map, the user's UUID is checked first and, if missing, created and set. Then the encrypt(...) method of the UserDetailService encrypts the user key, and the migration property is increased to mark the entity as migrated. The migrated entities are collected in a list and saved with the repository. Finally, the result CompletableFuture is created to return the number of migrated users. If the migrations are already done, findOpenMigrations(...) returns an empty collection and nothing is mapped or saved.
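The @Async annotation only takes effect if asynchronous processing is enabled. A minimal sketch of that configuration; the executor settings here are illustrative, and the project may enable async processing elsewhere:

import java.util.concurrent.Executor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
public class AsyncConfig {
    // dedicated executor that runs the @Async migration methods
    @Bean
    public Executor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);
        executor.setThreadNamePrefix("migration-");
        executor.initialize();
        return executor;
    }
}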
The UserDetailServiceBase does the encryption in its encrypt() method:
...
@Value("${tink.json.key}")
private String tinkJsonKey;
private DeterministicAead daead;
...
@PostConstruct
public void init() throws GeneralSecurityException {
    DeterministicAeadConfig.register();
    KeysetHandle handle = TinkJsonProtoKeysetFormat.parseKeyset(
        this.tinkJsonKey, InsecureSecretKeyAccess.get());
    this.daead = handle.getPrimitive(DeterministicAead.class);
}
...
public String encrypt(String movieDbKey, String uuid) {
    byte[] cipherBytes;
    try {
        cipherBytes = daead.encryptDeterministically(
            movieDbKey.getBytes(Charset.defaultCharset()),
            uuid.getBytes(Charset.defaultCharset()));
    } catch (GeneralSecurityException e) {
        throw new RuntimeException(e);
    }
    String cipherText = new String(Base64.getEncoder().encode(cipherBytes),
        Charset.defaultCharset());
    return cipherText;
}
- The tinkJsonKey is a secret and must be injected into the application as an environment variable or Helm chart value for security reasons.
- The init() method is annotated with @PostConstruct to run at initialization. It registers the config and creates the KeysetHandle with the tinkJsonKey. Then the primitive is initialized.
- The encrypt(...) method creates the cipherBytes with encryptDeterministically(...) and the parameters of the method. The UUID is used as associated data to create unique cipherBytes for each user. The result is Base64 encoded and returned as a String.
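The tinkJsonKey itself can be generated with the same Tink APIs. A sketch of what a method like createKeySet() could do; the actual EncryptionTest implementation may differ:

public static String createKeySet() throws GeneralSecurityException {
    DeterministicAeadConfig.register();
    // generate a new AES256-SIV keyset and serialize it to the JSON
    // format that TinkJsonProtoKeysetFormat.parseKeyset(...) expects
    KeysetHandle handle = KeysetHandle.generateNew(KeyTemplates.get("AES256_SIV"));
    return TinkJsonProtoKeysetFormat.serializeKeyset(handle,
        InsecureSecretKeyAccess.get());
}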
Conclusion: Data Migration
This migration needs to run as an application and not as a script. The trade-off is that the migration code is now in the application, and after the migration has run, it is dead code. That code should be removed then, but in the real world, the time to do this is limited, and after some time it is forgotten. The alternative is to use something like Spring Batch, but that takes more effort and time because the JPA entities/repositories cannot be reused as easily. A TODO to clean up the method in the DataMigrationService should do the trick sooner or later.
One operational constraint has to be considered: during the migration, the database is in an inconsistent state, and user access to the application should be stopped.
Finally Using the Keys
The MovieService contains the decrypt(...) method:
@Value("${tink.json.key}")
private String tinkJsonKey;
private DeterministicAead daead;
...
@PostConstruct
public void init() throws GeneralSecurityException {
    DeterministicAeadConfig.register();
    KeysetHandle handle = TinkJsonProtoKeysetFormat
        .parseKeyset(this.tinkJsonKey, InsecureSecretKeyAccess.get());
    this.daead = handle.getPrimitive(DeterministicAead.class);
}
...
private String decrypt(String cipherText, String uuid)
        throws GeneralSecurityException {
    String result = new String(daead.decryptDeterministically(
        Base64.getDecoder().decode(cipherText),
        uuid.getBytes(Charset.defaultCharset())));
    return result;
}
The properties and the init() method are the same as for the encryption. The decrypt(...) method first Base64 decodes the cipherText and then uses the result and the UUID to decrypt the key and return it as a String. That key string is used with the movieDbRestClient methods to import movie data into the database.
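A short round trip shows the contract between the two methods (a hypothetical usage example of the encrypt(...) and decrypt(...) methods shown above):

String uuid = UUID.randomUUID().toString();
// encrypt the user key; the uuid is the associated data
String cipherText = encrypt("the-moviedb-key", uuid);
// decrypt only succeeds with the same uuid; a different uuid makes
// decryptDeterministically(...) throw a GeneralSecurityException
String movieDbKey = decrypt(cipherText, uuid);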
Conclusion
The Tink library makes using encryption easy enough. The tinkJsonKey has to be injected at runtime and should not be stored in a repository file or in the application jar. A tinkJsonKey can be created with the createKeySet() method of the EncryptionTest class. The ShedLock library enables horizontal scalability, and Spring provides the toolbox that is used. The solution tries to balance the trade-offs for a horizontally scalable data migration that cannot be done in a script.