Compiling Kotlin at Runtime
JSR-223: compile Kotlin code dynamically, after the application starts.
Everybody knows tasks that could be easily solved if you could generate and execute code instantly inside the JVM runtime. However, sometimes you have to create a separate library, so the code isn't known at its compile time.
Here we look at an approach to generating code and executing it after the application starts. We will use the JSR-223 standard for this.
As a task, we use a popular approach: we create an AOP-style system that parses an SQL query response, i.e. the result table. A developer adds annotations to the code, and our application then generates and executes the parser code at runtime. However, this approach is much wider. You can create programmable configuration (like the TeamCity Build DSL), and you can optimize existing code (the generated code can be constructed after the settings are read, so it will be branch-free). And of course, we can use this approach to avoid copy-paste when the language's expressive power isn't enough to extract a generalized block.
All code is available on GitHub; however, you need to install Java 11 (or later) first. The article presents a simplified version, without logs, diagnostics, tests, etc.
Task
First of all: if you want to solve exactly this task, please check the existing libraries. In most cases there is already a developed and supported solution. I can recommend Hibernate or Spring Data, which can do the same.
What we'd like to have: the ability to mark a data object with attributes, so that some "SQL query result to DTO converter" can convert rows into instances of our class.
For instance, our client code can be like:
data class DbUser(
@SqlMapping(columnName = "name")
val name: UserName,
@SqlMapping(columnName1 = "user_email_name", columnName2 = "user_email_domain")
val email: Email
)
As you know, to read the database response with Spring JDBC, it is better to use the ResultSet interface. There are at least two methods to extract a String from a column:
String getString(int columnIndex) throws SQLException;
String getString(String columnLabel) throws SQLException;
Let's complicate our task a little:
- In case of a huge query result, it is better to use the index-based approach (i.e. we retrieve the indexes of all columns before the first row arrives, remember them, and then use these indexes for each row). This is highly important for high-performance applications, because the name-based methods have to compare column names each time, which requires at least NxM unnecessary string equality checks, where N is the row count and M is the column count.
- For another performance boost, we shouldn't use reflection. Therefore, we have to avoid BeanPropertyRowMapper or similar classes, because they are reflection-based and too slow.
- Property types can be not only primitive, like String or Int. They can also be complex, like a self-written NonEmptyText (a class with a single String field, which can't be null or empty).
As we observed above, it is better to extract our solution into a separate library. Therefore we don't know all the available types during our library's compilation. And we'd like the database response parsing to look like:
fun extractData(rs: ResultSet): List<DbUser> {
val queryMetadata = rs.metaData
val queryColumnCount = queryMetadata.columnCount
val mapperColumnCount = 3
require(queryColumnCount == mapperColumnCount)
val columnIndex0 = rs.findColumn("name")
val columnIndex1 = rs.findColumn("user_email_name")
val columnIndex2 = rs.findColumn("user_email_domain")
val result = mutableListOf<DbUser>()
while (rs.next()) {
result.add(
DbUser(
name = UserName(rs.getString(columnIndex0)),
email = Email(EmailUser(rs.getString(columnIndex1)), EmailDomain(rs.getString(columnIndex2)))
)
)
}
return result
}
One more reminder: please don't use this approach in your project without researching existing solutions first. Moreover, if your task is to parse database rows, even if you want to work with JDBC directly (without Hibernate or anything similar), you can achieve this without code generation. And this is our homework: find a way to do this.
Kotlin Script Evaluation
For now there are two easiest approaches to compile Kotlin at runtime: you can use the Kotlin Compiler directly, or you can use the JSR-223 wrapper. The first approach allows you to compile multiple files together and has better extensibility; however, it is more complex to use. The second approach simply allows you to add a new type into the current class loader. Of course it isn't safe, so please execute only trusted code there (also, the Kotlin Script Compiler runs code in a separate restricted class loader; however, the default security configuration doesn't prevent new process creation or file system access, so please be careful there too).
First of all, let's define our interface. We don't want to generate new code for each SQL query, so let's do this once per each object type that will be read from the database. For instance, the interface could be:
interface ResultSetMapper<TMappingType> : ResultSetExtractor<List<TMappingType>>
interface DynamicResultSetMapperFactory {
fun <TMappingType : Any> createForType(clazz: KClass<TMappingType>): ResultSetMapper<TMappingType>
}
inline fun <reified TMappingType : Any> DynamicResultSetMapperFactory.createForType(): ResultSetMapper<TMappingType> {
return createForType(TMappingType::class)
}
The inline method is required to create the illusion that we have real generics in the JVM. It allows ResultSetMapper construction with code like: return mapperFactory.createForType<MyClass>().
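The reified trick above can be sketched in isolation. Everything below (the NameFactory interface and SimpleNameFactory object) is hypothetical and only mirrors the shape of DynamicResultSetMapperFactory:

```kotlin
import kotlin.reflect.KClass

// Hypothetical factory that mirrors the shape of DynamicResultSetMapperFactory.
interface NameFactory {
    fun <T : Any> nameFor(clazz: KClass<T>): String
}

object SimpleNameFactory : NameFactory {
    override fun <T : Any> nameFor(clazz: KClass<T>): String =
        clazz.simpleName ?: "<anonymous>"
}

// The reified inline wrapper: callers write nameFor<String>()
// instead of passing String::class explicitly.
inline fun <reified T : Any> NameFactory.nameFor(): String = nameFor(T::class)
```

At the call site, SimpleNameFactory.nameFor<String>() compiles to a call that passes String::class, so no runtime reflection magic is needed.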
ResultSetMapper inherits the standard Spring interface:
public interface ResultSetExtractor<T> {
T extractData(ResultSet rs) throws SQLException, DataAccessException;
}
The factory implementation is responsible for generating the code from the class annotations and then executing it. So we have a mockup like:
override fun <TMappingType : Any> createForType(clazz: KClass<TMappingType>): ResultSetMapper<TMappingType> {
val sourceCode = getMapperSourceCode(clazz) // generates code
return compiler.compile(sourceCode) // compiles code
}
We have to return ResultSetMapper<TMappingType>. And it is better to create a class without generic parameters, to give the JVM full type knowledge (in this case the GraalVM and C2 compilers can use more optimization techniques). Therefore, we compile code like:
object : ResultSetMapper<DbUser> { // a singleton which implements the interface
override fun extractData(rs: java.sql.ResultSet): List<DbUser> {
/* generated code */
}
}
For code compilation, we need three steps:
- Add all necessary dependencies to the classpath.
- Instruct Java about the available compilers (the Kotlin Script compiler in our case).
- Using ScriptEngineManager, execute the code, which returns the object above.
For the first item, let's add the following lines to the Gradle script:
implementation(kotlin("reflect"))
implementation(kotlin("script-runtime"))
implementation(kotlin("compiler-embeddable"))
implementation(kotlin("script-util"))
implementation(kotlin("scripting-compiler-embeddable"))
For the second item, let's add the file "src/main/resources/META-INF/services/javax.script.ScriptEngineFactory" into the jar, with the following line:
org.jetbrains.kotlin.script.jsr223.KotlinJsr223JvmLocalScriptEngineFactory
And then we have the last remaining item: execute the script at runtime:
fun <TResult> compile(sourceCode: String): TResult {
val scriptEngineManager = ScriptEngineManager()
val factory = scriptEngineManager.getEngineByExtension("kts").factory // the JVM knows that the kts extension is handled by KotlinJsr223JvmLocalScriptEngineFactory
val engine = factory.scriptEngine as KotlinJsr223JvmLocalScriptEngine
@Suppress("UNCHECKED_CAST")
return engine.eval(sourceCode) as TResult
}
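As a minimal usage sketch (assuming the Gradle dependencies and the META-INF/services file above are in place), the engine lookup can be wrapped like this; evalKotlinOrNull is a hypothetical helper that degrades to null when the scripting artifacts are missing:

```kotlin
import javax.script.ScriptEngineManager

// Looks up the Kotlin engine registered through META-INF/services.
// Returns null instead of failing when the scripting artifacts
// are not on the classpath (hypothetical helper).
fun evalKotlinOrNull(sourceCode: String): Any? {
    val engine = ScriptEngineManager().getEngineByExtension("kts") ?: return null
    return engine.eval(sourceCode)
}
```

With the dependencies in place, evalKotlinOrNull("1 + 2") evaluates the snippet and returns 3; without them it simply returns null, which makes the classpath requirement explicit.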
Preparing the Model
As I wrote above, let's complicate our task: let's generate code not only for built-in JVM types, but also for self-written ones. Therefore, let's dig deeper into our data model.
Let's imagine that we try to write strongly-typed code, which rejects invalid data as early as possible. Therefore:
- Instead of the field userName: String we have userName: UserName, where the class UserName has just one field.
- UserName can't be empty, therefore we should check this value in the constructor.
- We plan to have a lot of such classes, therefore this logic should be extracted into a common block.
As one approach, we can implement this in the following way. First, create the class NonEmptyText, which has the necessary field and all the required checks in the constructor:
abstract class NonEmptyText(val value: String) {
init {
require(value.isNotBlank()) {
"Empty text is prohibited for ${this.javaClass.simpleName}. Actual value: $this"
}
}
override fun equals(other: Any?): Boolean {
if (this === other) return true
if (javaClass != other?.javaClass) return false
other as NonEmptyText
if (value != other.value) return false
return true
}
override fun hashCode(): Int {
return value.hashCode()
}
override fun toString(): String {
return value
}
}
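A quick sanity check of this contract (the class is repeated in simplified form so the sketch is self-contained; tryCreate is a hypothetical helper):

```kotlin
// Simplified copy of NonEmptyText so the sketch is self-contained.
abstract class NonEmptyText(val value: String) {
    init {
        require(value.isNotBlank()) {
            "Empty text is prohibited for ${this.javaClass.simpleName}"
        }
    }

    override fun toString(): String = value
}

// Hypothetical helper: an anonymous subclass is enough to exercise
// the constructor check; null signals a rejected value.
fun tryCreate(raw: String): String? =
    try {
        object : NonEmptyText(raw) {}.toString()
    } catch (e: IllegalArgumentException) {
        null
    }
```

The require call runs in the init block, so there is no way to construct an instance holding a blank string.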
Next let's add one more type construction approach:
interface NonEmptyTextConstructor<out TResult : NonEmptyText> {
fun create(value: String): TResult
}
Next we can create the UserName class:
class UserName(value: String) : NonEmptyText(value) {
companion object : NonEmptyTextConstructor<UserName> {
override fun create(value: String) = UserName(value)
}
}
Here we have UserName, which is strongly typed. And its companion object can construct its instances, so now we can create instances without a direct constructor call:
UserName.create("123")
Now we can give this interface to anyone who wants to create an instance from an input string. For instance, the call to the method fun <TValue> createText(input: String?, constructor: NonEmptyTextConstructor<TValue>): TValue? is createText("123", UserName), which is intuitive. It looks like type classes for the JVM.
Let's define Email in the following way:
class EmailUser(value: String) : NonEmptyText(value) {
companion object : NonEmptyTextConstructor<EmailUser> {
override fun create(value: String) = EmailUser(value)
}
}
class EmailDomain(value: String) : NonEmptyText(value) {
companion object : NonEmptyTextConstructor<EmailDomain> {
override fun create(value: String) = EmailDomain(value)
}
}
data class Email(val user: EmailUser, val domain: EmailDomain)
We divided it into two different types here just as an example of a complex type; we don't need this for email in real life. In our case, let's do it to test the "read a single object from two columns" approach. Not all ORM implementations can do this; however, we can.
Next let's create the DbUser type. It is our DTO, which we read from the database:
data class DbUser(val name: UserName, val email: Email)
To generate the database result parsing code, we must:
- Define the column matching: for the name field we have to define one column name, and for the email field we need to define two column names.
- Define the database reading method (moreover, even the String type can be read in different ways).
If we have "one column - one type" matching, then database reading method can be defined with the simple interface:
interface SingleValueMapper<out TValue> {
fun getValue(resultSet: ResultSet, columnIndex: Int): TValue
}
So during the ResultSet reading, we can do the following:
- Once, remember which index is responsible for which column.
- For each row:
  - Call getValue for each cell.
  - Create the object from the results of the previous item.
As we observed before, let's assume the project has a lot of types that can be marked as "non-empty string". Therefore, we can create a common mapper for them:
abstract class NonEmptyTextValueMapper<out TResult : NonEmptyText>(
private val textConstructor: NonEmptyTextConstructor<TResult>
) : SingleValueMapper<TResult> {
override fun getValue(resultSet: ResultSet, columnIndex: Int): TResult {
return textConstructor.create(resultSet.getString(columnIndex))
}
}
As we can see, we put the object constructor into this class. Now we can easily create mappers for the exact classes:
object UserNameMapper : NonEmptyTextValueMapper<UserName>(UserName) // this object can convert column value to the UserName
Unfortunately, I didn't find a way to express the mapper with extension methods, i.e. to have some kind of extension type. In Scala you can achieve this via implicits; however, that approach isn't explicit.
As we noticed, we have a complex type, Email, and it requires two columns. Therefore, the interface above isn't applicable to it. As an option, we can create a separate one:
interface DoubleValuesMapper<out TValue> {
fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TValue
}
Here we have two input columns and a single result object. This is exactly what we need; however, we would have to copy-paste such interfaces for each column count.
Now we can have a combined mapper, which looks like this:
abstract class TwoMappersValueMapper<out TResult, TParameter1, TParameter2>(
private val parameterMapper1: SingleValueMapper<TParameter1>,
private val parameterMapper2: SingleValueMapper<TParameter2>
) : DoubleValuesMapper<TResult> {
override fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TResult {
return create(
parameterMapper1.getValue(resultSet, columnIndex1),
parameterMapper2.getValue(resultSet, columnIndex2)
)
}
abstract fun create(parameter1: TParameter1, parameter2: TParameter2): TResult
}
And then Email can be read in the following way:
object EmailUserMapper : NonEmptyTextValueMapper<EmailUser>(EmailUser)
object EmailDomainMapper : NonEmptyTextValueMapper<EmailDomain>(EmailDomain)
object EmailMapper : TwoMappersValueMapper<Email, EmailUser, EmailDomain>(EmailUserMapper, EmailDomainMapper) {
override fun create(parameter1: EmailUser, parameter2: EmailDomain): Email {
return Email(parameter1, parameter2)
}
}
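To see this composition work without a real database, we can stub ResultSet with a dynamic proxy. The interfaces are repeated in simplified form, and EmailStringMapper and fakeResultSet are hypothetical names used only for this sketch:

```kotlin
import java.lang.reflect.Proxy
import java.sql.ResultSet

// Simplified copies of the mapper interfaces from the article.
interface SingleValueMapper<out TValue> {
    fun getValue(resultSet: ResultSet, columnIndex: Int): TValue
}

interface DoubleValuesMapper<out TValue> {
    fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TValue
}

abstract class TwoMappersValueMapper<out TResult, TParameter1, TParameter2>(
    private val parameterMapper1: SingleValueMapper<TParameter1>,
    private val parameterMapper2: SingleValueMapper<TParameter2>
) : DoubleValuesMapper<TResult> {
    override fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TResult =
        create(
            parameterMapper1.getValue(resultSet, columnIndex1),
            parameterMapper2.getValue(resultSet, columnIndex2)
        )

    abstract fun create(parameter1: TParameter1, parameter2: TParameter2): TResult
}

object StringMapper : SingleValueMapper<String> {
    override fun getValue(resultSet: ResultSet, columnIndex: Int): String =
        resultSet.getString(columnIndex)
}

// Hypothetical composite mapper: joins two string columns as "user@domain".
object EmailStringMapper : TwoMappersValueMapper<String, String, String>(StringMapper, StringMapper) {
    override fun create(parameter1: String, parameter2: String) = "$parameter1@$parameter2"
}

// ResultSet stub via dynamic proxy: only getString(int) is answered.
fun fakeResultSet(columns: Map<Int, String>): ResultSet =
    Proxy.newProxyInstance(
        ResultSet::class.java.classLoader,
        arrayOf(ResultSet::class.java)
    ) { _, method, args ->
        if (method.name == "getString" && args?.size == 1) columns[args[0] as Int] else null
    } as ResultSet
```

With a stub holding columns 1 and 2, EmailStringMapper.getValue(rs, 1, 2) pulls both cells through the two single-value mappers and combines them, which is exactly the shape of the Email reading above.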
Now we have the last remaining item: define our annotations and write the code generation.
@Target(AnnotationTarget.VALUE_PARAMETER)
@MustBeDocumented
annotation class SingleMappingValueAnnotation(
val constructionClass: KClass<out SingleValueMapper<*>>, // the mapper requires just a single field ...
val columnName: String // ... therefore we have one column
)
@Target(AnnotationTarget.VALUE_PARAMETER)
@MustBeDocumented
annotation class DoubleMappingValuesAnnotation(
val constructionClass: KClass<out DoubleValuesMapper<*>>, // the mapper requires two fields ...
val columnName1: String, // ... therefore we have two columns
val columnName2: String
)
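A small sketch of how such a constructor-parameter annotation can be read back. Note that plain Java reflection is enough here; the stand-in types (SingleValueMapper, UserNameMapper, firstColumnName) are simplified for the example:

```kotlin
import kotlin.reflect.KClass

// Simplified stand-ins so the sketch compiles on its own.
interface SingleValueMapper<out TValue>

object UserNameMapper : SingleValueMapper<String>

@Target(AnnotationTarget.VALUE_PARAMETER)
annotation class SingleMappingValueAnnotation(
    val constructionClass: KClass<out SingleValueMapper<*>>,
    val columnName: String
)

data class DbUser(
    @SingleMappingValueAnnotation(UserNameMapper::class, "name")
    val name: String
)

// Constructor-parameter annotations are visible through plain Java reflection,
// so this part of the generator does not strictly need kotlin-reflect.
fun firstColumnName(clazz: Class<*>): String? =
    clazz.constructors.single().parameterAnnotations
        .flatMap { it.toList() }
        .filterIsInstance<SingleMappingValueAnnotation>()
        .firstOrNull()
        ?.columnName
```

Kotlin annotations have RUNTIME retention by default, which is why the generator can see them after compilation.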
Code Generation From the Annotations
First of all, let's define which code we'd like to see. I used the following, which complies with all the initial criteria:
object : ResultSetMapper<DbUser> {
override fun extractData(rs: java.sql.ResultSet): List<DbUser> {
val queryMetadata = rs.metaData
val queryColumnCount = queryMetadata.columnCount
val mapperColumnCount = 3
require(queryColumnCount == mapperColumnCount) {
val queryColumns = (1..queryColumnCount).joinToString { queryMetadata.getColumnName(it) }
"Sql query has invalid columns: $mapperColumnCount is expected, however $queryColumnCount is returned. " +
"Query has: $queryColumns. Mapper has: name, user_email_name, user_email_domain"
}
val columnIndex0 = rs.findColumn("name")
val columnIndex1 = rs.findColumn("user_email_name")
val columnIndex2 = rs.findColumn("user_email_domain")
val result = mutableListOf<DbUser>()
while (rs.next()) {
val name = UserNameMapper.getValue(rs, columnIndex0)
val email = EmailMapper.getValue(rs, columnIndex1, columnIndex2)
val rowResult = DbUser(
name = name,
email = email
)
result.add(rowResult)
}
return result
}
}
This code is generated as a monolith (variables are defined first and only then used). Let's highlight at least a few blocks with different ideas:
- We have N input columns, which are used by different mappers. Therefore we need different variables for them (the same column can be used by different mappers).
- First of all we should verify what we received from the database. If the column count differs from the expected one, then it is better to raise an exception with a lot of details: what we received, what we expected, etc.
- SQL cursors work via an approach like while (rs.next()) { ... }, so let's create a mutable list. Ideally we could set its initial size if we knew how many rows the database returns.
- On each loop iteration we read all field values and then create the resulting object.
Finally, we have the following code:
private fun <TMappingType : Any> getMapperSourceCode(clazz: KClass<TMappingType>): String {
return buildString {
val className = clazz.qualifiedName!!
val resultSetClassName = ResultSet::class.java.name
val singleConstructor = clazz.constructors.single()
val parameters = singleConstructor.parameters
val annotations = parameters.flatMap { it.annotations.toList() }
val columnNames = annotations.flatMap { getColumnNames(it) }.toSet()
val columnNameToVariable = columnNames.mapIndexed { index, name -> name to "columnIndex$index" }.toMap()
appendln("""
import com.github.imanushin.ResultSetMapper
object : com.github.imanushin.ResultSetMapper<$className> {
override fun extractData(rs: $resultSetClassName): List<$className> {
val queryMetadata = rs.metaData
val queryColumnCount = queryMetadata.columnCount
val mapperColumnCount = ${columnNameToVariable.size}
require(queryColumnCount == mapperColumnCount) {
val queryColumns = (1..queryColumnCount).joinToString { queryMetadata.getColumnName(it) }
"Sql query has invalid columns: \${'$'}mapperColumnCount is expected, however \${'$'}queryColumnCount is returned. " +
"Query has: \${'$'}queryColumns. Mapper has: ${columnNames.joinToString()}"
}
""")
columnNameToVariable.forEach { (columnName, variableName) ->
appendln(" val $variableName = rs.findColumn(\"$columnName\")")
}
appendln("""
val result = mutableListOf<$className>()
while (rs.next()) {
""")
parameters.forEach { parameter ->
fillParameterConstructor(parameter, columnNameToVariable)
}
appendln(" val rowResult = $className(")
appendln(
parameters.joinToString("," + System.lineSeparator()) { parameter ->
" ${parameter.name} = ${parameter.name}"
}
)
appendln("""
)
result.add(rowResult)
}
return result
}
}
""")
}
}
private fun StringBuilder.fillParameterConstructor(parameter: KParameter, columnNameToVariable: Map<String, String>) {
append(" val ${parameter.name} = ")
// please note: double or missing annotations aren't covered here
parameter.annotations.forEach { annotation ->
when (annotation) {
is DoubleMappingValuesAnnotation ->
appendln("${annotation.constructionClass.qualifiedName}.getValue(" +
"rs, " +
"${columnNameToVariable[annotation.columnName1]}, " +
"${columnNameToVariable[annotation.columnName2]})"
)
is SingleMappingValueAnnotation ->
appendln("${annotation.constructionClass.qualifiedName}.getValue(" +
"rs, " +
"${columnNameToVariable[annotation.columnName]})"
)
}
}
}
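The generator above is easier to digest in a trimmed-down form. The following reflection-free sketch (generateColumnLookups is a hypothetical helper) produces only the findColumn prologue from a plain list of column names:

```kotlin
// Hypothetical trimmed-down generator: column names come in as a plain list,
// and only the findColumn prologue of the mapper is produced.
fun generateColumnLookups(columnNames: List<String>): String = buildString {
    columnNames.forEachIndexed { index, name ->
        appendLine("val columnIndex$index = rs.findColumn(\"$name\")")
    }
}
```

The full generator does the same thing, except the column names and variable indexes come out of the annotation metadata instead of an explicit list.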
Why Do We Need This?
As you can see, it is easy to generate executable code instantly at runtime. I spent just several hours on this small example library (and several more on the article). However, here we have workable code that is able to read rows from the database faster than most of the intuitive approaches on Stack Overflow. Moreover, because the code generation is fully controlled, we can also add object interning, performance measurement, and a lot of other improvements and performance optimizations. And the most important point: we know the exact code which will be executed here.
A Kotlin DSL can also be used for programmable configuration. If you love your users, you can stop forcing them to use JSON/XML/YAML files and just give them a DSL. It will define the configuration in a type-safe way. As an example, please take a look at the TeamCity Build DSL: you can develop your build, and you can write a condition or loop to avoid copying a step 10 times. You have all the code highlighting in the IDE. In any case, the final application needs a configuration model, and there is no real restriction on how it is created.
Not all ideas can be expressed in your programming language. And often you don't want to copy-paste code which isn't so simple to verify. Code generation can help us here too: if you can define your implementation with annotations, then do it in a common way and hide it behind an interface. This approach is highly useful for the JIT compiler, which then sees code with all explicit types instead of generic ones (where some optimizations, such as stack allocation, are impossible).
However, the most important point: please estimate first whether it is really necessary to play with code generation and runtime code execution. In some projects, a reflection-based approach has enough performance, which means that it is better not to use non-standard techniques and overcomplicate the project.
Published at DZone with permission of Igor Manushin. See the original article here.