Compiling Kotlin at Runtime
JSR-223: compile Kotlin code dynamically, after the application starts.
Everybody knows tasks that could be easily solved if you could generate and execute code instantly inside the JVM runtime. However, sometimes you have to create a separate library, so the code isn't known at its compile time.
Here we look at an approach to generating code and executing it after the application starts. We will use the JSR-223 standard for this.
As a task, we use a popular approach: we create an AOP-style system that parses an SQL query response, i.e. the result table. A developer adds annotations to the code, and our application then generates and executes the parser code at runtime. However, this approach is much wider. You can create programmable configuration (like the TeamCity Build DSL), and you can optimize existing code (the generated code can be constructed after the settings are read, so it will be branch-free). And of course, we can use this approach to avoid copy-paste when the language's expressive power isn't enough to extract a generalized block.
All code is available on GitHub; however, you need to install Java 11 (or later) first. The article presents a simplified version, without logs, diagnostics, tests, etc.
Task
First of all: if you want to solve exactly this task, please check the existing libraries. In most cases there is already a developed and supported solution. I can recommend Hibernate or Spring Data, which can do the same.
What we'd like to have: the ability to mark a data object with attributes, so that some "SQL query result to DTO converter" can convert rows into instances of our class.
For instance, our client code can be like:
data class DbUser(
@SqlMapping(columnName = "name")
val name: UserName,
@SqlMapping(columnName1 = "user_email_name", columnName2 = "user_email_domain")
val email: Email
)
As you know, to read the database response with Spring JDBC, it is better to use the ResultSet interface. There are at least two methods to extract a String from a column:
String getString(int columnIndex) throws SQLException;
String getString(String columnLabel) throws SQLException;
Let's complicate our task a little:
- In case of a huge query result, it is better to use the index-based approach (i.e. we retrieve the indexes of all columns before the first row arrives, remember them, and then use these indexes for each row). This is highly important for high-performance applications, because the name-based methods have to compare column names each time, which requires at least NxM unnecessary string equality checks, where N is the row count and M is the column count.
- For another performance boost, we shouldn't use reflection. Therefore, we have to avoid BeanPropertyRowMapper or similar classes, because they are reflection-based and too slow.
- Property types can be not only primitive, like String or Int. They can also be complex, like a self-written NonEmptyText (a class with a single String field, which can't be null or empty).
As we observed above, it is better to extract our solution into a separate library. Therefore we don't know all the available types during our library's compilation. And we'd like the database response parsing to look like:
fun extractData(rs: ResultSet): List<DbUser> {
val queryMetadata = rs.metaData
val queryColumnCount = queryMetadata.columnCount
val mapperColumnCount = 3
require(queryColumnCount == mapperColumnCount)
val columnIndex0 = rs.findColumn("name")
val columnIndex1 = rs.findColumn("user_email_name")
val columnIndex2 = rs.findColumn("user_email_domain")
val result = mutableListOf<DbUser>()
while (rs.next()) {
result.add(
DbUser(
name = UserName(rs.getString(columnIndex0)),
email = Email(EmailUser(rs.getString(columnIndex1)), EmailDomain(rs.getString(columnIndex2)))
)
)
}
return result
}
One more reminder: please don't use this approach in your project without researching existing solutions first. Moreover, if your task is to parse database rows, even if you want to work with JDBC directly (without Hibernate or anything similar), you can achieve this without code generation. And this is our homework: find a way to do this.
Kotlin Script Evaluation
For now there are two easiest approaches to compile Kotlin at runtime: you can use the Kotlin Compiler directly, or you can use the JSR-223 wrapper. The first approach allows you to compile multiple files together and has better extensibility; however, it is more complex to use. The second approach simply allows you to add a new type into the current class loader. Of course it isn't safe, so please execute only trusted code there (also, the Kotlin Script Compiler runs code in a separate restricted class loader; however, the default security configuration doesn't prevent new process creation or file system access, so please be careful there too).
First of all, let's define our interface. We don't want to generate new code for each SQL query, so let's do this once per each object type that will be read from the database. For instance, the interface could be:
interface ResultSetMapper<TMappingType> : ResultSetExtractor<List<TMappingType>>
interface DynamicResultSetMapperFactory {
fun <TMappingType : Any> createForType(clazz: KClass<TMappingType>): ResultSetMapper<TMappingType>
}
inline fun <reified TMappingType : Any> DynamicResultSetMapperFactory.createForType(): ResultSetMapper<TMappingType> {
return createForType(TMappingType::class)
}
The inline method is required to create the illusion that we have real generics in the JVM. It allows ResultSetMapper construction with code like: return mapperFactory.createForType<MyClass>().
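The reified trick above can be sketched in isolation. Everything below (the NameFactory interface and SimpleNameFactory object) is hypothetical and only mirrors the shape of DynamicResultSetMapperFactory:

```kotlin
import kotlin.reflect.KClass

// Hypothetical factory that mirrors the shape of DynamicResultSetMapperFactory.
interface NameFactory {
    fun <T : Any> nameFor(clazz: KClass<T>): String
}

object SimpleNameFactory : NameFactory {
    override fun <T : Any> nameFor(clazz: KClass<T>): String =
        clazz.simpleName ?: "<anonymous>"
}

// The reified inline wrapper: callers write nameFor<String>()
// instead of passing String::class explicitly.
inline fun <reified T : Any> NameFactory.nameFor(): String = nameFor(T::class)
```

At the call site, SimpleNameFactory.nameFor<String>() compiles to a call that passes String::class, so no runtime reflection magic is needed.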
ResultSetMapper inherits the standard Spring interface:
public interface ResultSetExtractor<T> {
T extractData(ResultSet rs) throws SQLException, DataAccessException;
}
The factory implementation is responsible for generating the code from the class annotations and then executing it. So we have a mockup like:
override fun <TMappingType : Any> createForType(clazz: KClass<TMappingType>): ResultSetMapper<TMappingType> {
val sourceCode = getMapperSourceCode(clazz) // generates code
return compiler.compile(sourceCode) // compiles code
}
We have to return ResultSetMapper<TMappingType>. And it is better to create a class without generic parameters, to give the JVM full type knowledge (in this case the GraalVM and C2 compilers can use more optimization techniques). Therefore, we compile code like:
object : ResultSetMapper<DbUser> { // a singleton which implements the interface
override fun extractData(rs: java.sql.ResultSet): List<DbUser> {
/* generated code */
}
}
For code compilation, we need three steps:
- Add all necessary dependencies to the classpath.
- Instruct Java about the available compilers (the Kotlin Script compiler in our case).
- Using ScriptEngineManager, execute the code, which returns the object above.
For the first item, let's add the following lines to the Gradle script:
implementation(kotlin("reflect"))
implementation(kotlin("script-runtime"))
implementation(kotlin("compiler-embeddable"))
implementation(kotlin("script-util"))
implementation(kotlin("scripting-compiler-embeddable"))
For the second item, let's add the file "src/main/resources/META-INF/services/javax.script.ScriptEngineFactory" into the jar, with the following line:
org.jetbrains.kotlin.script.jsr223.KotlinJsr223JvmLocalScriptEngineFactory
And then we have the last remaining item: execute the script at runtime:
fun <TResult> compile(sourceCode: String): TResult {
val scriptEngineManager = ScriptEngineManager()
val factory = scriptEngineManager.getEngineByExtension("kts").factory // the JVM knows that the kts extension is handled by KotlinJsr223JvmLocalScriptEngineFactory
val engine = factory.scriptEngine as KotlinJsr223JvmLocalScriptEngine
@Suppress("UNCHECKED_CAST")
return engine.eval(sourceCode) as TResult
}
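As a minimal usage sketch (assuming the Gradle dependencies and the META-INF/services file above are in place), the engine lookup can be wrapped like this; evalKotlinOrNull is a hypothetical helper that degrades to null when the scripting artifacts are missing:

```kotlin
import javax.script.ScriptEngineManager

// Looks up the Kotlin engine registered through META-INF/services.
// Returns null instead of failing when the scripting artifacts
// are not on the classpath (hypothetical helper).
fun evalKotlinOrNull(sourceCode: String): Any? {
    val engine = ScriptEngineManager().getEngineByExtension("kts") ?: return null
    return engine.eval(sourceCode)
}
```

With the dependencies in place, evalKotlinOrNull("1 + 2") evaluates the snippet and returns 3; without them it simply returns null, which makes the classpath requirement explicit.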
Preparing the Model
As I wrote above, let's complicate our task: let's generate code not only for built-in JVM types, but also for self-written ones. Therefore, let's dig deeper into our data model.
Let's imagine that we try to write strongly-typed code, which rejects invalid data as early as possible. Therefore:
- Instead of the field userName: String we have userName: UserName, where the class UserName has just one field.
- UserName can't be empty, therefore we should check this value in the constructor.
- We plan to have a lot of such classes, therefore this logic should be extracted into a common block.
As one approach, we can implement this in the following way. First, create the class NonEmptyText, which has the necessary field and all the required checks in the constructor:
abstract class NonEmptyText(val value: String) {
init {
require(value.isNotBlank()) {
"Empty text is prohibited for ${this.javaClass.simpleName}. Actual value: $this"
}
}
override fun equals(other: Any?): Boolean {
if (this === other) return true
if (javaClass != other?.javaClass) return false
other as NonEmptyText
if (value != other.value) return false
return true
}
override fun hashCode(): Int {
return value.hashCode()
}
override fun toString(): String {
return value
}
}
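A quick sanity check of this contract (the class is repeated in simplified form so the sketch is self-contained; tryCreate is a hypothetical helper):

```kotlin
// Simplified copy of NonEmptyText so the sketch is self-contained.
abstract class NonEmptyText(val value: String) {
    init {
        require(value.isNotBlank()) {
            "Empty text is prohibited for ${this.javaClass.simpleName}"
        }
    }

    override fun toString(): String = value
}

// Hypothetical helper: an anonymous subclass is enough to exercise
// the constructor check; null signals a rejected value.
fun tryCreate(raw: String): String? =
    try {
        object : NonEmptyText(raw) {}.toString()
    } catch (e: IllegalArgumentException) {
        null
    }
```

The require call runs in the init block, so there is no way to construct an instance holding a blank string.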
Next let's add one more type construction approach:
interface NonEmptyTextConstructor<out TResult : NonEmptyText> {
fun create(value: String): TResult
}
Next we can create the UserName class:
class UserName(value: String) : NonEmptyText(value) {
companion object : NonEmptyTextConstructor<UserName> {
override fun create(value: String) = UserName(value)
}
}
Here we have UserName, which is strongly typed. And its companion object can construct its instances, so now we can create instances without a direct constructor call:
UserName.create("123")
Now we can give this interface to anyone who wants to create an instance from an input string. For instance, the call to the method fun <TValue> createText(input: String?, constructor: NonEmptyTextConstructor<TValue>): TValue? is createText("123", UserName), which is intuitive. It looks like type classes for the JVM.
Let's define Email in the following way:
class EmailUser(value: String) : NonEmptyText(value) {
companion object : NonEmptyTextConstructor<EmailUser> {
override fun create(value: String) = EmailUser(value)
}
}
class EmailDomain(value: String) : NonEmptyText(value) {
companion object : NonEmptyTextConstructor<EmailDomain> {
override fun create(value: String) = EmailDomain(value)
}
}
data class Email(val user: EmailUser, val domain: EmailDomain)
We divided it into two different types here just as an example of a complex type; we don't need this for email in real life. In our case, let's do it to test the "read a single object from two columns" approach. Not all ORM implementations can do this; however, we can.
Next let's create the DbUser type. It is our DTO, which we read from the database:
data class DbUser(val name: UserName, val email: Email)
To generate the database result parsing code, we must:
- Define the column matching: for the name field we have to define one column name, and for the email field we need to define two column names.
- Define the database reading method (moreover, even the String type can be read in different ways).
If we have "one column - one type" matching, then database reading method can be defined with the simple interface:
interface SingleValueMapper<out TValue> {
fun getValue(resultSet: ResultSet, columnIndex: Int): TValue
}
So during the ResultSet reading, we can do the following:
- Once, remember which index is responsible for which column.
- For each row:
  - Call getValue for each cell.
  - Create the object from the results of the previous item.
As we observed before, let's assume the project has a lot of types that can be marked as "non-empty string". Therefore, we can create a common mapper for them:
abstract class NonEmptyTextValueMapper<out TResult : NonEmptyText>(
private val textConstructor: NonEmptyTextConstructor<TResult>
) : SingleValueMapper<TResult> {
override fun getValue(resultSet: ResultSet, columnIndex: Int): TResult {
return textConstructor.create(resultSet.getString(columnIndex))
}
}
As we can see, we put the object constructor into this class. Now we can easily create mappers for the exact classes:
object UserNameMapper : NonEmptyTextValueMapper<UserName>(UserName) // this object can convert column value to the UserName
Unfortunately, I didn't find a way to express the mapper with extension methods, i.e. to have some kind of extension type. In Scala you can achieve this via implicits; however, that approach isn't explicit.
As we noticed, we have a complex type, Email, and it requires two columns. Therefore, the interface above isn't applicable to it. As an option, we can create a separate one:
interface DoubleValuesMapper<out TValue> {
fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TValue
}
Here we have two input columns and a single result object. This is exactly what we need; however, we would have to copy-paste such interfaces for each column count.
Now we can have a combined mapper, which looks like this:
abstract class TwoMappersValueMapper<out TResult, TParameter1, TParameter2>(
private val parameterMapper1: SingleValueMapper<TParameter1>,
private val parameterMapper2: SingleValueMapper<TParameter2>
) : DoubleValuesMapper<TResult> {
override fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TResult {
return create(
parameterMapper1.getValue(resultSet, columnIndex1),
parameterMapper2.getValue(resultSet, columnIndex2)
)
}
abstract fun create(parameter1: TParameter1, parameter2: TParameter2): TResult
}
And then Email can be read in the following way:
object EmailUserMapper : NonEmptyTextValueMapper<EmailUser>(EmailUser)
object EmailDomainMapper : NonEmptyTextValueMapper<EmailDomain>(EmailDomain)
object EmailMapper : TwoMappersValueMapper<Email, EmailUser, EmailDomain>(EmailUserMapper, EmailDomainMapper) {
override fun create(parameter1: EmailUser, parameter2: EmailDomain): Email {
return Email(parameter1, parameter2)
}
}
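To see this composition work without a real database, we can stub ResultSet with a dynamic proxy. The interfaces are repeated in simplified form, and EmailStringMapper and fakeResultSet are hypothetical names used only for this sketch:

```kotlin
import java.lang.reflect.Proxy
import java.sql.ResultSet

// Simplified copies of the mapper interfaces from the article.
interface SingleValueMapper<out TValue> {
    fun getValue(resultSet: ResultSet, columnIndex: Int): TValue
}

interface DoubleValuesMapper<out TValue> {
    fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TValue
}

abstract class TwoMappersValueMapper<out TResult, TParameter1, TParameter2>(
    private val parameterMapper1: SingleValueMapper<TParameter1>,
    private val parameterMapper2: SingleValueMapper<TParameter2>
) : DoubleValuesMapper<TResult> {
    override fun getValue(resultSet: ResultSet, columnIndex1: Int, columnIndex2: Int): TResult =
        create(
            parameterMapper1.getValue(resultSet, columnIndex1),
            parameterMapper2.getValue(resultSet, columnIndex2)
        )

    abstract fun create(parameter1: TParameter1, parameter2: TParameter2): TResult
}

object StringMapper : SingleValueMapper<String> {
    override fun getValue(resultSet: ResultSet, columnIndex: Int): String =
        resultSet.getString(columnIndex)
}

// Hypothetical composite mapper: joins two string columns as "user@domain".
object EmailStringMapper : TwoMappersValueMapper<String, String, String>(StringMapper, StringMapper) {
    override fun create(parameter1: String, parameter2: String) = "$parameter1@$parameter2"
}

// ResultSet stub via dynamic proxy: only getString(int) is answered.
fun fakeResultSet(columns: Map<Int, String>): ResultSet =
    Proxy.newProxyInstance(
        ResultSet::class.java.classLoader,
        arrayOf(ResultSet::class.java)
    ) { _, method, args ->
        if (method.name == "getString" && args?.size == 1) columns[args[0] as Int] else null
    } as ResultSet
```

With a stub holding columns 1 and 2, EmailStringMapper.getValue(rs, 1, 2) pulls both cells through the two single-value mappers and combines them, which is exactly the shape of the Email reading above.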
Now we have the last remaining item: define our annotations and write the code generation.
@Target(AnnotationTarget.VALUE_PARAMETER)
@MustBeDocumented
annotation class SingleMappingValueAnnotation(
val constructionClass: KClass<out SingleValueMapper<*>>, // the mapper requires just a single field ...
val columnName: String // ... therefore we have one column
)
@Target(AnnotationTarget.VALUE_PARAMETER)
@MustBeDocumented
annotation class DoubleMappingValuesAnnotation(
val constructionClass: KClass<out DoubleValuesMapper<*>>, // the mapper requires two fields ...
val columnName1: String, // ... therefore we have two columns
val columnName2: String
)
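A small sketch of how such a constructor-parameter annotation can be read back. Note that plain Java reflection is enough here; the stand-in types (SingleValueMapper, UserNameMapper, firstColumnName) are simplified for the example:

```kotlin
import kotlin.reflect.KClass

// Simplified stand-ins so the sketch compiles on its own.
interface SingleValueMapper<out TValue>

object UserNameMapper : SingleValueMapper<String>

@Target(AnnotationTarget.VALUE_PARAMETER)
annotation class SingleMappingValueAnnotation(
    val constructionClass: KClass<out SingleValueMapper<*>>,
    val columnName: String
)

data class DbUser(
    @SingleMappingValueAnnotation(UserNameMapper::class, "name")
    val name: String
)

// Constructor-parameter annotations are visible through plain Java reflection,
// so this part of the generator does not strictly need kotlin-reflect.
fun firstColumnName(clazz: Class<*>): String? =
    clazz.constructors.single().parameterAnnotations
        .flatMap { it.toList() }
        .filterIsInstance<SingleMappingValueAnnotation>()
        .firstOrNull()
        ?.columnName
```

Kotlin annotations have RUNTIME retention by default, which is why the generator can see them after compilation.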
Code Generation From the Annotations
First of all, let's define which code we'd like to see. I used the following, which complies with all the initial criteria:
object : ResultSetMapper<DbUser> {
override fun extractData(rs: java.sql.ResultSet): List<DbUser> {
val queryMetadata = rs.metaData
val queryColumnCount = queryMetadata.columnCount
val mapperColumnCount = 3
require(queryColumnCount == mapperColumnCount) {
val queryColumns = (1..queryColumnCount).joinToString { queryMetadata.getColumnName(it) }
"Sql query has invalid columns: $mapperColumnCount is expected, however $queryColumnCount is returned. " +
"Query has: $queryColumns. Mapper has: name, user_email_name, user_email_domain"
}
val columnIndex0 = rs.findColumn("name")
val columnIndex1 = rs.findColumn("user_email_name")
val columnIndex2 = rs.findColumn("user_email_domain")
val result = mutableListOf<DbUser>()
while (rs.next()) {
val name = UserNameMapper.getValue(rs, columnIndex0)
val email = EmailMapper.getValue(rs, columnIndex1, columnIndex2)
val rowResult = DbUser(
name = name,
email = email
)
result.add(rowResult)
}
return result
}
}
This code is generated as a monolith (variables are defined first and only then used). Let's highlight at least a few blocks with different ideas:
- We have N input columns, which are used by different mappers. Therefore we need different variables for them (the same column can be used by different mappers).
- First of all we should verify what we received from the database. If the column count differs from the expected one, then it is better to raise an exception with a lot of details: what we received, what we expected, etc.
- SQL cursors work via an approach like while (rs.next()) { ... }, so let's create a mutable list. Ideally we could set its initial size if we knew how many rows the database returns.
- On each loop iteration we read all field values and then create the resulting object.
Finally, we have the following code:
private fun <TMappingType : Any> getMapperSourceCode(clazz: KClass<TMappingType>): String {
return buildString {
val className = clazz.qualifiedName!!
val resultSetClassName = ResultSet::class.java.name
val singleConstructor = clazz.constructors.single()
val parameters = singleConstructor.parameters
val annotations = parameters.flatMap { it.annotations.toList() }
val columnNames = annotations.flatMap { getColumnNames(it) }.toSet()
val columnNameToVariable = columnNames.mapIndexed { index, name -> name to "columnIndex$index" }.toMap()
appendln("""
import com.github.imanushin.ResultSetMapper
object : com.github.imanushin.ResultSetMapper<$className> {
override fun extractData(rs: $resultSetClassName): List<$className> {
val queryMetadata = rs.metaData
val queryColumnCount = queryMetadata.columnCount
val mapperColumnCount = ${columnNameToVariable.size}
require(queryColumnCount == mapperColumnCount) {
val queryColumns = (1..queryColumnCount).joinToString { queryMetadata.getColumnName(it) }
"Sql query has invalid columns: \${'$'}mapperColumnCount is expected, however \${'$'}queryColumnCount is returned. " +
"Query has: \${'$'}queryColumns. Mapper has: ${columnNames.joinToString()}"
}
""")
columnNameToVariable.forEach { (columnName, variableName) ->
appendln(" val $variableName = rs.findColumn(\"$columnName\")")
}
appendln("""
val result = mutableListOf<$className>()
while (rs.next()) {
""")
parameters.forEach { parameter ->
fillParameterConstructor(parameter, columnNameToVariable)
}
appendln(" val rowResult = $className(")
appendln(
parameters.joinToString("," + System.lineSeparator()) { parameter ->
" ${parameter.name} = ${parameter.name}"
}
)
appendln("""
)
result.add(rowResult)
}
return result
}
}
""")
}
}
private fun StringBuilder.fillParameterConstructor(parameter: KParameter, columnNameToVariable: Map<String, String>) {
append(" val ${parameter.name} = ")
// please note: double or missing annotations aren't covered here
parameter.annotations.forEach { annotation ->
when (annotation) {
is DoubleMappingValuesAnnotation ->
appendln("${annotation.constructionClass.qualifiedName}.getValue(" +
"rs, " +
"${columnNameToVariable[annotation.columnName1]}, " +
"${columnNameToVariable[annotation.columnName2]})"
)
is SingleMappingValueAnnotation ->
appendln("${annotation.constructionClass.qualifiedName}.getValue(" +
"rs, " +
"${columnNameToVariable[annotation.columnName]})"
)
}
}
}
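The generator above is easier to digest in a trimmed-down form. The following reflection-free sketch (generateColumnLookups is a hypothetical helper) produces only the findColumn prologue from a plain list of column names:

```kotlin
// Hypothetical trimmed-down generator: column names come in as a plain list,
// and only the findColumn prologue of the mapper is produced.
fun generateColumnLookups(columnNames: List<String>): String = buildString {
    columnNames.forEachIndexed { index, name ->
        appendLine("val columnIndex$index = rs.findColumn(\"$name\")")
    }
}
```

The full generator does the same thing, except the column names and variable indexes come out of the annotation metadata instead of an explicit list.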
Why Do We Need This?
As you can see, it is easy to generate executable code instantly at runtime. I spent just several hours on this small example library (and several more on the article). However, here we have workable code that is able to read rows from the database faster than most of the intuitive approaches on Stack Overflow. Moreover, because the code generation is fully controlled, we can also add object interning, performance measurement, and a lot of other improvements and performance optimizations. And the most important point: we know the exact code which will be executed here.
A Kotlin DSL can also be used for programmable configuration. If you love your users, you can stop forcing them to use JSON/XML/YAML files and just give them a DSL. It will define the configuration in a type-safe way. As an example, please take a look at the TeamCity Build DSL: you can develop your build, and you can write a condition or loop to avoid copying a step 10 times. You have all the code highlighting in the IDE. In any case, the final application needs a configuration model, and there is no real restriction on how it is created.
Not all ideas can be expressed in your programming language. And often you don't want to copy-paste code which isn't so simple to verify. Code generation can help us here too: if you can define your implementation with annotations, then do it in a common way and hide it behind an interface. This approach is highly useful for the JIT compiler, which then sees code with all explicit types instead of generic ones (where some optimizations, such as stack allocation, are impossible).
However, the most important point: please estimate first whether it is really necessary to play with code generation and runtime code execution. In some projects, a reflection-based approach has enough performance, which means that it is better not to use non-standard techniques and overcomplicate the project.
Published at DZone with permission of Igor Manushin. See the original article here.