The Need for Speed: Access Existing Data 1,000x Faster
Learn how you can speed up your analytics database applications by a factor of 1,000 by using standard Java 8 streams and Speedment's In-JVM-Memory accelerator.
Join the DZone community and get the full member experience.
Join For FreeWeb and mobile applications are sometimes slow because the backing database is slow and/or the connection to the database imposes latencies. Modern UIs and interactive applications require fast back-ends with ideally no observable latency or else users will move on to other services or will just get tired and stop using the service altogether.
In this article, we will learn how analytic database applications can be sped up by orders of magnitude using standard Java 8 streams and Speedment's in-JVM-memory acceleration technology. At the end, we will run a JMH test suit with representative benchmarks that indicate a speedup factor exceeding 1,000 times. Learn more about Speedment's open-source solution here.
Viewing the Database as Streams
Speedment is a modern stream-based ORM meaning that the tables are viewed as standard Java 8 streams. In this article, we will use the "Sakila" database that is an open-source example database available directly from Oracle here. The Sakila example database contains films, actors, etc. This is how a Java 8 stream from the database might look:
List<Film> secondPage = films.stream()
.filter(Film.RATING.equal("PG-13"))
.sorted(Film.TITLE.comparator())
.skip(50)
.limit(50)
.collect(Collectors.toList());
This stream will filter out only those films that have a rating equal to "PG-13" and will then sort the remaining films by film title. After that, the first 50 films are skipped and then the next 50 films are collected to a list. Thus, we get the second page of all PG-13 films sorted in title order. Usually, we also need to know how many films there are in total that has a rating of "PG-13" in order to show a correctly scaled scrollbar in our application. This can be done like this:
long count = films.stream()
.filter(Film.RATING.equal("PG-13"))
.count();
Using a Database
Speedment will automatically render the Streams to SQL. This way, we can remain in a pure typesafe Java environment without having to write SQL code. By enabling logging, we can see that the first paging stream will be rendered to the following SQL query (assuming we are using MySQL):
SELECT
`film_id`,`title`,`description``release_year`,
`language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
`length`,`replacement_cost`,`rating`,`special_features`,`last_update`
FROM
`sakila`.`film`
WHERE
(`sakila`.`film`.`rating` = ? COLLATE utf8_bin)
ORDER BY
`sakila`.`film`.`title` ASC
LIMIT ? OFFSET ?
values:[PG-13, 50, 50]
The second count stream will be rendered to:
SELECT COUNT(*) FROM (
SELECT
`film_id`,`title`,`description``release_year`,
`language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
`length`,`replacement_cost`,`rating`,`special_features`,`last_update`
FROM
`sakila`.`film`
WHERE
(`sakila`.`film`.`rating` = ? COLLATE utf8_bin)
) AS A
values:[PG-13]
Thus, the stream operations are rendered to efficient SQL. When running one thousand of these queries in parallel on a laptop class of computer with the MySQL standard server configuration, they complete in 700 ms and 175 ms aggregated latency respectively. If you are thinking about how the second SQL statement could be efficient, the fact is that the database will be able to basically eliminate the inner select.
Using In-JVM-Memory Acceleration
Now to the fun part. Let's activate the in-JVM-memory acceleration component in Speedment, called DataStore, in our application. This is done like so:
SakilaApplication app = new SakilaApplicationBuilder()
.withPassword("sakila-password")
// Activate DataStore
.withBundle(DataStoreBundle.class)
.build();
// Load a snapshot of the database into off heap memory
app.get(DataStoreComponent.class)
.ifPresent(DataStoreComponent::load);
When the application is started, a snapshot of the database is pulled into the JVM and is stored off-heap. Because data is stored off-heap, data will not influence garbage collection and the amount of data is only limited by the available RAM. Nothing prevents us from loading terabytes of data if we have that amount of RAM available.
If we now run the same application again, we get 22 ms and 1 ms aggregated latency. This means that the latency is reduced by a factor of 30 and 170 respectively. A significant improvement, it has to be said. But it is getting better still!
Using In-JVM-Memory Acceleration and JSON
REST and JSON are commonly used to serve clients that request data these days. Speedment has a special collector that can collect JSON data using something called in-place deserialization whereby only the fields that are needed by the collector are deserialized from off-heap memory. We can depend on the JSON plugin by first adding a dependency in our pom file:
<dependency>
<groupId>com.speedment.enterprise.plugins</groupId>
<artifactId>json-stream</artifactId>
<version>${speedment.enterprise.version}</version>
</dependency>
Then, we install the plugin in the ApplicationBuilder
as shown here:
SakilaApplication app = new SakilaApplicationBuilder()
.withPassword("sakila-password")
.withBundle(DataStoreBundle.class)
// Install the Json Plugin
.withBundle(JsonBundle.class)
.build();
If we only want the Film fields "title," "rating," and "length" in the JSON output, we can create a JSON encoder like this:
final JsonComponent json = app.getOrThrow(JsonComponent.class);
final JsonEncoder<Film> filmEncoder = json.<Film>emptyEncoder()
.put(Film.TITLE)
.put(Film.RATING)
.put(Film.LENGTH)
.build();
This decoder is immutable and can be reused over and over again in our application:
String json = films.stream()
.filter(Film.RATING.equal("PG-13"))
.sorted(Film.TITLE.comparator())
.skip(50 * pageNo)
.limit(50)
.collect(JsonCollectors.toList(filmEncoder));
This gives us an additional speed- up factor of two compared to dealing with entire entities. The JsonComponent
can do much more than simply collecting things to a list. For example, it can create aggregations, also using in-place de-serialization.
Run Your Own Projects With In-JVM-Memory Acceleration
It is easy to try in-JVM-Memory acceleration by yourself. There is a free Initializer that can be found here. Just stick in your desired database type and you will get a POM and an application template automatically generated for you. You also need a license key to run. Just click Request a Free Trial License Key on the same page to get one. If you need more help setting up your project, check out the Speedment GitHub page or explore the manual.
How Fast Is It for Real?
Speedment supports a number of database types including Oracle, MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, DB2 and AS400. Speedment can also work with Avro files that is used by Hadoop. In this example, we will run MySQL.
Testing performance in Java applications is notoriously difficult. Using the JMH framework, I have written a number of typical applications and have run each test hundreds of thousand times and compared the result for pure MySQL and MySQL with Speedment's in-JVM accelerator. The performance figures below are given as operations/second (higher is better).
Benchmark |
Pure MySQL |
MySQL with Speedment in-JVM |
Speed-up factor |
Count all |
5,324 |
43,615,967 |
8,000 |
Count with filter |
5,107 |
2,465,928 |
400 |
Filtering |
449 |
597,702 |
1,300 |
Sorting |
109 |
171,304 |
1,500 |
Paging |
1,547 |
1,443,015 |
900 |
Iterate all |
108 |
5,556 |
50 |
Aggregation |
117 |
167,728 |
1,400 |
Aggregation filter |
453 |
608,763 |
1,300 |
As can be seen, MySQL with the Speedment In-JVM accelerator outperforms pure MySQL by a factor of 1,000 or more in most cases. The smallest speed-up factor observed was 50 times, which is still very good.
Benchmark Code
Below are some examples of how the benchmark code looks. The complete benchmark application can be found on GitHub here. I encourage you to clone it and run it to see the speedup factor on your own target machine.
@Benchmark
public String paging() {
return films.stream()
.filter(Film.RATING.equal("PG-13"))
.skip(50)
.limit(50)
.collect(filmCollector);
}
@Benchmark
public String aggregationWithFilter() {
return films.stream()
.filter(Film.RATING.equal("PG-13"))
.collect(sumLengthCollector);
}
How Much RAM is Needed?
Speedment can often store data in RAM more efficiently than the database itself. The Sakila database in the benchmark takes 6.6 MB on disk but Speedment is only using 3 MB of memory. Considering that Speedment indexes all columns by default whereas the database only indexes a few columns, the Speedment is remarkably memory efficient.
How Long Time Does it Take to Load Data?
The Sakila database was loaded and indexed by Speedment in less than one second. Speedment can refresh data from the database in the background and will keep track of which streams are running against which database snapshot version (MVCC).
How Much Faster Will My Own Applications Run?
How much latency reduction one would get in a particular project is anyone's guess. Is is x10, x50, x100 or even more? Take the opportunity and find out how much speed you can gain in your own projects!
Take it for a Spin
Learn more about Speedment on GitHub and kick-start your own project using the Speedment Initializer and remember to click Enable in-memory acceleration" and use the initializer to get a free evaluation license key, too. Explore the manual section for the Speedment in-JVM accelerator here.
Published at DZone with permission of Per-Åke Minborg, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments