Optimizing JPA Performance: An EclipseLink, Hibernate, and OpenJPA Comparison
Join the DZone community and get the full member experience.
Join For Free'Impedance mismatch'. No two words encompass the troubles, headaches and quirks most developers face when attempting to link applications to relational databases (RDBMS). But lets face it, object orientated designs aren't going away anytime soon from mainstream languages and neither are the relational storage systems used in most applications. One side works with objects, while the other with tables. Resolving these differences -- or as its technically referred to 'object/relational impedance mismatch' -- can result in substantial overhead, which in turn can materialize into poor application performance.
In Java, the Java Persistence API (JPA) is one of the most popular mechanisms used to bridge the gap between objects (i.e. the Java language) and tables (i.e. relational databases). Though there are other mechanisms that allow Java applications to interact with relational databases -- such as JDBC and JDO -- JPA has gained wider adoption due to its underpinnings: Object Relational Mapping (ORM). ORM's gain in popularity is due precisely to it being specifically designed to address the interaction between object and tables.
In the case of JPA, there is a standard body charged with setting its course, a process which has given way to several JPA implementations, among the three most popular you will find: EclipseLink (evolved from TopLink), Hibernate and OpenJPA. But even though all three are based on the same standard, ORM being such a deep and complex topic, beyond core functionality each implementation has differences ranging from configuration to optimization techniques.
What I will do next is explain a series of topics related to optimizing an application's use of the JPA, using and comparing each of the previous JPA implementations. While JPA is capable of automatically creating relational tables and can work with a series of relational database vendors, I will part from having pre-existing data deployed on a MySQL relational database, in addition to relying on the Spring framework to facilitate the use of the JPA. This will not only make it a fairer comparison, but also make the described techniques appealing to a wider audience, since performance issues become a serious concern once you have a large volume of data, in addition to MySQL and Spring being a common choice due to their community driven (i.e. open-source) roots. See the source code/application section at the end for instructions on setting up the application code discussed in the remainder of the sections.
Download the Source Code associated with this article (~45 MB)
The basics: Metrics
In order to establish JPA performance levels in an application, it's vital to first obtain a series of metrics related to a JPA implementation's inner workings. These include things like:
- What are the actual queries being performed against a RDBMS?
- How long does each query take?
- Are queries being performed constantly against the RDBMS or is a cache being used?
These metrics will be critical to our performance analysis, since they will shed light on the underlying operations performed by a JPA implementation and in the process show the effectiveness or ineffectiveness of certain techniques.
In this area you will find the first differences among implementations, and I'm not talking about metric results, but actually how to obtain these metrics. To kick things off, I will first address the topic of logging. By default, all three JPA implementations discussed here -- EclipseLink, Hibernate and OpenJPA -- log the query performed against a RDBMS, which will be an advantage in determining if the queries performed by an ORM are optimal for a particular relational data model. Nevertheless, tweaking the logging level of a JPA implementation further can be helpful for one of two things: Getting even more details from the underlying operations made by a JPA -- which can be turned off by default (e.g. database connection details) -- or getting no logging information at all -- which can benefit a production system's performance.
Logging in JPA implementations is managed through one of several logging frameworks, such as Apache Commons Logging or Log4J. This requires the presence of such libraries in an application. Logging configuration of a JPA implementation is mostly done through a <property> value in an application's persistence.xml file or in some cases, directly in a logging framework's configuration files. The following table describes JPA logging configuration parameters:
Large table, so here's an external link
In addition to the information obtained through logging, there is another set of JPA performance metrics which require different steps to be obtained. One of these metrics is the time it takes to perform a query. Even though some JPA implementations provide this information using certain configurations, some do not. Even so, I opted to use a separate approach and apply it to all three JPA implementations in question. After all, time metrics measured in milliseconds can be skewed in certain ways depending on start and end time criteria. So to measure query times, I will use Aspects with the aid of the Spring framework.
Aspects will allow us to measure the time it takes a method containing a query to be executed, without mixing the timing logic with the actual query logic -- the last feature of which is the whole purpose of using Aspects. Further discussing Aspects would go beyond the scope of performance, so next I will concentrate on the Aspect itself. I advise you to look over the accompanying source code, Aspects and Spring Aspects for more details on these topics and their configuration. The following Aspect is used for measuring execution times in query methods.
package com.webforefront.aop;import org.apache.commons.lang.time.StopWatch;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Pointcut;import org.aspectj.lang.annotation.Aspect;@Aspectpublic class DAOInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object logQueryTimes(ProceedingJoinPoint pjp) throws Throwable { StopWatch stopWatch = new StopWatch(); stopWatch.start(); Object retVal = pjp.proceed(); stopWatch.stop(); String str = pjp.getTarget().toString(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": " + stopWatch.getTime() + "ms"); return retVal; }}
The main part of the Aspect is the @Around annotation. The value assigned to this last annotation indicates to execute the aspect method -- logQueryTimes -- each time a method belonging to a class in the com.webforefront.jpa.service package is executed -- this last package is where all our application's JPA query methods will reside. The logic performed by the logQueryTimes aspect method is tasked with calculating the execution time and outputting it as logging information using Apache Commons Logging.
Another set of important JPA metrics is related to statistics beyond those provided by standard logging. The statistics I'm referring to are things related to caches, sessions and transactions. Since the JPA standard doesn't dictate any particular approach to statistics, each JPA implementation also varies in the type and way it collects statistics. Both Hibernate and OpenJPA have their own statistics class, where as EclipseLink relies on a Profiler to gather similar metrics.
Since I'm already relying on Aspects, I will also use an Aspect to obtain statistics both prior and after the execution of a JPA query method. The following Aspect obtains statistics for an application relying on Hibernate.
package com.webforefront.aop;import org.hibernate.stat.Statistics;import org.hibernate.SessionFactory;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.hibernate.ejb.HibernateEntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheHibernateInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { HibernateEntityManagerFactory hbmanagerfactory = (HibernateEntityManagerFactory) entityManagerFactory; SessionFactory sessionFactory = hbmanagerfactory.getSessionFactory(); Statistics statistics = sessionFactory.getStatistics(); String str = pjp.getTarget().toString(); statistics.setStatisticsEnabled(true); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) " + statistics); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) " + statistics); return result; } }
Notice the similar structure to the prior timing Aspect, except in this case the logging output contains values that belong to the Statistics Hibernate class obtained via the application's EntityManagerFactory. The next Aspect is used to obtain statistics for an application relying on OpenJPA.
package com.webforefront.aop;import org.apache.openjpa.datacache.CacheStatistics;import org.apache.openjpa.persistence.OpenJPAEntityManagerFactory;import org.apache.openjpa.persistence.OpenJPAPersistence;import org.aspectj.lang.ProceedingJoinPoint;import org.aspectj.lang.annotation.Around;import org.aspectj.lang.annotation.Aspect;import org.springframework.beans.factory.annotation.Autowired;import javax.persistence.EntityManagerFactory;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;@Aspectpublic class CacheOpenJPAInterceptor { private Log log = LogFactory.getLog(DAOInterceptor.class); @Autowired private EntityManagerFactory entityManagerFactory; @Around("execution(* com.webforefront.jpa.service..*.*(..))") public Object log(ProceedingJoinPoint pjp) throws Throwable { OpenJPAEntityManagerFactory ojpamanagerfactory = OpenJPAPersistence.cast(entityManagerFactory); CacheStatistics statistics = ojpamanagerfactory.getStoreCache().getStatistics(); String str = pjp.getTarget().toString(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (Before call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + statistics.getTotalWriteCount()); Object result = pjp.proceed(); log.info(str.substring(str.lastIndexOf(".")+1, str.lastIndexOf("@")) + " - " + pjp.getSignature().getName() + ": (After call) Statistics [start time=" + statistics.start() + ",read count=" + statistics.getReadCount() + ",hit count=" + statistics.getHitCount() +",write count=" + statistics.getWriteCount() + ",total read count=" + statistics.getTotalReadCount() + ",total hit count=" + statistics.getTotalHitCount() +",total write count=" + statistics.getTotalWriteCount()); return result; } }
Once again, notice the similar Aspect structure to the previous Aspect which relies on an application's EntityManagerFactory. In this case, the logging output contains values that belong to the CacheStatistics OpenJPA class. Since OpenJPA does not enable statistics by default, you will need to add the following two properties to an application's persistence.xml file:
<property name="openjpa.DataCache" value="true(EnableStatistics=true)"/><property name="openjpa.RemoteCommitProvider" value="sjvm"/>
The first property ensures statistics are gathered, while the second property is used to indicate the gathering of statistics take place on a single JVM. NOTE: The value "true(EnableStatistics=true)" also enables caching in addition to statistics.
Since EclipseLink doesn't have any particular statistics class and relies on a Profiler to determine advanced metrics, the simplest way to obtain similar statistics to those of Hibernate and OpenJPA is through the Profiler itself. To active EclipseLink's Profiler you just need to add the following property to an application's persistence.xml file: <property name="eclipselink.profiler" value="PerformanceProfiler"/>. By doing so, the EclipseLink Profiler output's several metrics on each JPA query method execution as logging information.
Now that you know how to obtain several metrics from all three JPA implementations and understand they will be obtained as fairly as possible for all three providers, it's time to put each JPA implementation to the test along with several performance techniques.
JPQL queries, weaving and class transformations
Lets start by making a query that retrieves data belonging to a pre-existing RDBMS table named "Master". The "Master" table contains over 17,000 records belonging to baseball players. To simplify matters, I will create a Java class named "Player" and map it to the "Master" table in order to retrieve the records as objects. Next, relying on the Spring framework's JpaTemplate functionality, I will setup a query to retrieve all "Player" objects, with the query taking the following form:
getJpaTemplate().find("select e from Player e");
See the accompanying source code for more details on this last process.
Next, I deploy the application using each of the three JPA implementations on Apache Tomcat, doing so separately, as well as starting and stopping the server on each deployment to ensure fair results. These are the results of doing so on a 64-bit Ubuntu-4GB RAM box, using Java 1.6:
All player objects - 17,468 records | Time | Query |
Hibernate | 3558 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ |
EclipseLink (Run-time weaver - Spring ReflectiveLoadTimeWeaver weaver ) | 3215 ms | SELECT lahmanID, nameLast, nameFirst FROM Master |
EclipseLink (Build-time weaving) | 3571 ms | SELECT lahmanID, nameLast, nameFirst FROM Master |
EclipseLink (No weaving) | 3996 ms | SELECT lahmanID, nameLast, nameFirst FROM Master |
OpenJPA (Build-time enhanced classes) | 5998 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 |
OpenJPA (Run-time enhanced classes- OpenJPA enhancer) | 6136 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 |
OpenJPA (Non enhanced classes) | 7677 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 |
As you can observe, the queries performed by each JPA implementation are fairly similar, with two of them using a shortcut notation (e.g. t0 and player0 for the table named 'Master'). This syntax variation though has minimal impact on performance, since <i>directly</i> querying an RDBMS using any of these notation variations shows identical results. However, the query times made through several JPA implementations using distinct parameters vary considerably. One important factor leading to this time difference is due to how each implementation handles JPA entities.
Lets start with the OpenJPA implementation which had the poorest times. OpenJPA can execute an enhancement process on Java entities (e.g. in this case the 'Player' class). This enhancement process can be performed when the entities are built, at run-time or foregone altogether. As you can observe, foregoing entity enhancement altogether in OpenJPA produced the longest query times. Where as enhancing entities at either build-time or run-time produced relatively better results, with the former beating out the latter. By default, OpenJPA expects entities to be enhanced. This means you will either need to explicitly configure an application to support unenhanced classes by adding the following:
<property name="openjpa.RuntimeUnenhancedClasses" value="supported"/>
...property to an application's persistence.xml file or enhance classes at build-time or at run-time relying on the OpenJPA enhancer, otherwise an application relying on OpenJPA will throw an error.
Given these OpenJPA results, the remaining OpenJPA tests will be based on build-time enhanced entity classes. For more on the topic of OpenJPA enhancement, refer to the OpenJPA documentation in addition to consulting the accompanying source code for this article.
You may be wondering what exactly constitutes OpenJPA enhancement ? OpenJPA entity enhancement is a processing step applied to the bytecode generated by the Java compiler which adds JPA specific instructions to provide optimal runtime performance, these instructions can include things like flexible lazy loading and dirty read tracking. So why doesn't Hibernate or EclipseLink enhance entities ? In short, Hibernate and EclipseLink also enhance JPA entites, they just don't outright call it 'enhancement'.
EclipseLink calls this 'enhancement' process by the more technical term: weaving. Similar to OpenJPA's enhancement process, weaving in EclipseLink can take place at either build-time (a.k.a. static weaving), run-time or forgone altogether. As you can observe in the results, all of EclipseLink's tests present smaller variations compared to OpenJPA. The longest EclipseLink variation involved not using weaving. If you think about it, this is rather logical given that the purpose of weaving consists of altering Java byte code for the purpose of adding optimized JPA instructions that include lazy loading, change tracking, fetch groups and internal optimizations.
For the EclipseLink tests using weaving, both build-time and run-time weaving present better results. For build-time weaving, I used EclipseLink's library along with an Apache Ant task, where as for run-time weaving, I used the Spring framework's ReflectiveLoadTimeWeaver. I can only assume the slightly better performance of using run-time weaving over build-time weaving in EclipseLink was due to the fact of using a weaver integrated with the Spring framework, which in turn could result in better JPA optimizations designed for Spring applications. Nevertheless, considering the test result of forgoing weaving altogether, weaving does not appear to be a major performance impact when using EclipseLink, ceteris paribus.
By default, EclipseLink expects run-time weaving to be enabled, otherwise you will receive an error in the form 'Cannot apply class transformer without LoadTimeWeaver specified'. This means that for cases using build-time weaving or no weaving at all, you will need to explicitly indicate this behavior. In order to disable EclipseLink weaving you will need to either configure an application's EntityManagerFactory Spring bean with:
<property name="jpaPropertyMap"><map><entry key="eclipselink.weaving" value="false"/></map></property>
... or add the ....
<property name="eclipselink.weaving" value="false"/>
...property to an application's persistence.xml file. To indicate an application's entities are built using build-time weaving, substitute the previous property's "false" value with "static". To configure the default run-time weaver expected by EclipseLink, add the following:
<property name="loadTimeWeaver"><bean class="org.springframework.instrument.classloading.ReflectiveLoadTimeWeaver"/></property>
...property to an application's EntityManagerFactory Spring bean.
Given these EclipseLink results, the remaining EclipseLink tests will be based on run-time weaving provided by the Spring framework. For more on the topic of EclipseLink weaving, refer to the EclipseLink documentation at http://wiki.eclipse.org/Introduction_to_EclipseLink_Application_Development_(ELUG)#Using_Weaving, in addition to consulting the accompanying source code for this article.
Hibernate doesn't require neither enhancing JPA entities or weaving. For this reason, there is only one test result. This not only makes Hibernate simpler to setup, but judging by its only test result -- which clock's in at second place with respect to all other tests -- Hibernate's performance ranks high compared to its counterparts. However, in what I would consider Hibernate's equivalent to OpenJPA's enhancement process or EclipseLink's weaving, you will find a series of Hibernate properties. For example, Hibernate has properties such as
hibernate.default_batch_fetch_size designed to optimize lazy loading. As you might recall, among the purposes of both OpenJPA's enhancement process and EclipseLink's weaving are the optimization of lazy loading. So where as OpenJPA and EclipseLink require a separate and monolithic step -- at build-time or run-time -- to achieve JPA optimization techniques, Hibernate falls back to the use of granular properties specified in an application's persistence.xml file. Nevertheless, given that Hibernate's default behavior proved to be on par with the best query times, I didn't feel a need to further explore with these Hibernate properties.
To get another sense of the times and mapping procedures of each JPA implementation, I will make more selective queries based on a Player object's first name and last name. These are the results of performing a query for all Player objects whose first name is John and a query for all Player objects whose last name in Smith.
All player objects whose first name is John - 472 records | Time | Query |
EclipseLink | 1265 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameFirst = ?) |
Hibernate | 613 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameFirst=? |
OpenJPA | 1643 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameFirst = ?) [params=?] |
All player objects whose last name is Smith - 146 records | Time | Query |
EclipseLink | 986 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameLastt = ?) |
Hibernate | 537 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameLast=? |
OpenJPA | 1452 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameLast = ?) [params=?] |
These test results tell a slightly different story,with all three JPA implementations presenting substantial time differences amongst one another. At a lower record count, Hibernate's out-of-the-box configuration resulted in almost twice as fast queries as its closest competitor and almost three times faster queries than its other competitor.
To get an even broader sense of the times and mapping procedures of each JPA implementation, I will make a query on a single Player object based on its id. These are the results of performing such a query.
Single player object whose ID is 777- 1 record | Time | Query |
EclipseLink | 521 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (lahmanID = ?) |
Hibernate | 157 ms | select player0_.lahmanID as lahmanID0_0_, player0_.nameFirst as nameFirst0_0_, player0_.nameLast as nameLast0_0_ from Master player0_ where player0_.lahmanID=? |
OpenJPA | 1052 ms | SELECT t0.nameFirst, t0.nameLast FROM Master t0 WHERE t0.lahmanID = ? [params=?] |
With the exception of the faster query times -- due to it being a query for a single Player object -- the times between JPA implementations are practically in proportion to the queries used for extracting multiple Player objects by first and last name.
This will do it as far as test queries are concerned. However, a word of caution is in order when discussing these topics on optimization/enhancement/weaving. Even though the previous tests consisted of querying over 17,000 records and confirm clear advantages of using one provider and technique over another, they are still one dimensional, since they're based on read operations performed on a single object type and a single RDBMS table. JPA can perform a large array of operations that also include updating, writing and deleting RDBMS records, not to mention the execution of more elaborate queries that can span multiple objects and tables. In addition, RDBMS themselves can have influencing factors (e.g. indexes) over JPA query times. So all this said, it's not too far fetched to think the use of OpenJPA entity enhancement, EclipseLink weaving or Hibernate properties, could have varying degrees -- either beneficial or detrimental -- depending on the queries (i.e. multi-table, multi-object) and type of JPA operation (i.e. read, write, update, delete) involved.
Next, I will describe one of the most popular techniques used to boost performance in JPA applications.
Caches
A cache allows data to remain closer to an application's tier without constantly polling an RDBMS for the same data. I entitled the section in plural -- caches -- because there can be several caches involved in an application using JPA. This of course doesn't mean you have to configure or use all the caches provided by an application relying on JPA, but properly configuring caches can go a long way toward enhancing an application's JPA performance.
So lets start by analyzing what it's each JPA implementation offers in its out-of-the-box state in terms of caching. The following table illustrates tests done by simply invoking the previous JPA queries for a second and third consecutive time, without stopping the server. Note that the same process of deploying a single application at once was used, in addition to the server being re-started on each set of tests.
Query / Implementation | EclipseLink | Hibernate | OpenJPA |
All records (1st time) | 3215 ms | 3558 ms | 5998 ms |
All records (2nd time) | 507 ms | 272 ms | 521 ms |
All records (3rd time) | 439 ms | 218 ms | 263 ms |
First name (1st time) | 1265 ms | 613 ms | 1643 ms |
First name (2nd time) | 151 ms | 115 ms | 239 ms |
First name (3rd time) | 154 ms | 101 ms | 227 ms |
Last name (1st time) | 986 ms | 537 ms | 1452 ms |
Last name (2nd time) | 41 ms | 41 ms | 112 ms |
Last name (3rd time) | 65 ms | 38 ms | 117 ms |
By ID (1st time) | 521 ms | 157 ms | 1052 ms |
By ID (2nd time) | 1 ms | 6 ms | 3 ms |
By ID (3rd time) | 1 ms | 3 ms | 3 ms |
As you can observe, on both the second and third invocation all the queries show substantial improvements with respect to the first invocation. The primary cause for these improvements is unequivocally due to the use of a cache. But what type of cache exactly ? Could it be an RDBMS's own caching engine ? JPA ? Spring ? Or some other variation ?. In order to shed some light on cache usage, the following table illustrates the cache statistics generated on each of the previous JPA queries.
Query / Impleme)ntation | EclipseLink | Hibernate | OpenJPA |
All records (2nd time) | number of objects=17468, total time=506, local time=506, row fetch=65, object building=328, cache=112, sql execute=47, objects/second=34521, | sessions opened=2, | N/A |
All records (3rd time) | number of objects=17468, total time=435, local time=435, profiling time=1, row fetch=28, object building=323, cache=106, logging=1, sql execute=27, objects/second=40156, | sessions opened=3, | N/A |
First name (2nd time) | number of objects=472, total time=148, local time=148, row fetch=27, object building=106, cache=7, logging=1, sql execute=3, objects/second=3189, | sessions opened=2, | N/A |
First name (3rd time) | number of objects=472, total time=152, local time=152, row fetch=20, object building=121, cache=7, sql execute=3, objects/second=3105, | sessions opened=3, | N/A |
Last name (2nd time) | number of objects=146, total time=40, local time=40, row fetch=7, object building=27, cache=2, logging=1, sql execute=3, objects/second=3650, | sessions opened=2, | N/A |
Last name (3rd time) | number of objects=146, total time=63, local time=63, profiling time=1, row fetch=6, object building=19, cache=5, sql prepare=1, sql execute=23, objects/second=2317, | sessions opened=3, | N/A |
By ID (2nd time) | number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000, | sessions opened=2, | N/A |
By ID (3rd time) | number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000, | sessions opened=3, | N/A |
Notice the statistics generated by each JPA implementation are different. EclipseLink reports a single cache statistic, OpenJPA doesn't even report statistics unless a cache is enabled -- see previous section on metrics for details on this behavior -- and Hibernate reports two cache related statistics: second level cache and query cache.
At this juncture, if you look at the test results and statistics for the second and third invocation, something won't add up. How is it that OpenJPA's test results came out faster when caching is disabled by default ? An how about Hibernate returning 0's on its cache related statistics, even when its test results came out faster ? The reason for this performance increase is due to RDBMS caching. On the first query, the RDBMS needs to read data from its own file system (i.e. perform an I/O operation), on subsequent requests the data is present in RDBMS memory (i.e. its cache) making the entire JPA query much faster. A closer look at the Hibernate statistics field 'queries executed to the database' can confirm this. Notice that on every second query it shows 2 and on every third query it shows 3, meaning the data was read directly from the database. NOTE: The only exception to this occurs when a query is made on a single entity (i.e. by id), I will address this shortly.
Next, lets start breaking down the caches you will encounter when using JPA applications. The JPA 2.0 standard defines two types of caches: A first level cache and a second level cache. The first level cache or EntityManager cache is used to properly handle JPA transactions. A first level cache only exist for the duration of the EntityManager. With the exception of long lived operations performed against a RDBMS, JPA EntityManager's are short lived and are created & destroyed per request or per transaction. In this case, given the nature of the queries, first level caches are cleared on every query. A second level cache on the other hand is a broader cache that can be used across transactions and users. This makes a JPA second level cache more powerful, since it can avoid constantly polling an RDBMS for the same data.
But even though the JPA 2.0 standard now addresses second level cache features, this was not the case in JPA 1.0. In the 1.0 version of the JPA standard only a first level cache was addressed, leaving the door completely open on the topic of a second level cache. This created a fragmented approach to caching in JPA implementations, which even now as JPA 2.0 compliant implementations emerge, some non-standard features continue to be part of certain implementations given the value they provide to JPA caching in general. So as I move forward, bear in mind that just like previous JPA topics, each JPA implementation can have its own particular way of dealing with second level caching.
I will start with OpenJPA, which has the least amount of proprietary caching options. To enable OpenJPA caching (i.e. second level caching) you need to declare the following two properties in an application's persistence.xml file:
<property name="openjpa.DataCache" value="true(EnableStatistics=true)"/><property name="openjpa.RemoteCommitProvider" value="sjvm"/>
The first property ensures caching and statistics are activated, while the second property is used to indicate caching take place on a single JVM. The following results and statistics were obtained with OpenJPA's second level cache enabled.
Query with OpenJPA caching | Time | Statistics | Time without statistics |
All records (2nd time) | 420 ms | read count=34936, | 347 ms |
All records (3rd time) | 254 ms | read count=52404, | 230 ms |
First name (2nd time) | 125 ms | read count=944, | 127 ms |
First name (3rd time) | 114 ms | read count=1416, | 132 ms |
Last name (2nd time) | 63 ms | read count=292, | 53 ms |
Last name (3rd time) | 49 ms | read count=438, | 50 ms |
By ID (2nd time) | 5 ms | read count=2, | 1 ms |
By ID (3rd time) | 4 ms | read count=3, | 1 ms |
As these test results illustrate, executing subsequent JPA queries with OpenJPA's second level cache produce superior results. Another important behavior illustrated in some of these test cases is that by simply disabling statistics -- and still using the second level cache -- query times improve even more. The OpenJPA statistics also demonstrate how the cache is being used. Notice that on each subsequent query the statistics field 'hit count' is duplicated, which means data is being read from the cache (i.e. a hit). Also notice the statistics field 'write count' remains static, which means data is only written once from the RDBMS to the cache.
This is pretty basic functionality for a second level cache. On certain occasions a need may arise to interact directly with a cache. These interactions can range from prohibiting an entity from being cached, assigning a particular amount of memory to a cache, forcing an entity to always be cached, flushing all the data contained in a cache, or even plugging-in a third party caching solution to provide a more robust strategy, among other things.
The JPA 2.0 standard provides a very basic feature set in terms of second level caching through javax.persistence.Cache. Upon consulting this interface, you'll realize it only provides four methods charged with verifying the presence of entities and evicting them. This feature set not only proves to be limited, but also cumbersome since it can only be leveraged programmatically (i.e. through an API). In this sense, and as I've already mentioned, JPA implementations have provided a series of features ranging from persistence.xml properties to Java annotations related to second level caching.
OpenJPA offers several of these second level caching features, including a separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will point you directly to OpenJPA's cache documentation available at http://openjpa.apache.org/builds/apache-openjpa-2.1.0-SNAPSHOT/docs/manual/ref_guide_caching.html#ref_guide_cache_query so you can try these parameters for yourself on the accompanying application source code.
Hibernate just like OpenJPA has its second level cache disabled. To enable Hibernate's second level cache you need to add the following properties to an application's persistence.xml file:
<property name="hibernate.cache.use_second_level_cache" value="true"/><property name="hibernate.cache.provider_class" value="org.hibernate.cache.HashtableCacheProvider"/>
Its worth mentioning that Hibernate has integral support for other second level caches. The previous properties displayed how to enable the HashtableCacheProvider cache -- the simplest of the integral second level caches -- but Hibernate also provides support for five additional caches, which include: EHCache, OSCache, SwarmCache, JBoss cache 1 and JBoss cache 2, all of which provide distinct features, albeit require additional configuration. Besides these properties, Hibernate also requires that each JPA entity be declared with a caching strategy. In this case, since the Person entity is read only, a caching strategy like the following would be used:
<class name="com.webforefront.jpa.domain.Player"><cache usage="read-only"/></class>
Similar to OpenJPA, Hibernate also offers several second level caching features through proprietary annotations and configurations, as well as support for the separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will also point you directly to Hibernate's cache documentation available at http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache so you can try these parameters for yourself on the accompanying application source code.
Unlike OpenJPA and Hibernate, EclipseLink's second level cache is enabled by default, therefore there is no need to provide any additional configuration. However, similar to its counterparts, EclipseLink also has a series of proprietary second level cache features which can enhance JPA performance. You can find more information on these features by consulting EclipseLink's cache documentation available at: http://wiki.eclipse.org/Introduction_to_Cache_(ELUG)
With this we bring our discussion on object relational mapping performance with JPA to a close. I hope you found the various tests and metrics presented here a helpful aid in making decisions about your own JPA applications. In addition, don't forget you can rely on the accompanying source code to try out several JPA variations more ad-hoc to your circumstances.
About the author
Daniel Rubio is an independent technology consultant specializing in enterprise and web-based software. He blogs regularly on these and other software areas at http://www.webforefront.com. He's also authored and co-authored three books on Java technology.
Source code/Application installation
* Install MySQL on your workstation (Tested on MySQL 5.1.37-64 bits) - http://dev.mysql.com/downloads/
* Install data set on MySQL - Go to http://www.baseball-databank.org/ and click on the link titled 'Database in MySQL form'. This will download a zipped file with a series of MySQL data structures containing baseball statistics. First create a MySQL database to host the data using the command: 'mysqladmin -p<password> create jpaperformance'. This will create a database named 'jpaperformance'. Next, load the baseball statistics using the following command: 'mysql -p<password> -D jpaperformance < BDB-sql-2009-11-25.sql' where 'BDB-sql-2009-11.25.sql' represents the unzipped SQL script obtained by extracting the zip file you dowloaded.
* Create JPA application WARs - The download includes source code, library dependencies and an Ant build file. This includes all three JPA implementations Hibernate 3.5.3, EclipseLink 2.1 and OpenJPA 2.1.
- To build the JPA Hibernate WAR - ant hibernate
- To build the JPA EclipseLink WAR - ant eclipselink
- To build the JPA OpenJPA WAR - ant openjpa
- All builds are placed under the dist/<jpa_implementation> directories.
* Deploy to Tomcat 6.0.26 - Copy the MySQL Java driver and Spring Tomcat Weaver -- included in the download directory 'tomcat_jar_deps' -- to Apache Tomcat's /lib directory.
- Copy each JPA application WAR to Apache Tomcat's /webapps directory, as needed.
* Deployment URL's
- http://localhost:8080/hibernate/hibernate/home ( Query all Player objects )
- http://localhost:8080/eclipselink/eclipselink/home ( Query all Player objects )
- http://localhost:8080/openjpa/openjpa/home ( Query all Player objects )
- http://localhost:8080/hibernate/hibernate/firstname/<player_firstname> ( Query Player objects by first name)
- http://localhost:8080/eclipselink/eclipselink/firstname/<player_firstname> ( Query Player objects by first name)
- http://localhost:8080/openjpa/openjpa/firstname/<player_firstname> ( Query Player objects by first name)
- http://localhost:8080/hibernate/hibernate/lastname/<player_lastname> ( Query Player objects by last name)
- http://localhost:8080/eclipselink/eclipselink/lastname/<player_lastname> ( Query Player objects by last name)
- http://localhost:8080/openjpa/openjpa/lastname/<player_lastname> ( Query Player objects by last name)
- http://localhost:8080/hibernate/hibernate/playerid/<player_id> (Query Player by id)
- http://localhost:8080/eclipselink/eclipselink/playerid/<player_id> ( Query Player by id)
- http://localhost:8080/openjpa/openjpa/playerid/<player_id> ( Query Player by id)
Opinions expressed by DZone contributors are their own.
Comments