Detecting and Resolving Database Connection Leaks with Java Applications
Take a look at this tutorial that demonstrates how you can find and remove connection leaks between your application and database in Java apps.
Join the DZone community and get the full member experience.
Join For FreeWhile working on customer's production setup we got an opportunity to analyze his code and configuration. Problem customer faced was that during high traffic, application become unresponsive and it doesn't accept any further request. Only workaround was to restart the application. The application was deployed on Red Hat Fuse 6.3.0.
DB Team analyzed that there were too many connections created on Database which was impacting Database too. After going through code and configuration we found that the application code was not implemented with any database connection pool.
The first step was to configure database connection pooling using dbcp driver. Due to compatibility issues we used legacy Apache DBCP 1.4 with commons-pool 1 library. Initially configuration were set like following. These configurations are pretty standard and are more than sufficient for most of the production environment. All these configuration details are mentioned commons-dbcp doc.
xxxxxxxxxx
<bean id="XXX" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
<property name="driverClassName" value="oracle.jdbc.OracleDriver"/>
<property name="url" value="${oracle.url}"/>
<property name="username" value="${oracle.user}"/>
<property name="password" value="${oracle.password}"/>
<property name="initialSize" value="10"/>
<property name="minIdle" value="10"/>
<property name="maxIdle" value="30"/>
<property name="maxActive" value="50"/>
<property name="testOnBorrow" value="true"/>
<property name="testOnReturn" value="true"/>
<property name="testWhileIdle" value="true"/>
<property name="validationQuery" value="SELECT 1 FROM DUAL"/>
<property name="connectionProperties" value="oracle.jdbc.ReadTimeout=10000"/>
<property name="timeBetweenEvictionRunsMillis" value="-1"/>
</bean>
After configuring the connection pooling too, the application got stuck and it was not responsive to further incoming requests. However, connections to the database as observed by the database team were reduced. One strange behavior was if we analyze the number of connections to the database from the application, the count was reduced to 0. While in connection pool minIdle and InitialSize was set to 10. Hence it was expected that connection should remain 10 at any point but at the time when application got stuck we observed 0 connection to the database from the application.
xxxxxxxxxx
netstat -anp|grep [application_pid]|grep [database_port]
Thus, there were connection leaks somewhere in his application. A connection leak means some of the database request/transaction are not getting closed properly or are not getting committed and finally those connections are getting abondoned and closed permanently. But there were many applications (more than 50) interacting with the database and it wasn't very clear where could be the connection leak. Also going through all the applications and reviewing code and configuration was going to be difficult and would have taken at least a couple of weeks as it would have required thorough testing too.
- To narrow down this problem and to find the possible cause of connection leak, we can set the following dbcp configurations too.
x
<property name="logAbandoned" value="true"/>
<property name="removeAbandoned" value="true"/>
<property name="removeAbandonedTimeout" value="30"/>
<property name="timeBetweenEvictionRunsMillis" value="30000" />
Here removeAbandoned when set to true to try to remove abandoned connections and return them to pool again in configured removeAbandonedTimeout in seconds. Setting this to true can recover database connections from poorly written applications that fail to close a connection. The logAbandoned property is also very important as it can log the complete stack-trace which might be leaking the connection, thus can be very useful to identify connection leak in application. Stack-Trace is logged in terminal itself. In Red Hat Fuse we can see these stack-traces logged by logAbandoned in karaf terminal and not in application log or fuse.log file. All these properties are mentioned commons-dbcp doc.
The timeBetweenEvictionRunsMillis property can also be helpful, it is set in milliseconds. When set than a separate thread will run to remove idle object evictor thread in every configured millisecond. Its default value is -1 which means this idle object evictor thread wouldn't be active and running and only when set to a positive integer then it would be effective. Sometimes firewall in between or even database close idle connections permanently, having this timeBetweenEvictionRunsMillis will help those connections to return to pool.
The above configurations removeAbandoned and timeBetweenEvictionRunsMillis are just a workaround to avoid the application getting unresponsive, these might delay application getting unresponsive.
To further narrow down the problem, at the time of issue we captured thread-dump for Red Hat Fuse where that application was installed with jstack utility. This jstack utility is available with JDK installation only not with JRE based installation. So if you don't have the jstack utility available then try to download JDK with the same version of your JRE based java installation. Having the same version is important otherwise it may lead to major/minor version error due to compatibility conflicts between jstack utility of JDK and Application JVM which is running in JRE. This jstack utility is found within bin folder of JDK installation.
xxxxxxxxxx
jstack -l [application_pid] > threaddump.txt
Note that we captured 3 to 4 samples of thread-dump in interval of 20 seconds, having different sample in 10 or 20 second interval help in identifying long running thread. When we open this threaddump.txt in any of our favourite IDE, we found following stack-trace.
xxxxxxxxxx
"qtp308220548-1262" #1262 prio=5 os_prio=0 tid=0x00007fc7bc192800 nid=0x70aa in Object.wait() [0x00007fc6f24d9000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1118)
- locked <0x00000007382b9f30> (a org.apache.commons.pool.impl.GenericObjectPool$Latch)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at Proxy4fb1bf6f_84b1_4ec9_94ae_0661ba1b9af6.getConnection(Unknown Source)
at org.apache.camel.component.jdbc.JdbcProducer.processingSqlBySettingAutoCommit(JdbcProducer.java:80)
at org.apache.camel.component.jdbc.JdbcProducer.process(JdbcProducer.java:67)
at org.apache.camel.util.AsyncProcessorConverterHelper$ProcessorToAsyncProcessorBridge.process(AsyncProcessorConverterHelper.java:61)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:196)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:121)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:196)
at org.apache.camel.component.cxf.jaxrs.CxfRsInvoker.asyncInvoke(CxfRsInvoker.java:93)
- locked <0x000000074cf27b40> (a org.apache.cxf.transport.http.Servlet3ContinuationProvider$Servlet3Continuation)
at org.apache.camel.component.cxf.jaxrs.CxfRsInvoker.performInvocation(CxfRsInvoker.java:68)
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:189)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
Here from the org.apache.commons.pool.impl.GenericObjectPool.borrowObject operation we find that thread is waiting for validating a connection from connection pool.
Further, if we go through stacktrace we find that this thread originates from camel-jdbc producer and further from CXF JAXRS component. This could be a problematic application which we observed in thread-dump.
Above all, the configurations helped us to identify problems and resolve them.
Finally, using try-with-resources for database connections can also be helpful to close resources automatically but dbcp 1.4 version is more compatible with Java6. Automatic resource Management(try-with-resources) can work with dbcp 1.4 but it would need more testing to ensure nothing else break thus we avoided it for dbcp 1.4 which is a legacy library.
Having the log statement everywhere while initiating and just after closing connection in finally clause will certainly help as well because with load test we can compare number of initiating logs and closing logs, any difference between them means connection leak and the connection is not closed.
Opinions expressed by DZone contributors are their own.
Comments