Running Batch DB Updates in Java Persistence Technologies
it is a good idea to use the batch update facilities built on top of JDBC in order to submit multiple SQL commands as a single request to the underlying database.
Join the DZone community and get the full member experience.
Join For FreeRelational data access often needs to insert or update multiple rows in a database as part of a single operation. In such scenarios, it is a good idea to use the batch update facilities built on top of JDBC in order to submit multiple SQL commands as a single request to the underlying database. This reduces the number of roundtrips to the database, hence improving the result time of the operation.
JDBC Batched Updates
The Statement
interface and its subinterfaces, PreparedStatement
and CallableStatement
support executing multiple SQL statements as a batch, by maintaining a collection of these statements that the application can add to using the method Statement.addBatch(sql)
. When the batch of statements is ready to be executed, the method Statement.executeBatch()
can be called to execute them in one unit. To clear the current batch, the application can call the method Statement.clearBatch()
. Only statements that return an update count are eligible for batch execution; select
statements will throw a BatchUpdateException
.
Example
The following code uses a Statement
's batch to add a student to a course:
try(Connection connection = dataSource.getConnection()) {
connection.setAutoCommit(false);
try(Statement statement = connection.createStatement()) {
statement.addBatch("insert into student values (14, 'John Doe')");
statement.addBatch("insert into course values (3, 'Biology')");
statement.addBatch("insert into student_courses values (14, 3)");
int[] updateCounts = statement.executeBatch();
connection.commit();
} catch(BatchUpdateException ex) {
connection.rollback();
... // do something with exception
}
}
Another example using a PreparedStatement
. Given a customer
table, we want to import a list of customers. Notice that the method addBatch
does not take an SQL string here, instead it adds the specified parameters to the prepared statement's batch of commands.
connection.setAutoCommit(false);
try(PreparedStatement statement = connection.prepareStatement("insert into customer values (?, ?)")) {
int n = 0;
for(Customer customer : customers) {
statement.setInt(1, ++n);
statement.setString(2, customer.getName());
statement.addBatch();
}
int[] updateCounts = statement.executeBatch();
connection.commit();
} catch(BatchUpdateException ex) {
connection.rollback();
... // do something with exception
}
Switching Off Auto-Commit
One important thing to notice in the above code snippets is the call to connection.setAutoCommit(false)
, which allows the application to control when to commit the transaction. In the previous code, we only commit the transaction when all statements are executed successfully. In case of a BatchUpdateException
thrown because of a failed statement, we roll back the transaction so that no effect happens on the database. We could have decided to examine the BatchUpdateException
(as we'll see shortly) to see which statement(s) failed and still decide to commit the statements that were processed successfully.
Disabling auto-commit mode should always be done when executing a batch of updates. Otherwise, the result of the updates depends on the behavior of the JDBC driver: it may or may not commit the successful statements.
Update Counts and BatchUpdateException
The method Statement.executeBatch()
returns an array of integers where each value is the number of affected rows by the corresponding statement. The order of values matches the order in which statements are added to the batch. Specifically, each element of the array is:
- an integer >= 0, reflecting the affected row count by the update statement,
- or the constant
Statement.SUCCESS_NO_INFO
, indicating that the statement was successful but the affected row count is unknown.
In case one of the statements failed, or was not a valid update statement, the method executeBatch()
throws a BatchUpdateException
. The exception can be examined by calling BatchUpdateException.getUpdateCounts()
, which returns an array of integers. There are two possible scenarios:
- If the JDBC driver allows continuing the processing of remaining statements upon a failed one, then the result of
BatchUpdateException.getUpdateCounts()
is an array containing as many integers as there were statements in the batch, where the integers correspond to the affected row count for successful statements, except for the failed ones where the corresponding array element will be the constantStatement.EXECUTE_FAILED
. - If the JDBC driver does not continue upon a failed statement, then the result of
BatchUpdateException.getUpdateCounts()
is an array containing the affected row count for all successful statements until the first failed one.
Batch Updates Using Spring's JdbcTemplate
Spring offers a convenient class as part of its support for JDBC. It reduces the amount of boilerplate code required when using plain JDBC such as processing result sets and closing resources. It also makes batch updates easier, as shown in the following example:
List<Customer> customers = ...;
jdbcTemplate.batchUpdate("insert into customer values (?, ?)",
new BatchPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
ps.setLong(1, customers.get(i).getId());
ps.setString(2, customers.get(i).getName());
}
@Override
public int getBatchSize() {
return customers.size();
}
});
Batch Updates Using Hibernate
Hibernate can also make use of JDBC's batching facility when generating the statements corresponding to its persistence operations. The main configuration property is hibernate.jdbc.batch_size
which specifies the maximum batch size. This setting can be overriden for a specific session using the method Session.setJdbcBatchSize()
. Hibernate will use the value specified in the method on the current session, and if not set it uses the value in the global session factory-level setting hibernate.jdbc.batch_size
.
The earlier example that stores a list of customers would use the persistence methods in the Session
instance:
Transaction transaction = null;
try (Session session = sessionFactory.openSession()) {
transaction = session.getTransaction();
transaction.begin();
for (Customer customer : customers) {
session.persist(customer);
}
transaction.commit();
} catch (RuntimeException ex) {
if (transaction != null) {
transaction.rollback();
}
throw ex;
}
When the transaction.commit()
is invoked, Hibernate will send the SQL statements that insert the customer rows. If batching is enabled as described earlier (either via hibernate.jdbc.batch_size
or by calling Session.setJdbcBatchSize(batchSize)
), then all the generated statements will be sent as a single request. Otherwise, each statement is sent as a single request.
When employing batched updates in Hibernate for a large number of entity objects, it is a good practice to flush the session and clear its cache periodically as opposed to flushing the session at the end of the transaction. This reduces memory usage by the session cache because it holds entities that are in persistent state:
Transaction transaction = null;
try (Session session = sessionFactory.openSession()) {
transaction = session.getTransaction();
transaction.begin();
int n = 0;
for (Customer customer : customers) {
if (++n % batchSize == 0) {
// Flush and clear the cache every batch
session.flush();
session.clear();
}
session.persist(customer);
}
transaction.commit();
} catch (RuntimeException ex) {
if (transaction != null) {
transaction.rollback();
}
throw ex;
}
One important thing to know is that batch insert (not update or delete) doesn't work with entities using identity columns (i.e. whose generation strategy is GenerationType.IDENTITY
, because Hibernate needs to generate the identifier when persisting the entity and in this case the value can only be generated by sending the insert statement.
It should be noted that the above applies equally if the application uses an EntityManager
instead of directly using a Session
.
Batch Updates Using jOOQ
jOOQ also supports batch updates easily. Here's an example that follows the earlier examples:
DSLContext create = ...;
BatchBindStep batch = create.batch(create.insertInto(CUSTOMER, ID, NAME)
.values((Integer) null, null));
int n = 0;
for (Customer customer : customers) {
batch.bind(++n, customer.getName());
}
int[] updateCounts = batch.execute();
Summary
All major Java persistence technologies support batch mode updates to relational databases leveraging the JDBC API. Such a mode can improve performance for applications involving heavy workloads by reducing the number of network roundtrips to the database server.
Published at DZone with permission of Mahmoud Anouti, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments