Practical PHP Patterns: Unit of Work
The Unit of Work pattern is one of the most complex moving parts of Object-Relational Mappers, and usually of Data Mappers in general. A Unit of Work is a component (for us, an object with collaborators) that keeps track of the new, modified and deleted domain objects whose changes have to be reflected in the data store. At the end of a transaction, a Unit of Work that is used correctly can produce the list of changes to perform on the data store, solve concurrency and consistency problems, and avoid redundant queries in the relational case, or chatty communication with the store in the schemaless one.
As I've already said, the Unit of Work pattern is usually not employed alone but as part of a Data Mapper, which provides a different interface to client code and combines this pattern with several others.
The minimum transaction that a PHP Unit of Work performs usually spans a single HTTP request, or a session made up of several requests if the domain objects can be saved in an intermediate store (such as $_SESSION or a cache of any kind). Serializing objects into such a store and reattaching them to the Unit of Work during subsequent requests is not a trivial problem.
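A minimal sketch of why reattaching is hard, assuming a hypothetical `Article` class and a hypothetical identity map with a `merge()` helper (neither is Doctrine's API). After a round trip through `serialize()`/`unserialize()` the object is a new instance, so a Unit of Work that tracks objects by instance cannot recognize it; it has to match the copy by identifier and transfer its state onto the managed instance. Both "requests" are simulated in one script.

```php
<?php
// Hypothetical domain object with a persistent identifier.
class Article
{
    public $id;
    public $title;

    public function __construct(int $id, string $title)
    {
        $this->id = $id;
        $this->title = $title;
    }
}

// Hypothetical identity map (id => managed instance), standing in for a Unit of Work.
class ArticleIdentityMap
{
    private $byId = [];

    public function add(Article $article): void
    {
        $this->byId[$article->id] = $article;
    }

    // True only if this very instance is the one being managed.
    public function contains(Article $article): bool
    {
        return isset($this->byId[$article->id]) && $this->byId[$article->id] === $article;
    }

    // Reattach a detached copy by identifier: copy its state onto the managed
    // instance, or start managing the copy if no instance with that id is known.
    public function merge(Article $detached): Article
    {
        if (!isset($this->byId[$detached->id])) {
            $this->byId[$detached->id] = $detached;
            return $detached;
        }
        $managed = $this->byId[$detached->id];
        $managed->title = $detached->title;
        return $managed;
    }
}

// Request 1: the object ends up in a session-like string.
$stored = serialize(new Article(42, 'Draft'));

// Request 2: the map holds an instance loaded from the database (simulated)...
$map = new ArticleIdentityMap();
$loaded = new Article(42, 'Draft');
$map->add($loaded);

// ...while unserialize() produces a different instance that the map does not track.
$copy = unserialize($stored);
$copy->title = 'Final';
$recognized = $map->contains($copy);
$managed = $map->merge($copy);
```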
Advantages
The power of a Unit of Work lies in the fact that the actual database transaction is only started (and kept open) when the commit() method of the Unit of Work is called; until that moment, ideally, the database connection is not used at all. This paradigm is called batch update.
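A minimal sketch of a batch update, assuming a hypothetical `RecordingConnection` that only logs calls in place of a real database connection. Statements are registered as plain SQL strings purely for illustration; a real Unit of Work derives them from the tracked objects at commit time. The point is that the connection is not touched before commit(), and then everything runs inside one transaction.

```php
<?php
// Hypothetical stand-in for a database connection: it only records calls.
class RecordingConnection
{
    public $log = [];

    public function beginTransaction(): void { $this->log[] = 'BEGIN'; }
    public function execute(string $sql): void { $this->log[] = $sql; }
    public function commit(): void { $this->log[] = 'COMMIT'; }
    public function rollBack(): void { $this->log[] = 'ROLLBACK'; }
}

// Batch update: statements are queued in memory and the connection is used
// only inside commit(), within a single transaction.
class BatchUnitOfWork
{
    private $conn;
    private $pending = [];

    public function __construct(RecordingConnection $conn)
    {
        $this->conn = $conn;
    }

    public function register(string $sql): void
    {
        $this->pending[] = $sql; // no database access here
    }

    public function commit(): void
    {
        if (!$this->pending) {
            return; // nothing to do: no transaction is opened
        }
        $this->conn->beginTransaction();
        try {
            foreach ($this->pending as $sql) {
                $this->conn->execute($sql);
            }
            $this->conn->commit();
        } catch (Exception $e) {
            $this->conn->rollBack();
            throw $e;
        }
        $this->pending = [];
    }
}

$conn = new RecordingConnection();
$uow = new BatchUnitOfWork($conn);
$uow->register("INSERT INTO users (name) VALUES ('alice')");
$uow->register("UPDATE users SET name = 'bob' WHERE id = 7");
$logBeforeCommit = $conn->log; // still empty: nothing sent yet
$uow->commit();
```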
Objects stored in a Unit of Work usually have an associated state, such as:
- new (which corresponds to INSERT queries during the batch update)
- clean (no SQL queries have to be issued, since the object has been retrieved and not modified)
- dirty (UPDATE queries)
- removed (DELETE queries)
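The mapping from states to queries can be sketched as follows. The constants and the `queryFor()` function are hypothetical and simply mirror the list above; the Doctrine code further down uses a different set of states (MANAGED, NEW, DETACHED and REMOVED).

```php
<?php
// Hypothetical states mirroring the list above.
const STATE_NEW = 'new';
const STATE_CLEAN = 'clean';
const STATE_DIRTY = 'dirty';
const STATE_REMOVED = 'removed';

// Map a tracked object's state to the SQL verb issued during the batch
// update, or null when no query is needed.
function queryFor(string $state): ?string
{
    switch ($state) {
        case STATE_NEW:
            return 'INSERT';
        case STATE_CLEAN:
            return null;
        case STATE_DIRTY:
            return 'UPDATE';
        case STATE_REMOVED:
            return 'DELETE';
        default:
            throw new InvalidArgumentException("Unknown state: $state");
    }
}

// Plan a batch update for a few tracked objects (keys are made-up labels).
$tracked = [
    'user#1' => STATE_CLEAN,
    'user#2' => STATE_DIRTY,
    'user#3' => STATE_NEW,
    'user#4' => STATE_REMOVED,
];
$plan = [];
foreach ($tracked as $key => $state) {
    $verb = queryFor($state);
    if ($verb !== null) {
        $plan[] = "$verb $key";
    }
}
// $plan: ['UPDATE user#2', 'INSERT user#3', 'DELETE user#4']
```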
There are different strategies for detecting changes to the object graph. The simplest one is comparing objects with a clean copy kept in memory (comparing them with the database instead is usually too expensive).
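A minimal sketch of snapshot-based change detection, assuming a hypothetical `SnapshotTracker` and an entity with public properties (a real mapper would read private fields, for example through reflection). Fields are compared with `!==`, and the changeset uses the same [old value, new value] pairs that the Doctrine docblock for computeChangeSet() describes further down.

```php
<?php
// Hypothetical domain object: it knows nothing about persistence.
class Customer
{
    public $name;
    public $email;

    public function __construct(string $name, string $email)
    {
        $this->name = $name;
        $this->email = $email;
    }
}

// Snapshot-based change detection: copy each object's fields when it starts
// being tracked, then diff the current fields against that copy.
class SnapshotTracker
{
    private $snapshots = [];

    public function track(object $entity): void
    {
        $this->snapshots[spl_object_id($entity)] = get_object_vars($entity);
    }

    // Returns [field => [old value, new value]] for every field that changed.
    public function changeSet(object $entity): array
    {
        $original = $this->snapshots[spl_object_id($entity)];
        $changes = [];
        foreach (get_object_vars($entity) as $field => $value) {
            $old = $original[$field] ?? null;
            if ($old !== $value) {
                $changes[$field] = [$old, $value];
            }
        }
        return $changes;
    }
}

$tracker = new SnapshotTracker();
$customer = new Customer('Ada', 'ada@example.com');
$tracker->track($customer);
$customer->email = 'ada@example.org';
$diff = $tracker->changeSet($customer);
// $diff === ['email' => ['ada@example.com', 'ada@example.org']]
```

The domain class stays untouched: like a source control diff, all the comparison logic lives outside the objects being compared.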
A more complex solution is having the objects implement a specific interface, so that they can manage their own state and declare that they are dirty or have to be removed. This implementation choice introduces a dependency from the domain layer to the infrastructure one, so I prefer heavier approaches like the former, which is equivalent to generating a diff with your source control system of choice, but on the object graph instead of a codebase: the source files are not responsible for diffing themselves.
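For comparison, the notification approach might look like the sketch below. `ChangeListener`, `Product` and `NotifiedUnitOfWork` are hypothetical names (in the Doctrine code further down, the UnitOfWork plays the listener role by implementing PropertyChangedListener). Note how `Product` now depends on an interface that belongs to the persistence layer, which is exactly the coupling discussed above.

```php
<?php
// Hypothetical listener interface owned by the persistence layer.
interface ChangeListener
{
    public function propertyChanged(object $sender, string $property, $oldValue, $newValue): void;
}

// Hypothetical domain object that reports its own changes; note its
// dependency on ChangeListener, i.e. on infrastructure code.
class Product
{
    private $price;
    private $listeners = [];

    public function __construct(int $price)
    {
        $this->price = $price;
    }

    public function addListener(ChangeListener $listener): void
    {
        $this->listeners[] = $listener;
    }

    public function setPrice(int $price): void
    {
        if ($price === $this->price) {
            return; // no change, no notification
        }
        foreach ($this->listeners as $listener) {
            $listener->propertyChanged($this, 'price', $this->price, $price);
        }
        $this->price = $price;
    }
}

// A Unit of Work that records what it is told instead of diffing snapshots.
class NotifiedUnitOfWork implements ChangeListener
{
    public $changes = []; // object id => [property => [original, latest]]

    public function propertyChanged(object $sender, string $property, $oldValue, $newValue): void
    {
        $id = spl_object_id($sender);
        if (isset($this->changes[$id][$property])) {
            $oldValue = $this->changes[$id][$property][0]; // keep the original value
        }
        $this->changes[$id][$property] = [$oldValue, $newValue];
    }
}

$uow = new NotifiedUnitOfWork();
$product = new Product(100);
$product->addListener($uow);
$product->setPrice(120);
$product->setPrice(130);
// $uow->changes[spl_object_id($product)]['price'] === [100, 130]
```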
Furthermore, decoupling the Unit of Work from the database state adds a higher level of management, which lets us roll back changes if some constraints are not satisfied or the computation has produced an error. In PHP, the client code can simply throw the object graph away, and the partial changeset of the Unit of Work is forgotten by the next request.
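A minimal sketch of rolling back by discarding, assuming a hypothetical in-memory array as the data store and a hypothetical `PendingChanges` class. Changes stay pending until commit(), so when a constraint check fails the client code just throws, and the abandoned changeset never reaches the store.

```php
<?php
// Hypothetical in-memory "database": account id => balance.
$store = ['A' => 100, 'B' => 50];

// Minimal Unit of Work over that store: changes stay pending until commit().
class PendingChanges
{
    private $pending = [];

    public function set(string $id, int $balance): void
    {
        $this->pending[$id] = $balance;
    }

    public function commit(array &$store): void
    {
        foreach ($this->pending as $id => $balance) {
            $store[$id] = $balance;
        }
        $this->pending = [];
    }
}

function transfer(array &$store, string $from, string $to, int $amount): void
{
    $uow = new PendingChanges();
    $newFrom = $store[$from] - $amount;
    $uow->set($from, $newFrom);
    $uow->set($to, $store[$to] + $amount);
    if ($newFrom < 0) {
        // Constraint violated: abandon $uow; nothing has been written yet.
        throw new DomainException('Insufficient funds');
    }
    $uow->commit($store);
}

transfer($store, 'A', 'B', 30); // commits: A = 70, B = 80
try {
    transfer($store, 'B', 'A', 500); // violates the constraint
} catch (DomainException $e) {
    // The failed transfer's pending changes were simply discarded.
}
```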
Issues
While decoupling the object graph from the data store to perform custom computations is convenient for the client code, it can also introduce stale data. The longer objects are kept in the Unit of Work, the more the data store is exposed to external concurrent modifications that are inconsistent with the in-memory graph (for example, updates that set fields to different values than the ones modified in this very session). Either an optimistic or a pessimistic locking mechanism has to be introduced when the scope of the object graph is longer than the few seconds needed to produce an HTTP response, and, when traffic is high, even for shorter scopes.
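A minimal sketch of optimistic locking with a version number, assuming a hypothetical in-memory `VersionedTable` that emulates a conditional UPDATE and a hypothetical `StaleRowException`. Each session remembers the version it loaded; the update only matches if that version is still current, and zero affected rows signals a concurrent modification.

```php
<?php
// Hypothetical in-memory table: id => ['title' => string, 'version' => int].
class VersionedTable
{
    public $rows = [];

    // Emulates "UPDATE ... SET title = ?, version = version + 1
    // WHERE id = ? AND version = ?" and returns the number of affected rows.
    public function updateIfVersion(int $id, string $title, int $expectedVersion): int
    {
        if (!isset($this->rows[$id]) || $this->rows[$id]['version'] !== $expectedVersion) {
            return 0;
        }
        $this->rows[$id] = ['title' => $title, 'version' => $expectedVersion + 1];
        return 1;
    }
}

class StaleRowException extends RuntimeException
{
}

// Flush one dirty object: zero affected rows means someone else changed the
// row after this session loaded it.
function flushTitle(VersionedTable $table, int $id, string $title, int $loadedVersion): void
{
    if ($table->updateIfVersion($id, $title, $loadedVersion) === 0) {
        throw new StaleRowException("Row $id was modified concurrently");
    }
}

$table = new VersionedTable();
$table->rows[1] = ['title' => 'Draft', 'version' => 1];

// Two sessions both loaded version 1: the first flush wins, the second fails.
flushTitle($table, 1, 'Edited by Alice', 1);
try {
    flushTitle($table, 1, 'Edited by Bob', 1);
    $bobRejected = false;
} catch (StaleRowException $e) {
    $bobRejected = true;
}
```

A pessimistic mechanism would instead lock the row when it is read (for example with SELECT ... FOR UPDATE, supported by MySQL and PostgreSQL) and keep it locked until the transaction ends.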
Injecting the Unit of Work into the domain objects so that they can track their own state can be problematic and too invasive for the domain layer. Usually the problem is solved the other way around: when objects are passed to the Object-Relational Mapper (almost always implemented as a Data Mapper rather than as an Active Record), it delegates part of its logic to the Unit of Work, which is a first-class citizen and can be tested independently from the other components of the library.
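A minimal sketch of that division of labor, with hypothetical `SimpleUnitOfWork` and `Mapper` classes. The client only talks to the mapper (roughly the role the Entity Manager plays in Doctrine 2), the mapper delegates change tracking to the Unit of Work, and the Unit of Work does no I/O of its own, so it can be checked in isolation. The domain object is a plain `stdClass` that knows about neither.

```php
<?php
// Hypothetical Unit of Work: pure bookkeeping, no database access.
class SimpleUnitOfWork
{
    private $inserts = [];

    public function persist(object $entity): void
    {
        $this->inserts[spl_object_id($entity)] = $entity;
    }

    public function isScheduledForInsert(object $entity): bool
    {
        return isset($this->inserts[spl_object_id($entity)]);
    }

    // Hand over the scheduled insertions and forget them.
    public function takeInsertions(): array
    {
        $out = array_values($this->inserts);
        $this->inserts = [];
        return $out;
    }
}

// Hypothetical mapper facade: the client calls it, it delegates tracking.
class Mapper
{
    private $uow;
    public $written = []; // stands in for rows sent to the database

    public function __construct(SimpleUnitOfWork $uow)
    {
        $this->uow = $uow;
    }

    public function persist(object $entity): void
    {
        $this->uow->persist($entity);
    }

    public function flush(): void
    {
        foreach ($this->uow->takeInsertions() as $entity) {
            $this->written[] = $entity; // would be an INSERT in a real mapper
        }
    }
}

$uow = new SimpleUnitOfWork();
$mapper = new Mapper($uow);
$entity = new stdClass();
$mapper->persist($entity);
$scheduled = $uow->isScheduledForInsert($entity);
$mapper->flush();
```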
The alternative to the inherent complexity of the Unit of Work pattern is saving an object at the moment it is updated. This solution is problematic because either the client code has to call save() methods explicitly, or queries (read: modifications of the data store, in the case of a non-relational model) have to be performed at the very time of each atomic change, for instance by issuing a separate UPDATE statement every time a field is modified.
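A sketch of the difference in query count, assuming a hypothetical `QueryLog` in place of a connection. The eager object issues one UPDATE per setter call, while the Unit of Work style collects the changed fields and issues a single UPDATE at commit time. Values are interpolated into SQL only for readability; real code should use bound parameters.

```php
<?php
// Hypothetical query log standing in for a database connection.
class QueryLog
{
    public $queries = [];

    public function run(string $sql): void
    {
        $this->queries[] = $sql;
    }
}

// "Save on every change": each setter issues its own UPDATE immediately.
class EagerUser
{
    private $id;
    private $db;

    public function __construct(int $id, QueryLog $db)
    {
        $this->id = $id;
        $this->db = $db;
    }

    public function setName(string $name): void
    {
        $this->db->run("UPDATE users SET name = '$name' WHERE id = {$this->id}");
    }

    public function setEmail(string $email): void
    {
        $this->db->run("UPDATE users SET email = '$email' WHERE id = {$this->id}");
    }
}

// Unit of Work style: collect the field changes, emit one UPDATE at commit.
function batchedUpdate(QueryLog $db, int $id, array $changes): void
{
    $sets = [];
    foreach ($changes as $field => $value) {
        $sets[] = "$field = '$value'";
    }
    $db->run('UPDATE users SET ' . implode(', ', $sets) . " WHERE id = $id");
}

$eager = new QueryLog();
$user = new EagerUser(7, $eager);
$user->setName('Ada');
$user->setEmail('ada@example.com');

$batched = new QueryLog();
batchedUpdate($batched, 7, ['name' => 'Ada', 'email' => 'ada@example.com']);
// count($eager->queries) === 2, count($batched->queries) === 1
```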
Example
The sample code for this article is taken from the internal API of Doctrine 2, with some method bodies elided. The actual Unit of Work code depends on the strategy adopted to detect changes to domain objects, but the interface exposed to the Entity Manager is always the same and should give an overview of a Unit of Work's responsibilities and features.
In this implementation, persist() introduces new objects to the Unit of Work, remove() schedules objects for deletion from the database, and commit() executes a batch update on demand.
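A minimal sketch of this contract, with a hypothetical `TinyUnitOfWork` whose commit() returns the planned operations instead of executing them. Its scheduling rules loosely follow the Doctrine code below: removing an entity that was only persisted cancels the insertion, and persisting a removed entity cancels the deletion.

```php
<?php
// Hypothetical Unit of Work implementing persist()/remove()/commit().
class TinyUnitOfWork
{
    private $insertions = [];
    private $deletions = [];
    private $managed = []; // entities known to exist in the database

    // Register an entity as if it had just been loaded from the database.
    public function registerClean(object $entity): void
    {
        $this->managed[spl_object_id($entity)] = $entity;
    }

    public function persist(object $entity): void
    {
        $oid = spl_object_id($entity);
        if (isset($this->deletions[$oid])) {
            unset($this->deletions[$oid]); // becomes managed again
        } elseif (!isset($this->managed[$oid])) {
            $this->insertions[$oid] = $entity;
        }
    }

    public function remove(object $entity): void
    {
        $oid = spl_object_id($entity);
        if (isset($this->insertions[$oid])) {
            unset($this->insertions[$oid]); // never written, nothing to delete
            return;
        }
        if (isset($this->managed[$oid])) {
            $this->deletions[$oid] = $entity;
        }
    }

    // Returns the operations a real implementation would execute, in order.
    public function commit(): array
    {
        $ops = [];
        foreach ($this->insertions as $oid => $entity) {
            $ops[] = ['INSERT', $entity];
            $this->managed[$oid] = $entity;
        }
        foreach ($this->deletions as $oid => $entity) {
            $ops[] = ['DELETE', $entity];
            unset($this->managed[$oid]);
        }
        $this->insertions = $this->deletions = [];
        return $ops;
    }
}

$uow = new TinyUnitOfWork();
$existing = new stdClass();
$uow->registerClean($existing);
$draft = new stdClass();
$fresh = new stdClass();
$uow->persist($draft);
$uow->persist($fresh);
$uow->remove($draft);    // cancels the pending insert
$uow->remove($existing); // schedules a DELETE
$ops = $uow->commit();
// $ops: [['INSERT', $fresh], ['DELETE', $existing]]
```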
<?php

namespace Doctrine\ORM;

use Exception,
    Doctrine\Common\Collections\ArrayCollection,
    Doctrine\Common\Collections\Collection,
    Doctrine\Common\NotifyPropertyChanged,
    Doctrine\Common\PropertyChangedListener,
    Doctrine\ORM\Event\LifecycleEventArgs,
    Doctrine\ORM\Proxy\Proxy;

/**
 * The UnitOfWork is responsible for tracking changes to objects during an
 * "object-level" transaction and for writing out changes to the database
 * in the correct order.
 *
 * @since 2.0
 * @author Benjamin Eberlei <kontakt@beberlei.de>
 * @author Guilherme Blanco <guilhermeblanco@hotmail.com>
 * @author Jonathan Wage <jonwage@gmail.com>
 * @author Roman Borschel <roman@code-factory.org>
 * @internal This class contains highly performance-sensitive code.
 */
class UnitOfWork implements PropertyChangedListener
{
    /**
     * An entity is in MANAGED state when its persistence is managed by an EntityManager.
     */
    const STATE_MANAGED = 1;

    /**
     * An entity is new if it has just been instantiated (i.e. using the "new" operator)
     * and is not (yet) managed by an EntityManager.
     */
    const STATE_NEW = 2;

    /**
     * A detached entity is an instance with a persistent identity that is not
     * (or no longer) associated with an EntityManager (and a UnitOfWork).
     */
    const STATE_DETACHED = 3;

    /**
     * A removed entity instance is an instance with a persistent identity,
     * associated with an EntityManager, whose persistent state has been
     * deleted (or is scheduled for deletion).
     */
    const STATE_REMOVED = 4;

    /**
     * Commits the UnitOfWork, executing all operations that have been postponed
     * up to this point. The state of all managed entities will be synchronized with
     * the database.
     *
     * The operations are executed in the following order:
     *
     * 1) All entity insertions
     * 2) All entity updates
     * 3) All collection deletions
     * 4) All collection updates
     * 5) All entity deletions
     *
     */
    public function commit()
    {
        // Compute changes done since last commit.
        $this->computeChangeSets();

        if ( ! ($this->_entityInsertions ||
                $this->_entityDeletions ||
                $this->_entityUpdates ||
                $this->_collectionUpdates ||
                $this->_collectionDeletions ||
                $this->_orphanRemovals)) {
            return; // Nothing to do.
        }

        if ($this->_orphanRemovals) {
            foreach ($this->_orphanRemovals as $orphan) {
                $this->remove($orphan);
            }
        }

        // Raise onFlush
        if ($this->_evm->hasListeners(Events::onFlush)) {
            $this->_evm->dispatchEvent(Events::onFlush, new Event\OnFlushEventArgs($this->_em));
        }

        // Now we need a commit order to maintain referential integrity
        $commitOrder = $this->_getCommitOrder();

        $conn = $this->_em->getConnection();
        $conn->beginTransaction();
        try {
            if ($this->_entityInsertions) {
                foreach ($commitOrder as $class) {
                    $this->_executeInserts($class);
                }
            }

            if ($this->_entityUpdates) {
                foreach ($commitOrder as $class) {
                    $this->_executeUpdates($class);
                }
            }

            // Extra updates that were requested by persisters.
            if ($this->_extraUpdates) {
                $this->_executeExtraUpdates();
            }

            // Collection deletions (deletions of complete collections)
            foreach ($this->_collectionDeletions as $collectionToDelete) {
                $this->getCollectionPersister($collectionToDelete->getMapping())
                        ->delete($collectionToDelete);
            }
            // Collection updates (deleteRows, updateRows, insertRows)
            foreach ($this->_collectionUpdates as $collectionToUpdate) {
                $this->getCollectionPersister($collectionToUpdate->getMapping())
                        ->update($collectionToUpdate);
            }

            // Entity deletions come last and need to be in reverse commit order
            if ($this->_entityDeletions) {
                for ($count = count($commitOrder), $i = $count - 1; $i >= 0; --$i) {
                    $this->_executeDeletions($commitOrder[$i]);
                }
            }

            $conn->commit();
        } catch (Exception $e) {
            $this->_em->close();
            $conn->rollback();
            throw $e;
        }

        // Take new snapshots from visited collections
        foreach ($this->_visitedCollections as $coll) {
            $coll->takeSnapshot();
        }

        // Clear up
        $this->_entityInsertions =
        $this->_entityUpdates =
        $this->_entityDeletions =
        $this->_extraUpdates =
        $this->_entityChangeSets =
        $this->_collectionUpdates =
        $this->_collectionDeletions =
        $this->_visitedCollections =
        $this->_scheduledForDirtyCheck =
        $this->_orphanRemovals = array();
    }

    /**
     * Computes the changes that happened to a single entity.
     *
     * Modifies/populates the following properties:
     *
     * {@link _originalEntityData}
     * If the entity is NEW or MANAGED but not yet fully persisted (only has an id)
     * then it was not fetched from the database and therefore we have no original
     * entity data yet. All of the current entity data is stored as the original entity data.
     *
     * {@link _entityChangeSets}
     * The changes detected on all properties of the entity are stored there.
     * A change is a tuple array where the first entry is the old value and the second
     * entry is the new value of the property. Changesets are used by persisters
     * to INSERT/UPDATE the persistent entity state.
     *
     * {@link _entityUpdates}
     * If the entity is already fully MANAGED (has been fetched from the database before)
     * and any changes to its properties are detected, then a reference to the entity is stored
     * there to mark it for an update.
     *
     * {@link _collectionDeletions}
     * If a PersistentCollection has been de-referenced in a fully MANAGED entity,
     * then this collection is marked for deletion.
     *
     * @param ClassMetadata $class The class descriptor of the entity.
     * @param object $entity The entity for which to compute the changes.
     */
    public function computeChangeSet(Mapping\ClassMetadata $class, $entity)
    {
        // ...
    }

    /**
     * Computes all the changes that have been done to entities and collections
     * since the last commit and stores these changes in the _entityChangeSet map
     * temporarily for access by the persisters, until the UoW commit is finished.
     */
    public function computeChangeSets()
    {
        // ...
    }

    /**
     * Schedules an entity for insertion into the database.
     * If the entity already has an identifier, it will be added to the identity map.
     *
     * @param object $entity The entity to schedule for insertion.
     */
    public function scheduleForInsert($entity)
    {
        $oid = spl_object_hash($entity);

        if (isset($this->_entityUpdates[$oid])) {
            throw new \InvalidArgumentException("Dirty entity can not be scheduled for insertion.");
        }
        if (isset($this->_entityDeletions[$oid])) {
            throw new \InvalidArgumentException("Removed entity can not be scheduled for insertion.");
        }
        if (isset($this->_entityInsertions[$oid])) {
            throw new \InvalidArgumentException("Entity can not be scheduled for insertion twice.");
        }

        $this->_entityInsertions[$oid] = $entity;

        if (isset($this->_entityIdentifiers[$oid])) {
            $this->addToIdentityMap($entity);
        }
    }

    /**
     * Schedules an entity for being updated.
     *
     * @param object $entity The entity to schedule for being updated.
     */
    public function scheduleForUpdate($entity)
    {
        $oid = spl_object_hash($entity);
        if ( ! isset($this->_entityIdentifiers[$oid])) {
            throw new \InvalidArgumentException("Entity has no identity.");
        }
        if (isset($this->_entityDeletions[$oid])) {
            throw new \InvalidArgumentException("Entity is removed.");
        }

        if ( ! isset($this->_entityUpdates[$oid]) && ! isset($this->_entityInsertions[$oid])) {
            $this->_entityUpdates[$oid] = $entity;
        }
    }

    /**
     * INTERNAL:
     * Schedules an entity for deletion.
     *
     * @param object $entity
     */
    public function scheduleForDelete($entity)
    {
        $oid = spl_object_hash($entity);

        if (isset($this->_entityInsertions[$oid])) {
            if ($this->isInIdentityMap($entity)) {
                $this->removeFromIdentityMap($entity);
            }
            unset($this->_entityInsertions[$oid]);
            return; // entity has not been persisted yet, so nothing more to do.
        }

        if ( ! $this->isInIdentityMap($entity)) {
            return; // ignore
        }

        $this->removeFromIdentityMap($entity);

        if (isset($this->_entityUpdates[$oid])) {
            unset($this->_entityUpdates[$oid]);
        }
        if ( ! isset($this->_entityDeletions[$oid])) {
            $this->_entityDeletions[$oid] = $entity;
        }
    }

    /**
     * Checks whether an entity is scheduled for insertion, update or deletion.
     *
     * @param object $entity
     * @return boolean
     */
    public function isEntityScheduled($entity)
    {
        $oid = spl_object_hash($entity);
        return isset($this->_entityInsertions[$oid]) ||
                isset($this->_entityUpdates[$oid]) ||
                isset($this->_entityDeletions[$oid]);
    }

    /**
     * Persists an entity as part of the current unit of work.
     *
     * @param object $entity The entity to persist.
     */
    public function persist($entity)
    {
        $visited = array();
        $this->_doPersist($entity, $visited);
    }

    /**
     * Saves an entity as part of the current unit of work.
     * This method is internally called during save() cascades as it tracks
     * the already visited entities to prevent infinite recursions.
     *
     * NOTE: This method always considers entities that are not yet known to
     * this UnitOfWork as NEW.
     *
     * @param object $entity The entity to persist.
     * @param array $visited The already visited entities.
     */
    private function _doPersist($entity, array &$visited)
    {
        $oid = spl_object_hash($entity);
        if (isset($visited[$oid])) {
            return; // Prevent infinite recursion
        }

        $visited[$oid] = $entity; // Mark visited

        $class = $this->_em->getClassMetadata(get_class($entity));
        $entityState = $this->getEntityState($entity, self::STATE_NEW);

        switch ($entityState) {
            case self::STATE_MANAGED:
                // Nothing to do, except if policy is "deferred explicit"
                if ($class->isChangeTrackingDeferredExplicit()) {
                    $this->scheduleForDirtyCheck($entity);
                }
                break;
            case self::STATE_NEW:
                if (isset($class->lifecycleCallbacks[Events::prePersist])) {
                    $class->invokeLifecycleCallbacks(Events::prePersist, $entity);
                }
                if ($this->_evm->hasListeners(Events::prePersist)) {
                    $this->_evm->dispatchEvent(Events::prePersist, new LifecycleEventArgs($entity, $this->_em));
                }

                $idGen = $class->idGenerator;
                if ( ! $idGen->isPostInsertGenerator()) {
                    $idValue = $idGen->generate($this->_em, $entity);
                    if ( ! $idGen instanceof \Doctrine\ORM\Id\AssignedGenerator) {
                        $this->_entityIdentifiers[$oid] = array($class->identifier[0] => $idValue);
                        $class->setIdentifierValues($entity, $idValue);
                    } else {
                        $this->_entityIdentifiers[$oid] = $idValue;
                    }
                }
                $this->_entityStates[$oid] = self::STATE_MANAGED;

                $this->scheduleForInsert($entity);
                break;
            case self::STATE_DETACHED:
                throw new \InvalidArgumentException(
                        "Behavior of persist() for a detached entity is not yet defined.");
            case self::STATE_REMOVED:
                // Entity becomes managed again
                if ($this->isScheduledForDelete($entity)) {
                    unset($this->_entityDeletions[$oid]);
                } else {
                    //FIXME: There's more to think of here...
                    $this->scheduleForInsert($entity);
                }
                break;
            default:
                throw ORMException::invalidEntityState($entityState);
        }

        $this->_cascadePersist($entity, $visited);
    }

    /**
     * Deletes an entity as part of the current unit of work.
     *
     * @param object $entity The entity to remove.
     */
    public function remove($entity)
    {
        $visited = array();
        $this->_doRemove($entity, $visited);
    }

    /**
     * Deletes an entity as part of the current unit of work.
     *
     * This method is internally called during delete() cascades as it tracks
     * the already visited entities to prevent infinite recursions.
     *
     * @param object $entity The entity to delete.
     * @param array $visited The map of the already visited entities.
     * @throws InvalidArgumentException If the instance is a detached entity.
     */
    private function _doRemove($entity, array &$visited)
    {
        // ...
    }
}
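The body of `_getCommitOrder()` is not shown in the listing above. The idea it serves can be sketched as a topological sort over the classes' foreign-key dependencies. The `commitOrder()` function below is hypothetical, not Doctrine's implementation, and assumes the dependency graph has no cycles (it throws if it finds one). Insertions and updates then run in the resulting order and deletions in reverse, which is what the loops over `$commitOrder` in commit() do.

```php
<?php
// Hypothetical sketch: order entity classes so that every class comes after
// the classes it references (its foreign-key targets). Assumes no cycles.
function commitOrder(array $dependsOn): array
{
    $order = [];
    $state = []; // class => 'visiting' | 'done'

    $visit = function (string $class) use (&$visit, &$order, &$state, $dependsOn): void {
        if (($state[$class] ?? null) === 'done') {
            return;
        }
        if (($state[$class] ?? null) === 'visiting') {
            throw new RuntimeException("Cycle detected at $class");
        }
        $state[$class] = 'visiting';
        foreach ($dependsOn[$class] ?? [] as $target) {
            $visit($target); // referenced classes must be written first
        }
        $state[$class] = 'done';
        $order[] = $class;
    };

    foreach (array_keys($dependsOn) as $class) {
        $visit($class);
    }
    return $order;
}

// Comment references Article and User; Article references User.
$order = commitOrder([
    'Comment' => ['Article', 'User'],
    'Article' => ['User'],
    'User'    => [],
]);
$deleteOrder = array_reverse($order);
// $order: ['User', 'Article', 'Comment']; $deleteOrder: ['Comment', 'Article', 'User']
```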
Opinions expressed by DZone contributors are their own.