ORM Is an Offensive Anti-Pattern
{editor's note: thanks to yegor bugayenko, a new mvb at dzone. among other things, yegor blogs about java and devops. we're pleased to have him on board as a most valuable blogger. check out his blog, yegor256.com .} tl;dr orm is a terrible anti-pattern that violates all principles of object-oriented programming, tearing objects apart and turning them into dumb and passive data bags. there is no excuse for orm existence in any application, be it a small web app or an enterprise-size system with thousands of tables and crud manipulations on them. what is the alternative? sql-speaking objects . vinni-pukh (1969) by fyodor khitruk how orm works object-relational mapping (orm) is a technique (a.k.a. design pattern) of accessing a relational database from an object-oriented language (java, for example). there are multiple implementations of orm in almost every language; for example: hibernate for java, activerecord for ruby on rails, doctrine for php, and sqlalchemy for python. in java, the orm design is even standardized as jpa . first, let's see how orm works, by example. let's use java, postgresql, and hibernate. let's say we have a single table in the database, called post : +-----+------------+--------------------------+ | id | date | title | +-----+------------+--------------------------+ | 9 | 10/24/2014 | how to cook a sandwich | | 13 | 11/03/2014 | my favorite movies | | 27 | 11/17/2014 | how much i love my job | +-----+------------+--------------------------+ now we want to crud-manipulate this table from our java app (crud stands for create, read, update, and delete). first, we should create a post class (i'm sorry it's so long, but that's the best i can do): @entity @table(name = "post") public class post { private int id; private date date; private string title; @id @generatedvalue public int getid() { return this.id; } @temporal(temporaltype.timestamp) public date getdate() { return this.date; } public title gettitle() { return this.title; } public void setdate(date when) { this.date = when; } public void settitle(string txt) { this.title = txt; } } before any operation with hibernate, we have to create a session factory: sessionfactory factory = new annotationconfiguration() .configure() .addannotatedclass(post.class) .buildsessionfactory(); this factory will give us "sessions" every time we want to manipulate with post objects. every manipulation with the session should be wrapped in this code block: session session = factory.opensession(); try { transaction txn = session.begintransaction(); // your manipulations with the orm, see below txn.commit(); } catch (hibernateexception ex) { txn.rollback(); } finally { session.close(); } when the session is ready, here is how we get a list of all posts from that database table: list posts = session.createquery("from post").list(); for (post post : (list) posts){ system.out.println("title: " + post.gettitle()); } i think it's clear what's going on here. hibernate is a big, powerful engine that makes a connection to the database, executes necessary sql select requests, and retrieves the data. then it makes instances of class post and stuffs them with the data. when the object comes to us, it is filled with data, and we should use getters to take them out, like we're using gettitle() above. when we want to do a reverse operation and send an object to the database, we do all of the same but in reverse order. we make an instance of class post , stuff it with the data, and ask hibernate to save it: post post = new post(); post.setdate(new date()); post.settitle("how to cook an omelette"); session.save(post); this is how almost every orm works. the basic principle is always the same — orm objects are anemic envelopes with data. we are talking with the orm framework, and the framework is talking to the database. objects only help us send our requests to the orm framework and understand its response. besides getters and setters, objects have no other methods. they don't even know which database they came from. this is how object-relational mapping works. what's wrong with it, you may ask? everything! what's wrong with orm? seriously, what is wrong? hibernate has been one of the most popular java libraries for more than 10 years already. almost every sql-intensive application in the world is using it. each java tutorial would mention hibernate (or maybe some other orm like toplink or openjpa) for a database-connected application. it's a standard de-facto and still i'm saying that it's wrong? yes. i'm claiming that the entire idea behind orm is wrong. its invention was maybe the second big mistake in oop after null reference . actually, i'm not the only one saying something like this, and definitely not the first. a lot about this subject has already been published by very respected authors, including ormhate by martin fowler, object-relational mapping is the vietnam of computer science by jeff atwood, the vietnam of computer science by ted neward, orm is an anti-pattern by laurie voss, and many others. however, my argument is different than what they're saying. even though their reasons are practical and valid, like "orm is slow" or "database upgrades are hard", they miss the main point. you can see a very good, practical answer to these practical arguments given by bozhidar bozhanov in his orm haters don’t get it blog post. the main point is that orm, instead of encapsulating database interaction inside an object, extracts it away, literally tearing a solid and cohesive living organism apart. one part of the object keeps the data while another one, implemented inside the orm engine (session factory), knows how to deal with this data and transfers it to the relational database. look at this picture; it illustrates what orm is doing. i, being a reader of posts, have to deal with two components: 1) the orm and 2) the "obtruncated" object returned to me. the behavior i'm interacting with is supposed to be provided through a single entry point, which is an object in oop. in the case of orm, i'm getting this behavior via two entry points — the orm and the "thing", which we can't even call an object. because of this terrible and offensive violation of the object-oriented paradigm, we have a lot of practical issues already mentioned in respected publications. i can only add a few more. sql is not hidden . users of orm should speak sql (or its dialect, like hql ). see the example above; we're calling session.createquery("from post") in order to get all posts. even though it's not sql, it is very similar to it. thus, the relational model is not encapsulated inside objects. instead, it is exposed to the entire application. everybody, with each object, inevitably has to deal with a relational model in order to get or save something. thus, orm doesn't hide and wrap the sql but pollutes the entire application with it. difficult to test . when some object is working a list of posts, it needs to deal with an instance of sessionfactory . how can we mock this dependency? we have to create a mock of it? how complex is this task? look at the code above, and you will realize how verbose and cumbersome that unit test will be. instead, we can write integration tests and connect the entire application to a test version of postgresql. in that case, there is no need to mock sessionfactory , but such tests will be rather slow, and even more important, our having-nothing-to-do-with-the-database objects will be tested against the database instance. a terrible design. again, let me reiterate. practical problems of orm are just consequences. the fundamental drawback is that orm tears objects apart, terribly and offensively violating the very idea of what an object is . sql-speaking objects what is the alternative? let me show it to you by example. let's try to design that class, post , my way. we'll have to break it down into two classes: post and posts , singular and plural. i already mentioned in one of my previous articles that a good object is always an abstraction of a real-life entity. here is how this principle works in practice. we have two entities: database table and table row. that's why we'll make two classes; posts will represent the table, and post will represent the row. as i also mentioned in that article , every object should work by contract and implement an interface. let's start our design with two interfaces. of course, our objects will be immutable. here is how posts would look: @immutable interface posts { iterable iterate(); post add(date date, string title); } this is how a single post would look: @immutable interface post { int id(); date date(); string title(); } here is how we will list all posts in the database table: posts posts = // we'll discuss this right now for (post post : posts.iterate()){ system.out.println("title: " + post.title()); } here is how we will create a new post: posts posts = // we'll discuss this right now posts.add(new date(), "how to cook an omelette"); as you see, we have true objects now. they are in charge of all operations, and they perfectly hide their implementation details. there are no transactions, sessions, or factories. we don't even know whether these objects are actually talking to the postgresql or if they keep all the data in text files. all we need from posts is an ability to list all posts for us and to create a new one. implementation details are perfectly hidden inside. now let's see how we can implement these two classes. i'm going to use jcabi-jdbc as a jdbc wrapper, but you can use something else or just plain jdbc if you like. it doesn't really matter. what matters is that your database interactions are hidden inside objects. let's start with posts and implement it in class pgposts ("pg" stands for postgresql): @immutable final class pgposts implements posts { private final source dbase; public pgposts(datasource data) { this.dbase = data; } public iterable iterate() { return new jdbcsession(this.dbase) .sql("select id from post") .select( new listoutcome( new listoutcome.mapping() { @override public post map(final resultset rset) { return new pgpost(rset.getinteger(1)); } } ) ); } public post add(date date, string title) { return new pgpost( this.dbase, new jdbcsession(this.dbase) .sql("insert into post (date, title) values (?, ?)") .set(new utc(date)) .set(title) .insert(new singleoutcome(integer.class)) ); } } next, let's implement the post interface in class pgpost : @immutable final class pgpost implements post { private final source dbase; private final int number; public pgpost(datasource data, int id) { this.dbase = data; this.number = id; } public int id() { return this.number; } public date date() { return new jdbcsession(this.dbase) .sql("select date from post where id = ?") .set(this.number) .select(new singleoutcome(utc.class)); } public string title() { return new jdbcsession(this.dbase) .sql("select title from post where id = ?") .set(this.number) .select(new singleoutcome(string.class)); } } this is how a full database interaction scenario would look like using the classes we just created: posts posts = new pgposts(dbase); for (post post : posts.iterate()){ system.out.println("title: " + post.title()); } post post = posts.add(new date(), "how to cook an omelette"); system.out.println("just added post #" + post.id()); you can see a full practical example here . it's an open source web app that works with postgresql using the exact approach explained above — sql-speaking objects. what about performance? i can hear you screaming, "what about performance?" in that script a few lines above, we're making many redundant round trips to the database. first, we retrieve post ids with select id and then, in order to get their titles, we make an extra select title call for each post. this is inefficient, or simply put, too slow. no worries; this is object-oriented programming, which means it is flexible! let's create a decorator of pgpost that will accept all data in its constructor and cache it internally, forever: @immutable final class constpost implements post { private final post origin; private final date dte; private final string ttl; public constpost(post post, date date, string title) { this.origin = post; this.dte = date; this.ttl = title; } public int id() { return this.origin.id(); } public date date() { return this.dte; } public string title() { return this.ttl; } } pay attention: this decorator doesn't know anything about postgresql or jdbc. it just decorates an object of type post and pre-caches the date and title. as usual, this decorator is also immutable. now let's create another implementation of posts that will return the "constant" objects: @immutable final class constpgposts implements posts { // ... public iterable iterate() { return new jdbcsession(this.dbase) .sql("select * from post") .select( new listoutcome( new listoutcome.mapping() { @override public post map(final resultset rset) { return new constpost( new pgpost(rset.getinteger(1)), utc.gettimestamp(rset, 2), rset.getstring(3) ); } } ) ); } } now all posts returned by iterate() of this new class are pre-equipped with dates and titles fetched in one round trip to the database. using decorators and multiple implementations of the same interface, you can compose any functionality you wish. what is the most important is that while functionality is being extended, the complexity of the design is not escalating, because classes don't grow in size. instead, we're introducing new classes that stay cohesive and solid, because they are small. what about transactions? every object should deal with its own transactions and encapsulate them the same way as select or insert queries. this will lead to nested transactions, which is perfectly fine provided the database server supports them. if there is no such support, create a session-wide transaction object that will accept a "callable" class. for example: final class txn { private final datasource dbase; public t call(callable callable) { jdbcsession session = new jdbcsession(this.dbase); try { session.sql("start transaction").exec(); t result = callable.call(); session.sql("commit").exec(); return result; } catch (exception ex) { session.sql("rollback").exec(); throw ex; } } } then, when you want to wrap a few object manipulations in one transaction, do it like this: new txn(dbase).call( new callable() { @override public integer call() { posts posts = new pgposts(dbase); post post = posts.add(new date(), "how to cook an omelette"); posts.comments().post("this is my first comment!"); return post.id(); } } ); this code will create a new post and post a comment to it. if one of the calls fail, the entire transaction will be rolled back. this approach looks object-oriented to me. i'm calling it "sql-speaking objects", because they know how to speak sql with the database server. it's their skill, perfectly encapsulated inside their borders. related posts you may also find these posts interesting: how much your objects encapsulate? how an immutable object can have state and behavior? seven virtues of a good object how immutability helps paired brackets
December 22, 2014
by Yegor Bugayenko
·
56,865 Views
·
5 Likes