How to Store Text in PostgreSQL: Tips, Tricks, and Traps
In this article, we will review various options to store long text in the PostgreSQL database: @Lob attributes, TEXT, and long VARCHAR table columns. Also, we'll have a look at the difference between Hibernate 5 and 6 in storing long text data.
Join the DZone community and get the full member experience.
Join For FreeDDL generation based on JPA entities definition is a daily task for many developers. In most cases, we use tools like Hibernate's built-in generator or JPA Buddy plugin. They make the job easier, but there are exceptions. When it comes to storing big chunks of data in the database, things get a bit complicated.
Use Case: Storing Documents
Let’s assume we need to store a document object with its content in the PostgreSQL database. The JPA entity code for this might look like the code below:
@Entity
@Table(name = "document")
public class Document {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "id", nullable = false)
private Long id;
@Column(name = "date_created", nullable = false)
private LocalDateTime dateCreated;
@Column(name = "doc_txt")
private String docText;
//Getters and setters omitted for brevity
}
The question is: what if we need to store really long document text? In Java, the string datatype can hold about 2Gb of text data, but the table column size will be limited to 255 characters by default for the model above. So, what should we change?
Option 1: Use LOB Storage
In relational databases, a particular data type exists to store big amounts of data: LOB (Large OBject). Once we need to store large text in the database, we can start with defining a LOB column. All we need to do is mark the docText
attribute with the @Lob
annotation.
@Lob
@Column(name = "doc_txt")
private String docText;
Let’s use Hibernate to generate a DDL for the table to map the `Document` entity. The SQL will be:
create table document (
id int8 generated by default as identity,
date_created timestamp not null,
doc_txt oid,
primary key (id)
);
As we can see, the doc_text
column datatype is oid
. What is it? According to the documentation:
PostgreSQL provides two distinct ways to store binary data. Binary data can be stored in a table using the data type BYTEA or by using the Large Object feature, which stores the binary data in a separate table in a special format and refers to that table by storing a value of type OID in your table.
In our case, the second option is in effect. This separate table’s name is pg_largeobject
, and it stores data split into “pages”, usually 2 kB each, as stated in the docs.
So, Hibernate stores large text in a separate table as binary data. Does it mean we should do an additional join when selecting data or an additional insert when saving it? Let’s enable SQL logging, create the Document
entity and save it to the database using Spring Data JPA.
Document doc = new Document();
doc.setDateCreated(LocalDateTime.of(2020, 1, 1, 10, 10));
doc.setDocText("This is the doc text");
Document saved = documentRepository.save(doc);
Hibernate will show an ordinary SQL insert in the console:
insert
into
document
(date_created, doc_txt)
values
(?, ?)
Now we can check if the data was correctly stored by executing the following SQL in a console:
select * from document
The result that we will see should be similar to this:
id | date_created | doc_txt |
1 | 2020-01-01 10:10:00 | 76388 |
We don’t see the document text in this table, just a reference to the object in the large object storage. Let’s check the pg_largeobject
table:
select * from pg_largeobject where loid=76338
Now we can see the document text.
loid | pageno | data |
76388 | 0 | This is the doc text |
So, Hibernate saves data into two tables automatically under the hood. Now we can try to fetch the document data using Spring Data JPA:
documentRepository.findById(1L).ifPresent(d -> System.out.println(d.getDocText()));
We can see the following SQL in the console:
select
document0_.id as id1_0_0_,
document0_.date_created as date_cre2_0_0_,
document0_.doc_txt as doc_txt3_0_0_
from
document document0_
where
document0_.id=?
And the output should be as expected:
This is the doc text
Hibernate selects the data from the pg_largeobject
table transparently. Let’s try to use JPQL to execute the same query. To do this, we create an additional Spring Data JPA repository method and invoke it:
//repository
@Query("select d from Document d where d.id = ?1")
Optional<Document> findByIdIs(Long id);
//...
//invocation
documentRepository.findByIdIs(1L).ifPresent(d -> System.out.println(d.getDocText()));
The method will fail:
org.springframework.orm.jpa.JpaSystemException: Unable to access lob stream
…
Caused by: org.hibernate.HibernateException: Unable to access lob stream
…
Caused by: org.postgresql.util.PSQLException: Large Objects may not be used in auto-commit mode.
Hibernate performs additional database reading to fetch LOB data. In auto-commit
mode, this reading is executed in a separate transaction. PostgreSQL driver explicitly prohibits it, as shown in the error message above. To fix this, we need to execute such queries in one transaction or disable auto-commit mode.
Spring Data JPA methods from `CrudRepository
` like findById()
and findAll()
are executed in one transaction by default. That is why everything worked fine in the first example. When we use Spring Data JPA query methods or JPQL queries, we must use @Transactional
explicitly as in the example below.
@Transactional
@Query("select d from Document d where d.id = ?1")
Optional<Document> findByIdIs(Long id);
@Transactional
List<Document> findByDateCreatedIsBefore(LocalDateTime dateCreated);
If disabling auto-commit
mode seems preferable to using the @Transactional
annotation, we should look into the documentation for the app libraries. For example, to do it for the default connection pool implementation (HikariCP) in Spring Boot, we need to set the spring.datasource.hikari.auto-commit
property to false
.
Storing text in a separate table might cause other issues. Let’s add a repository method to select documents using the LIKE
clause for the docText field:
@Transactional
List<Document> findByDocTextLike(String text);
This method will generate the following query:
select
document0_.id as id1_0_,
document0_.date_created as date_cre2_0_,
document0_.doc_txt as doc_txt3_0_
from
document document0_
where
document0_.doc_txt like ? escape ?
...and this query will fail with the following error:
org.springframework.dao.InvalidDataAccessResourceUsageException: could not extract ResultSet; SQL [n/a];
…
Caused by: org.postgresql.util.PSQLException: ERROR: function pg_catalog.like_escape(bigint, character varying) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Hibernate cannot generate proper SQL to handle the LIKE
clause for LOB text columns. For this case, we can use the native query. In this query, we have to fetch text data from the LOB storage and convert it to string format. After that, we can use it in the LIKE
clause (and don’t forget about `@Transactional
):
@Query(value = "select * from document d " +
"where convert_from(lo_get(doc_txt), 'UTF-8') like '%'||?1||'%'", nativeQuery = true)
@Transactional
List<Document> findByDocTextLike(String text);
Now everything works fine. Please remember that native queries may not be compatible with other RDBMSes and are not validated in runtime. Use them only when absolutely necessary.
Conclusion: Storing Text as LOB
So, what are the pros and cons of storing large texts as LOB objects in PostgreSQL?
Pro:
- PostgreSQL uses optimized storage for LOB objects
- We can store up to 4Gb of text there
Con:
- Some functions (
LIKE
,SUBSTRING
, etc.) in theWHERE
clause do not work in Hibernate for LOB text columns. We need to use native queries for this. - To fetch text with JPQL or Spring Data JPA repository query methods, we must use the
@Transactional
annotation for repository methods or disable the auto-commit mode
The question here is: why don’t we store text data right in the table? Let’s discuss this option too.
Option 2: Column Re-Definition
PostgreSQL allows us to store long text data in a column of a particular datatype - TEXT
. We can specify the column definition right in the annotation.
@Column(name = "doc_txt", columnDefinition = "text")
private String docText;
This allows us to work with long text in a "usual" way. No transactions, native queries, and JPQL works as expected. Compared to LOB type there is a limitation
..., the longest possible character string that can be stored is about 1 GB
It is smaller than 4Gb, allowed by LOB storage, but still long enough for most use cases.
The only problem here is the hardcoded column definition. To overcome it, we can use annotation @Type
and converter org.hibernate.type.TextType
in Hibernate 5. It has an advantage over the previous column definition: it is not vendor-specific.
@Type(type = "org.hibernate.type.TextType")
@Column(name = "doc_txt")
private String docText;
In Hibernate 6, the org.hibernate.type.TextType
class was removed. To define a column to store a long text, we can define the attribute in the following way:
@Column(name = "doc_txt", length = Length.LOB_DEFAULT)
private String docText;
This will give us the following column definition in the database: doc_txt varchar(1048576)
. It is not the TEXT
datatype, but it can still store about 1Gb of text in the table. It is the largest possible character string in PostgreSQL.
We can generate a column with TEXT
datatype in Hibernate 6 by defining the docText
attribute like this:
@JdbcTypeCode(SqlTypes.LONG32VARCHAR)
@Column(name = "doc_txt")
private String docText;
Unfortunately, as of today (June 2022), Hibernate 6 cannot fetch data from the table. It generates the correct table and column definition of the TEXT
type though. The data extraction from the doc_txt
column into the entity attribute fails. The error text looks like this:
Unknown wrap conversion requested: [B to java.lang.String : `org.hibernate.type.descriptor.java.StringJavaType` (java.lang.String)
So, storing long text in the TEXT/VARCHAR
column brings fewer issues. No problems with transactions, LIKE
conditions, etc. The only downside is storage size (up to 1Gb). Are there any more caveats with it?
There might be a problem if we use the TEXT
column type in the database and the @Lob
annotation. Let’s see how it works. First, let’s create a table document
and insert some data into it:
create table document (
id int8 generated by default as identity,
date_created timestamp not null,
doc_txt text,
primary key (id)
);
insert into document (id, date_created, doc_txt) values (1, '2021-10-10', 'This is the document text number 1');
We will use the document
entity definition with the @Lob
column:
@Entity
@Table(name = "document")
public class Document {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "id", nullable = false)
private Long id;
@Column(name = "date_created", nullable = false)
private LocalDateTime dateCreated;
@Lob
@Column(name = "doc_txt")
private String docText;
//Getters and setters omitted for brevity
}
The code for document fetching will be the same:
documentRepository.findById(1L).ifPresent(d -> System.out.println(d.getDocText()));
If we try to execute the repository method, we will see the following:
java.lang.IllegalStateException: Failed to execute Application
…
Caused by: org.hibernate.exception.DataException: could not execute query
…
Caused by: org.postgresql.util.PSQLException: Bad value for type long: This is the document text number 1
…
As we can see, Hibernate handles @Lob
attribute values as a reference to LOB object data. The TEXT
column type in the database table does not affect this behavior.
And what about saving data? Let’s clean the table, try to save the document entity with the @Lob
field, and fetch it using Spring Data JPA. Here is the code to do this:
//Saving
Document doc = new Document();
doc.setDateCreated(LocalDateTime.now());
doc.setDocText("This is another text document");
documentRepository.save(doc);
...
//Fetching
documentRepository.findAll().forEach(d -> System.out.println(d.getDocText));
...
//Result
This is another text document
So, it looks as if our entity with the @Lob
attribute can work with TEXT
columns. In the database tables we will see the familiar picture:
id | date_created | doc_txt |
1 | 2022-06-16 15:28:26.751041 | 76388 |
loid | pageno | data |
76388 | 0 | This is another text document |
If we insert document data into the table using SQL and then select the data, we will get the following:
insert into document (date_created, doc_txt) values ('2021-10-10', 'This is the document text');
select * from document;
id | date_created | doc_txt |
1 | 2022-06-16 15:28:26.751041 | 76388 |
2 | 2021-10-10 00:00:00 | This is the document text |
Now we won’t be able to select data from the database using Spring Data JPA. The application will crash with the type conversion error while selecting the second row.
Let’s add the @Type
annotation to the attribute...
@Lob
@Type(type = "org.hibernate.type.TextType")
@Column(name = "doc_txt")
private String docText;
...and try to print documents' text data to the app console.
documentRepository.findAll().forEach(d -> System.out.println(d.getDocText));
We’ll see the following:
Hibernate: select document0_.id as id1_0_, document0_.date_created as date_cre2_0_, document0_.text as text3_0_ from document document0_
76388
This is the document text
With the @Type
annotation, we can select data, but the OID reference is translated into text, so we “lose” the text stored in the LOB storage.
Conclusion: Storing Long Text in Table
So what are the pros and cons of storing long text as a text column in the database:
Pro:
- Queries work as expected; no need for separate transactions or native queries
Con:
- Storage size is limited to 1Gb
- Mixing the
@Lob
attribute definition andTEXT
column datatype may cause unexpected results.
Final Words: How to Store Long Text in PostgreSQL
- For most cases, storing long text data in the same table along with other entity data should work fine. It will allow you to manipulate data using both Hibernate and direct SQL.
- In Hibernate 5, use
@Type(type = "org.hibernate.type.TextType")
annotation for the JPA entity attribute. - If you use Hibernate 6, prefer
@Column(name = ..., length = Length.LOB_DEFAULT)
annotation for the column definition. - Note that we cannot store more than 1Gb of text when using this approach.
- In Hibernate 5, use
- If you plan to store significant amounts of character data (more than 1Gb), use the
@Lob
annotation for the JPA entity attribute. Hibernate will use PostgreSQL’s dedicated storage optimized for large amounts of data. There are several things that we should consider when we use LOBs.- We must execute JPQL queries and Spring Data JPA query methods in one transaction or disable auto-commit mode explicitly.
- To use the LOB column in the
WHERE
condition, we might need to use native queries.
- There is great advice in Hibernate documentation: Please don’t (ab)use JPA’s @Lob annotation just because you want a TEXT column. The purpose of the @Lob annotation is not to control DDL generation! Hence do not use the
@Lob
entity attribute definition with theTEXT
column datatype.
Hope those simple rules will help you avoid issues while storing text data in PostgreSQL with Hibernate.
Opinions expressed by DZone contributors are their own.
Comments