When you should and should NOT use ENUM data type
Join the DZone community and get the full member experience.
Join For FreeENUM is a new enumerated data type introduced in CUBRID 9.0. Like in all programming languages, the ENUM
type is a data type composed of a set of static, ordered values. Users can define numeric and string values for ENUM
columns.
Working with ENUM types
Creating an ENUM
column is done by specifying a static list of possible values:
CREATE TABLE person( name VARCHAR(255), gender ENUM('Male', 'Female') );
CUBRID understands the ENUM
type as an ordered set of constants which, in the above example, is a set of {NULL: NULL, 1: 'Male', 2: 'Female”}
. To assign a value to the gender
column, users may either use the index of the value ({NULL, 1, 2}
) or the actual constant literal ({NULL}
, {'Male'}
, {'Female'}
). CUBRID restricts the values that can be assigned to this column to only values from this set + NULL
. Moreover, ENUM
column is case-sensitive, i.e. it will raise an error if you try to enter 'female'
in lower case. Also, an empty string is allowed if it is defined as one of the elements of the ENUM
column. In our examples, it is not allowed.
csql> INSERT INTO person(name, gender) VALUES('Eugene', 'Male'); 1 row affected. 1 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('Anne', 2); 1 row affected. 1 command(s) successfully processed. csql> SELECT * FROM person; === <Result of SELECT Command IN Line 1> === name gender ============================================ 'Anne' 'Female' 'Eugene' 'Male' 2 rows selected.
Any attempt to insert a value outside of the defined set will result in a coercion error. In the below case, trying to insert an empty string raises an error because it is not in the set of allowed values defined in the person table.
csql> INSERT INTO person(name, gender) VALUES('John', 'N/A'); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce 'N/A' TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', 4); IN line 1, COLUMN 45, ERROR: before ' ); ' Cannot coerce 4 TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', ''); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce '' TO type enum. 0 command(s) successfully processed.
Why you should use the ENUM type
There are three important reasons for which you should consider using the ENUM
type:
- Reduce storage space.
- Reduce join complexity.
- Create cheap values constraints.
Storage Space
CUBRID uses only 1 byte per tuple when 255 or less ENUM
elements are defined or 2 bytes for 256~65535 elements. This is because, rather that storing the constant literal of the value, CUBRID stores the index in the ordered set of that value. For very large tables, this might prove to be a significant storage space save.
Take, for example, a table with 1,000,000,000 records which has an ENUM
column defined as ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'
). If you use a VARCHAR
type instead of the ENUM
type to store these values, the column would require anywhere between 5GB and 9GB of storage space. Using the ENUM
type, you can reduce the required space to 2 bytes per tuple, adding up to a total of 2GB.
Reduce join complexity
JOIN way
The same effect of the ENUM
type can be achieved by creating a one to many relationship on two or more tables. Considering the example above, you can store values for days of the week like this:
CREATE TABLE days_of_week( id SHORT PRIMARY KEY, name VARCHAR(9) ); CREATE TABLE opening_hours( week_day SHORT, opening_time TIME, closing_time TIME, FOREIGN KEY fk_dow (week_day) REFERENCES days_of_week(id) );
Then, when you wish to display the name of the week day, you would execute a query like:
SELECT d.name day_name, o.opening_time, o.closing_time FROM days_of_week d, opening_hours o WHERE d.id = o.week_day ORDER BY d.id; === <Result of SELECT Command IN Line 4> === day_name opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ...
ENUM way
You can achieve the same effect using an ENUM
column:
CREATE TABLE opening_hours( week_day ENUM ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), opening_time TIME, closing_time TIME );
And there’s no JOIN required to select opening hours:
SELECT week_day, opening_time, closing_time FROM opening_hours ORDER BY week_day; === <Result of SELECT Command IN Line 1> === week_day opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ...
This can prove to be very useful, especially if your queries join several tables.
Value constraints
ENUM
columns behave like foreign key relationships in the sense that values from an ENUM
column are restricted to the values specified in the column definition. For a short list of values, this is more efficient than creating foreign key relationships. While foreign key relationships use index scans to enforce the restriction, ENUM
columns just have to go through a list of predefined values which is faster even for small indexes.
Why/When you should NOT use the ENUM type
Even though ENUM
is a great feature, there are cases when you’d better not use it. For example:
- When
ENUM
type is not fixed - When
ENUM
type has a long list of values - When your application does not know the list of
ENUM
values ENUM
type is not reusable- Portability is a concern
When ENUM type is not fixed
If you’re not sure if the ENUM
type holds all possible values for that column, you should consider using a one to many relationship instead. The only way in which an ENUM
column can be changed to handle more values is by using an ALTER
statement. This is a very expensive operation in any RDBMS and requires administrator rights. Also, ALTER
statements are maintenance operations and should, as much as possible, be performed offline.
When ENUM type has a long list of values
ENUM
types should not be used if you cannot limit a set of possible values to a few elements.
When your application does not know the list of ENUM values
There are only two ways of getting a list of values you have defined for an ENUM
type:
parsing the output of
SHOW CREATE TABLE
statement:csql> SHOW CREATE TABLE opening_hours; === <Result of SELECT Command IN Line 1> === TABLE CREATE TABLE ============================================ 'opening_hours' 'CREATE TABLE [opening_hours] ([week_day] ENUM('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), [opening_time] TIME, [closing_time] TIME)
selecting information from CUBRID system tables:
csql> SELECT d.enumeration FROM _db_domain d, _db_attribute a WHERE a.attr_name = 'week_day' AND a.class_of.class_name = 'opening_hours' AND d IN a.domains; === <Result of SELECT Command IN Line 1> === enumeration ====================== {'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'}
Both might require complex coding and selecting from system tables requires administrator privileges.
ENUM type is not reusable
If you have several tables which require the names of week days, you will have to create an ENUM
type for each of them. If you create a table to hold week days names, you can join this table with whichever other table that requires this information.
Portability is a concern
The ENUM
type is only supported by a few RDBMSs and each one has its own idea as to how ENUM
type is supposed to work. Below is a list of a few notable differences between CUBRID, MySQL and PostgreSQL:
CUBRID | PostgreSQL | MySQL | |
---|---|---|---|
Inserting out of range value | Throws error | Throws error | Inserts special value index 0 |
Comparing to char literals | Compare as strings | Compare as ENUM elements |
Compare as strings |
Comparing to values outside of the ENUM domain |
Compare as strings | Throws error | Compare as strings |
These subtle differences will most probably break your application in interesting and hard to understand ways. If you’re migrating from PostgreSQL to CUBRID for example, and you expect comparisons with char literals to be performed as ENUM
comparisons, you’ll have a hard time understanding why your query returns weird results.
Published at DZone with permission of Esen Sagynov, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments