Surprises From NULL in MySQL
`NULL` is one of the things in relational databases that will never disappoint you with the number of surprises it can bring.
Join the DZone community and get the full member experience.
Join For FreeThe `NULL` value can be surprising until you get used to it. And after you get used to it. And it will forever be surprising. `NULL` is one of the things in relational databases that will never disappoint you with the number of surprises it can bring. Most of them are not pleasant, but it's easy to understand them only thinking about NULL as an absence of value.
History of NULL
NULL was introduced to relational databases as a way to describe missing or inapplicable information. It was a way to say, "I don't know" or "I don't care" or "I don't have this information". And It is a very good way to do so. Let's take a look at a simple example.
| id | name | amount_of_money |
| --- | ---- | --------------- |
| 1 | John | 1000.00 |
| 2 | Jane | NULL |
| 3 | Jack | 2000.00 |
| 4 | Jill | -1.00 |
In this table, we have a list of people and their amount of money. We can see that Jane has `NULL` in the amount of money column. It does not mean that Jane has no money. We simply don't know how much money she has. Sometimes developers use -1 to indicate the absence of value, but it will not work for our case. A negative amount of money is a valid value, and it means a person has debt.
NULL on Data Storage Level
NULLs are often criticized for taking up space in the database. It is true that NULLs take up space, but it is not as bad as it seems. And, of course, they should take some space because information about the absence of information is information itself. Most relational databases add a single bit for each field that can be null. But, actually, it's not that simple. Different storage engines can solve this problem in different ways. For example, NDB storage engine reserves 4 bytes per row if the table definition contains any columns allowing NULL, up to 32 NULL columns. So in case, the table has from 1 to 32 nullable columns, it will take 4 bytes per row no matter what. Considering all of that, it's even possible to save some storage space by using NULLs.
NULL in Indexes
It is allowed to have multiple NULLs in the unique index. Let's take a look at the following example.
CREATE TABLE users (
id INT(11) NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) DEFAULT NULL,
amount_of_money INT(11) DEFAULT NULL,
PRIMARY KEY (id),
UNIQUE KEY email (email)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4
COLLATE = utf8mb4_unicode_ci;
INSERT
INTO
users (name, email, amount_of_money)
VALUES
('John', 'John@example.com', 10);
INSERT
INTO
users (name)
VALUES
('Jane');
INSERT
INTO
users (name, amount_of_money)
VALUES
('Ben', -5);
SELECT *
FROM
users;
+----+------+------------------+-----------------+
| id | name | email | amount_of_money |
+----+------+------------------+-----------------+
| 1 | John | John@example.com | 10 |
| 2 | Jane | NULL | NULL |
| 3 | Ben | NULL | -5 |
+----+------+------------------+-----------------+
3 rows in set (0.00 sec)
As you can see, we have two users with the same email. It is allowed because NULL is not equal to NULL. It is a special case. NULL is not equal to anything, including NULL. So, we can have multiple NULLs in the unique index. But, of course, we can't have multiple NULLs in the primary key. It is not allowed because the primary key is used to identify a row. And if we have two rows with the same primary key, we won't be able to identify them. So, we can't have multiple NULLs in the primary key. But we can have multiple NULLs in the unique index. It is allowed because the unique index is not used to identify a row. It is used to find a row by a specific value. And if we have two rows with the same value in the unique index, we can still find them by the primary key.
Compare With NULL
Let's make some comparisons with NULL.
SELECT NULL = NULL, NULL != NULL, NULL > NULL, NULL < NULL;
+-------------+--------------+-------------+-------------+
| NULL = NULL | NULL != NULL | NULL > NULL | NULL < NULL |
+-------------+--------------+-------------+-------------+
| NULL | NULL | NULL | NULL |
+-------------+--------------+-------------+-------------+
SELECT NULL IS NULL, NULL IS NOT NULL;
+--------------+------------------+
| NULL IS NULL | NULL IS NOT NULL |
+--------------+------------------+
| 1 | 0 |
+--------------+------------------+
NULL is not equal to anything, even to NULL. It's easy to understand, but it can bring some surprises. For example, let's select users from our users' table where email is not equal to.
SELECT *
FROM
users
WHERE
email <> 'John@example.com';
And the result is
Empty set (0.00 sec)
Even though we have a couple of users with `NULL` emails, they are not selected. Usually, this is not what we want. But this is how MySQL works.
`select null <> 'John@example.com'` will return `null`.
To make it work as we want, we can use `IS NULL` operator or `<=>` NULL-safe equal to operator.
SELECT *
FROM
users
WHERE
email <> 'John@example.com'
OR email IS NULL;
SELECT *
FROM
users
WHERE
NOT email <=> 'John@example.com';
+----+------+-------+
| id | name | email |
+----+------+-------+
| 2 | Jane | NULL |
| 3 | Ben | NULL |
+----+------+-------+
NULL and Arithmetic Operations
Math with NULLs is not interesting. No matter what you do, the result will be NULL.
Addition and subtraction.
SELECT
NULL + 10,
10 + NULL,
NULL - 10,
10 - NULL;
+-----------+-----------+-----------+-----------+
| NULL + 10 | 10 + NULL | NULL - 10 | 10 - NULL |
+-----------+-----------+-----------+-----------+
| NULL | NULL | NULL | NULL |
+-----------+-----------+-----------+-----------+
Division and multiplication.
SELECT
10 * NULL,
NULL * 10,
NULL / 10,
10 / NULL,
0 / NULL,
NULL / 0;
+-----------+-----------+-----------+-----------+----------+----------+
| 10 * NULL | NULL * 10 | NULL / 10 | 10 / NULL | 0 / NULL | NULL / 0 |
+-----------+-----------+-----------+-----------+----------+----------+
| NULL | NULL | NULL | NULL | NULL | NULL |
+-----------+-----------+-----------+-----------+----------+----------+
Module and power.
SELECT
NULL % 10,
10 % NULL,
POWER(10, NULL),
POWER(NULL, 10);
+-----------+-----------+-----------------+-----------------+
| NULL % 10 | 10 % NULL | POWER(10, NULL) | POWER(NULL, 10) |
+-----------+-----------+-----------------+-----------------+
| NULL | NULL | NULL | NULL |
+-----------+-----------+-----------------+-----------------+
Group By
Let's take a look at the following example.
SELECT email FROM users GROUP BY email;
SELECT DISTINCT email FROM users;
Both queries return the same result.
+------------------+
| email |
+------------------+
| NULL |
| John@example.com |
+------------------+
And even though NULL is equal no nothing, here it is equal to itself. I'd say this is an exception from the rules and allows us to group by NULL.
NULL and Boolean Operators
With boolean operators, NULL behaves like an absence of value. So in every case with uncertainty, the result will be null.
OR
SELECT
NULL OR TRUE,
TRUE OR NULL,
NULL OR FALSE,
FALSE OR NULL;
+--------------+--------------+---------------+---------------+
| NULL OR TRUE | TRUE OR NULL | NULL OR FALSE | FALSE OR NULL |
+--------------+--------------+---------------+---------------+
| 1 | 1 | NULL | NULL |
+--------------+--------------+---------------+---------------+
AND
SELECT
NULL AND TRUE,
TRUE AND NULL,
NULL AND FALSE,
FALSE AND NULL;
+---------------+---------------+----------------+----------------+
| NULL AND TRUE | TRUE AND NULL | NULL AND FALSE | FALSE AND NULL |
+---------------+---------------+----------------+----------------+
| NULL | NULL | 0 | 0 |
+---------------+---------------+----------------+----------------+
NOT
SELECT
NOT NULL;
+----------+
| NOT NULL |
+----------+
| NULL |
+----------+
IS UNKNOWN
SELECT NULL IS UNKNOWN, NULL IS NOT UNKNOWN;
+-----------------+---------------------+
| NULL IS UNKNOWN | NULL IS NOT UNKNOWN |
+-----------------+---------------------+
| 1 | 0 |
+-----------------+---------------------+
IS TRUE, IS FALSE
SELECT
NULL IS TRUE,
NULL IS NOT TRUE,
NULL IS FALSE,
NULL IS NOT FALSE;
+--------------+------------------+---------------+-------------------+
| NULL IS TRUE | NULL IS NOT TRUE | NULL IS FALSE | NULL IS NOT FALSE |
+--------------+------------------+---------------+-------------------+
| 0 | 1 | 0 | 1 |
+--------------+------------------+---------------+-------------------+
NULL and String Functions
Let's take a look at some string functions.
Concatenation and group concatenation will behave differently with null values.
SELECT CONCAT('hello', NULL, 'world');
+--------------------------------+
| concat('hello', null, 'world') |
+--------------------------------+
| NULL |
+--------------------------------+
SELECT
GROUP_CONCAT(email)
FROM
users;
+---------------------+
| GROUP_CONCAT(email) |
+---------------------+
| John@example.com |
+---------------------+
`CONCAT_WS` will simply ignore NULL values.
SELECT CONCAT_WS(',', 'Hello', NULL, ' world');
+-----------------------------------------+
| CONCAT_WS(',', 'Hello', NULL, ' world') |
+-----------------------------------------+
| Hello, world |
+-----------------------------------------+
SELECT CONCAT_WS(NULL, 'Hello', NULL, ' world');
+------------------------------------------+
| CONCAT_WS(NULL, 'Hello', NULL, ' world') |
+------------------------------------------+
| NULL |
+------------------------------------------+
`ELT` will return the position of the null element.
SELECT ELT(3, null, 'Bb', null, 'Dd');
+--------------------------------+
| ELT(3, null, 'Bb', null, 'Dd') |
+--------------------------------+
| NULL |
+--------------------------------+
`FIELD` will not find the null element.
SELECT FIELD(null, 'Aa', 'Bb', 'Cc', 'Dd', 'Ff', null);
+-------------------------------------------------+
| FIELD(null, 'Aa', 'Bb', 'Cc', 'Dd', 'Ff', null) |
+-------------------------------------------------+
| 0 |
+-------------------------------------------------+
`INSERT` will not work with null.
SELECT INSERT('Quadratic', 3, 4, null);
+---------------------------------+
| INSERT('Quadratic', 3, 4, null) |
+---------------------------------+
| NULL |
+---------------------------------+
`REPEAT` will not work with null.
SELECT REPEAT('MySQL', NULL), REPEAT(NULL, 3);
+-----------------------+----------------------------------+
| REPEAT('MySQL', NULL) | REPEAT(NULL, 3) |
+-----------------------+----------------------------------+
| NULL | NULL |
+-----------------------+----------------------------------+
NULL and Aggregate Functions
`COUNT` will ignore null values.
SELECT
COUNT(email),
COUNT(*)
FROM
users;
+--------------+----------+
| count(email) | count(*) |
+--------------+----------+
| 1 | 3 |
+--------------+----------+
`COUNT(DISTINCT)` will behave in the same way even though `SELECT DISTINCT` will return null values.
SELECT
COUNT(DISTINCT email)
FROM
users;
+-----------------------+
| COUNT(DISTINCT email) |
+-----------------------+
| 1 |
+-----------------------+
Same story with `AVG`. It will ignore null values.
SELECT
AVG(amount_of_money)
FROM
users;
+----------------------+
| AVG(amount_of_money) |
+----------------------+
| 2.5000 |
+----------------------+
But depending on your needs, you can use `COALESCE` to replace null values with something else.
SELECT
AVG(COALESCE(amount_of_money, 0))
FROM
users;
+-----------------------------------+
| AVG(COALESCE(amount_of_money, 0)) |
+-----------------------------------+
| 1.6667 |
+-----------------------------------+
Conclusion
NULLs in the relational topic in relational databases. If you think you know a lot about it, be careful. That rabbit hole is deep. I hope this article will help you to understand NULLs better. Thanks for reading.
Opinions expressed by DZone contributors are their own.
Comments