Sample Data Generation With Built-In Database Capabilities
Learn how to use special functions, hierarchical queries, and common table expressions to generate sample data in PostgreSQL, Oracle, MySQL, and SQL Server.
Join the DZone community and get the full member experience.
Join For FreeThere are many ways to generate dummy data for a database. You can create a data generator, use a mock data service, or get a subset of your production data. Additionally, many databases offer built-in capabilities for producing synthetic data.
In this article, you’ll learn how to use special database functions, hierarchical queries, and recursive Common Table Expressions (CTEs) to generate a sample dataset in PostgreSQL, MySQL, Oracle, and SQL Server.
Sample Table
We'll use the following sample table to demonstrate various built-in data generation techniques:
create table sample(
id int,
val text
);
The id
column stores the record's identifier, while the val
column holds some text value.
Once you learn how to generate mock data for the table with two columns, you'll be able to apply the techniques to tables with many more columns of various data types.
Generate Series Function in PostgreSQL
PostgreSQL offers a special generate_series function that can produce a set of numbers or date/time values for a specified range.
For instance, to generate a series of IDs from 1 to 5, you would use the following function call:
select id from generate_series(1, 5) as id;
+--+
|id|
+--+
|5 |
|4 |
|3 |
|2 |
|1 |
+--+
Afterward, you can insert the generated series into the sample
table with this simple statement:
insert into sample(id)
select id from generate_series(1, 5) as id;
This query adds five records to the table, each with a unique identifier. However, the val
column will still be null
:
select * from sample order by id;
+--+----+
|id|val |
+--+----+
|1 |null|
|2 |null|
|3 |null|
|4 |null|
|5 |null|
+--+----+
To populate the val
column with non-null values, you simply need to modify the data generation query as follows:
-- Delete previously generated records
delete from sample;
-- Generate 1000 records using the concat function to populate the `val` column
insert into sample(id,val)
select id, concat('val', id * 10) from generate_series(1, 1000) as id;
Finally, retrieve the first five records from the table to review the sample data:
select * from sample order by id limit 5;
+--+-----+
|id|val |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+
Quick and simple. For more advanced data generation scenarios using the generate_series
function, refer to this article.
Generate Series Again ... But in SQL Server
The latest versions of SQL Server have introduced support for the generate_series function. Consequently, whether you're using PostgreSQL or SQL Server, the experience of generating sample data is consistent.
To generate a series of numbers from 1 to 5, you would invoke the function as follows:
select value from generate_series(1,5);
+-----+
|value|
+-----+
|1 |
|2 |
|3 |
|4 |
|5 |
+-----+
Subsequently, execute the following command to create 1000 records in the sample
table:
insert into sample(id,val)
select value,concat('val', value * 10) from generate_series(1,1000);
To check the generated data, query the first five records:
select * from sample order by id
offset 0 rows
fetch next 5 rows only;
+--+-----+
|id|val |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+
As demonstrated, the generate_series
function is a powerful tool for data generation tasks. However, this function is not part of the SQL standard yet and may not be available in all database systems.
Hierarchical Queries in Oracle
Oracle is among the databases that do not support the generate_series
function. However, the database community has devised many alternative methods for generating dummy data.
One popular method involves using hierarchical queries. For instance, the following hierarchical query can generate a series of records from 1 to 5:
select level from dual connect by level <= 5;
+-----+
|LEVEL|
+-----+
|1 |
|2 |
|3 |
|4 |
|5 |
+-----+
Internally, the query constructs a data tree structure, with the LEVEL
pseudo-column indicating the depth of the tree, beginning with one for the root.
By using this hierarchical query with an insert statement, you can produce 1000 records for the sample
table:
-- Oracle doesn't support the `text` data type,
-- requiring you to create the table this way
create table sample (id int, val varchar(10));
-- Generate 1000 records
insert into sample(id,val)
select level, concat('val', level * 10) from dual
connect by level <= 1000;
The generated data will resemble what you've seen with PostgreSQL and SQL Server:
select * from sample order by id
offset 0 rows fetch next 5 rows only;
+--+-----+
|ID|VAL |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+
Recursive Common Table Expressions in MySQL
Like Oracle, MySQL does not support the generate_series
function, which necessitates finding alternative methods for generating sample data.
One such method is using recursive common table expressions (CTEs). To illustrate, the following recursive query generates a series of numbers from 1 to 5:
with recursive seq as (
select 1 as id union all select id +1 from seq where id < 5
)
select id from seq;
+--+
|id|
+--+
|1 |
|2 |
|3 |
|4 |
|5 |
+--+
Next, you can use this recursion together with the following insert statement to produce 1000 records:
insert into sample(id,val)
with recursive seq as (
select 1 as id union all select id +1 from seq where id < 1000
)
select id, concat('val', id * 10) from seq;
Finally, to ensure the mock data has been properly generated, take a quick look at the first five records:
select * from sample order by id limit 5;
+--+-----+
|id|val |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+
Keep Mastering Databases
As you can see, relational databases are much more than just storage for your application data. They provide a broad set of capabilities, allowing you to perform various tasks right on the database end. One such task is sample data generation, which can sometimes be satisfied with built-in database features.
Stay tuned for more database-related content, and keep mastering your database skills.
In the meantime, you can watch this video to see how the discussed data generation techniques work in practice for much larger tables.
Opinions expressed by DZone contributors are their own.
Comments