PostgreSQL: Bulk Loading Data With Node.js and Sequelize
Application development often requires seeding data in a database for testing and development. The following article will outline how to handle this using Node.js and Sequelize.
Join the DZone community and get the full member experience.
Join For FreeWhether you're building an application from scratch with zero users, or adding features to an existing application, working with data during development is a necessity. This can take different forms, from mock data APIs reading data files in development, to seeded database deployments closely mirroring an expected production environment.
I prefer the latter as I find fewer deviations from my production toolset leads to fewer bugs.
A Humble Beginning
For the sake of this discussion, let's assume we're building an online learning platform offering various coding courses. In its simplest form, our Node.js API layer might look like this.
// server.js
const express = require("express");
const App = express();
const courses = [
{title: "CSS Fundamentals", "thumbnail": "https://fake-url.com/css"}],
{title: "JavaScript Basics", "thumbnail": "https://fake-url.com/js-basics"}],
{title: "Intermediate JavaScript", "thumbnail": "https://fake-url.com/intermediate-js"}
];
App.get("/courses", (req, res) => {
res.json({data: courses});
});
App.listen(3000);
If all you need is a few items to start building your UI, this is enough to get going. Making a call to our /courses
endpoint will return all of the courses defined in this file. However, what if we want to begin testing with a dataset more representative of a full-fledged database-backed application?
Working With JSON
Suppose we inherited a script exporting a JSON-array containing thousands of courses. We could import the data, like so.
// courses.js
module.exports = [
{title: "CSS Fundamentals", "thumbnail": "https://fake-url.com/css"}],
{title: "JavaScript Basics", "thumbnail": "https://fake-url.com/js-basics"}],
{title: "Intermediate JavaScript", "thumbnail": "https://fake-url.com/intermediate-js"},
...
];
// server.js
...
const courses = require("/path/to/courses.js");
...
This eliminates the need to define our mock data within our server file, and now we have plenty of data to work with. We could enhance our endpoint by adding parameters to paginate the results and set limits on how many records are returned. But, what about allowing users to post their own courses? How about editing courses?
This solution gets out of hand quickly as you begin to add functionality. We'll have to write additional code to simulate the features of a relational database. After all, databases were created to store data. So, let's do that.
Bulk Loading JSON With Sequelize
For an application of this nature, PostgreSQL is an appropriate database selection. We have the option of running PostgreSQL locally or connecting to a PostgreSQL-compatible cloud-native database, like YugabyteDB Managed. Apart from being a highly-performant distributed SQL database, developers using YugabyteDB benefit from a cluster that can be shared by multiple users. As the application grows, our data layer can scale out to multiple nodes and regions.
After creating a YugabyteDB Managed account and spinning up a free database cluster, we're ready to seed our database and refactor our code, using Sequelize. The Sequelize ORM allows us to model our data to create database tables and execute commands. Here's how that works.
First, we install Sequelize from our terminal.
// terminal
> npm i sequelize
Next, we use Sequelize to establish a connection to our database, create a table, and seed our table with data.
// database.js
// JSON-array of courses
const courses = require("/path/to/courses.js");
// Certificate file downloaded from YugabyteDB Managed
const cert = fs.readFileSync(CERTIFICATE_PATH).toString();
// Create a Sequelize instance with our database connection details
const Sequelize = require("sequelize");
const sequelize = new Sequelize("yugabyte", "admin", DB_PASSWORD, {
host: DB_HOST,
port: "5433",
dialect: "postgres",
dialectOptions: {
ssl: {
require: true,
rejectUnauthorized: true,
ca: cert,
},
},
pool: {
max: 5,
min: 1,
acquire: 30000,
idle: 10000,
}
});
// Defining our Course model
export const Course = sequelize.define(
"course",
{
id: {
type: DataTypes.INTEGER,
autoIncrement: true,
primaryKey: true,
},
title: {
type: DataTypes.STRING,
},
thumbnail: {
type: DataTypes.STRING,
},
}
);
async function seedDatabase() {
try {
// Verify that database connection is valid
await sequelize.authenticate();
// Create database tables based on the models we've defined
// Drops existing tables if there are any
await sequelize.sync({ force: true });
// Creates course records in bulk from our JSON-array
await Course.bulkCreate(courses);
console.log("Courses created successfully!");
} catch(e) {
console.log(`Error in seeding database with courses: ${e}`);
}
}
// Running our seeding function
seedDatabase();
By leveraging Sequelize’s bulkCreate method, we’re able to insert multiple records in one statement. This is more performant than inserting requests one at a time, like this.
. . .
// JSON-array of courses
const courses = require("/path/to/courses.js");
async function insertCourses(){
for(let i = 0; i < courses.length; i++) {
await Course.create(courses[i]);
}
}
insertCourses();
Individual inserts come with the overhead of connecting, sending requests, parsing requests, indexing, closing connections, etc. on a one-off basis. Of course, some of these concerns are mitigated by connection pooling, but generally speaking the performance benefits of inserting in bulk are immense, not to mention far more convenient. The bulkCreate method even comes with a benchmarking option to pass query execution times to your logging functions, should performance be of primary concern.
Now that our database is seeded with records, our API layer can use this Sequelize model to query the database and return courses.
// server.js
const express = require("express");
const App = express();
// Course model exported from database.js
const { Course } = require("/path/to/database.js")
App.get("/courses", async (req, res) => {
try {
const courses = await Course.findAll();
res.json({data: courses});
} catch(e) {
console.log(`Error in courses endpoint: ${e}`);
}
});
App.listen(3000);
Well, that was easy! We've moved from a static data structure to a fully-functioned database in no time.
What if we're provided the dataset in another data format, say, a CSV file exported from Microsoft Excel? How can we use it to seed our database?
Working With CSVs
There are many NPM packages to convert CSV files to JSON, but none are quite as easy to use as csvtojson. Start by installing the package.
// terminal
> npm i csvtojson
Next, we use this package to convert our CSV file to a JSON-array, which can be used by Sequelize.
// courses.csv
title,thumbnail
CSS Fundamentals,https://fake-url.com/css
JavaScript Basics,https://fake-url.com/js-basics
Intermediate JavaScript,https://fake-url.com/intermediate-js
// database.js
...
const csv = require('csvtojson');
const csvFilePath = "/path/to/courses.csv";
// JSON-array of courses from CSV
const courses = await csv().fromFile(csvFilePath);
...
await Course.bulkCreate(courses);
...
Just as with our well-formatted courses.js
file, we're able to easily convert our courses.csv
file to bulk insert records via Sequelize.
Conclusion
Developing applications with hardcoded data can only take us so far. I find that investing in tooling early in the development process sets me on the path toward bug-free coding (or so I hope!)
By bulk-loading records, we’re able to work with a representative dataset, in a representative application environment. As I’m sure many agree, that’s often a major bottleneck in the application development process.
Published at DZone with permission of Brett Hoyer. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments