Data-Based Decision-Making: Predicting the Future Using In-Database Machine Learning
Learn how a new technique in ML, open-source in-database machine learning, is being used across industries to speed and scale predictions and drive innovation.
In just a few short years, machine learning (ML) has become an essential technology that companies deploy in almost every aspect of their business. Previously the preserve of giant institutions with deep pockets, ML is rapidly opening up: every kind of business can now leverage it to minimize repetitive manual processes, automate decision-making, and predict future trends. At almost every stage of any business task, ML is making processes smarter, more streamlined, and speedier.
In recent years, technological advances have helped to democratize access and drive the adoption of ML by reducing the time, skill, and number of steps required to obtain ML-driven predictions. Growth has been so rapid that the global ML market is expected to expand from $21 billion in 2022 to $209 billion by 2029. Tools such as declarative ML and AutoML are helping enterprises access powerful, business-critical predictive analytics. Taking these approaches one step further, open-source in-database ML is a newer technique that is gaining ground. It allows businesses to easily put questions to their data and rapidly get answers back using standard SQL queries.
What Is In-Database ML?
Building ML models has traditionally been a highly skilled, lengthy, resource-intensive endeavor. Typical time frames for ML initiatives are measured in months, and it’s not unusual for projects to take longer than six months, with considerable time devoted to extracting, cleaning, and preparing data from the database.
By contrast, open-source in-database ML brings analytics into the database itself, enabling businesses to achieve the kind of insights they would expect from traditional, fully customized ML models, but with some important differences. In-database ML achieves those results much faster (days or weeks rather than months) because the data never needs to leave the database. And because in-database modeling relies on regular, existing database skills like SQL, it is far more accessible to the wider IT team.
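To make that concrete, here is a minimal sketch of what the workflow can look like. The connection details, table and column names, and the CREATE MODEL / prediction syntax are illustrative assumptions rather than the exact dialect of any particular product, but the shape is representative: the model is trained and queried with SQL, and the data never leaves the database.

```python
# A minimal sketch of an in-database ML workflow.
# The host, schema, and SQL extension syntax below are hypothetical;
# the point is that training and prediction happen in place, via SQL.
import psycopg2  # any DB-API-compatible driver would do

conn = psycopg2.connect(host="analytics-db", dbname="shop", user="analyst")
cur = conn.cursor()

# 1. Train a model directly on rows already sitting in the database.
#    (Illustrative syntax; in-database ML engines expose something similar.)
cur.execute("""
    CREATE MODEL churn_predictor
    FROM customers      -- existing table, no extraction or export step
    PREDICT churned     -- the column the model should learn to predict
""")

# 2. Ask a question of the data with an ordinary SELECT against the model.
cur.execute("""
    SELECT customer_id, churned AS predicted_churn
    FROM churn_predictor
    WHERE signup_channel = 'organic'
""")
for customer_id, predicted_churn in cur.fetchall():
    print(customer_id, predicted_churn)

conn.close()
```

The design point is the absence of a separate pipeline: there is no export, no feature store, and no external scoring service, only a model that behaves like another queryable object in the database.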
Although in-database ML is a relatively new field, it is now the fastest-growing segment of ML as measured by GitHub stars. In fact, there are now in-database ML integrations for all the major database vendors, ML frameworks, BI tools, and notebooks.
How Are Businesses Using In-Database ML?
With use cases in every domain of business from HR to marketing to sales to production, predictions derived in-database are helping companies hone their customer experience, improve product personalization, optimize customer lifetime value, increase employee retention, evaluate risk more accurately, and raise workplace productivity.
Take one example from the productivity software space: Rize, a smart time tracker that makes users more productive and efficient at work, used in-database ML to develop a powerful feature in response to user feedback in a matter of weeks. The resulting capabilities — driven by ML-generated insights — increased customer retention and conversion rates. It has also helped differentiate Rize in a highly competitive market, cementing its position as a truly intelligent time tracker.
Speed and Scale of In-Database ML Reshaping Industries
While many of these use cases benefit businesses regardless of sector or location, specific industry applications are emerging that deliver future insights in real time, cost a fraction of traditional ML pipelines to set up, and are starting to disrupt existing value chains within these markets.
The financial sector — an industry that was quick to operationalize traditional ML modeling — is now turning to in-database modeling for improved agility. Financial services and fintech companies are using in-database ML to detect fraud, aid loan recovery, improve credit scoring, and approve loans. As a result, they’re able to react faster to market conditions, adapt the services they offer, and even open new revenue streams.
For example, Domuso, a next-generation multi-family rent payment processing platform, saved $500,000 annually using in-database ML. Domuso trained and deployed an in-database ML model to accurately predict if rental payments are likely to be returned due to insufficient funds. “With in-database ML we implemented advanced models faster and with less complexity,” said Sameer Nayyar, the then-EVP of product and operations at Domuso. “It positively impacted our business. We saw a reduction of chargebacks by $95,000 over two months and a saving of $500,000 over the first year.” Furthermore, as new use cases arise, Domuso is now able to create and implement new ML models in a matter of weeks, not months.
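As an illustration of this kind of use case, the sketch below shows how pending payments might be scored in place. The schema, model name, and JOIN-based prediction syntax are hypothetical, not Domuso’s actual implementation; the point is that batch scoring becomes an ordinary SQL query rather than an export-and-score pipeline.

```python
# A sketch of batch scoring in place, as in the payment-return use case above.
# Table, model, and JOIN-with-model syntax are hypothetical assumptions.
import psycopg2

conn = psycopg2.connect(host="payments-db", dbname="rent", user="analyst")
cur = conn.cursor()

# Score every pending payment by joining it against the trained model,
# then keep only the high-confidence "likely to be returned" rows.
cur.execute("""
    SELECT p.payment_id, m.will_be_returned, m.confidence
    FROM pending_payments AS p
    JOIN payment_return_predictor AS m   -- hypothetical in-database model
    WHERE m.will_be_returned = true AND m.confidence > 0.8
""")
for payment_id, will_be_returned, confidence in cur.fetchall():
    print(f"flag payment {payment_id} (confidence {confidence:.2f})")

conn.close()
```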
Sectors such as retail, FMCG, and food production have been quick to capitalize on the real-time predictions of in-database modeling, which help them respond to market conditions as they happen with “just-in-time” and location-specific offers. Managing stock, predicting demand for specific items, optimizing staffing levels, and forecasting future pricing are just a few examples of how retail and other businesses are turning to in-database ML to address their day-to-day challenges.
Take the example of Journey Foods, a supply chain and food science software platform for food development and innovation, which used in-database ML to address the challenge of constantly shifting ingredient prices. The company wanted to predict food costs for its customers one, three, six, and 12 months ahead, drawing on its database of 130,000 food ingredients across 22,000 suppliers. With ingredients and suppliers always changing, it was concerned that the predictive analytics required to map these complex, “many-to-many” relationships would be time-intensive to set up and would need continual maintenance and retraining. Journey Foods turned to in-database ML to develop its cost prediction model, which now delivers high-accuracy predictions for food ingredients at a significantly lower operating cost than the homegrown ML model it originally considered.
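The sketch below illustrates the general shape of such a multi-horizon forecast query. The schema, model name, and horizon column are hypothetical, not Journey Foods’ actual setup; what matters is that each ingredient-and-supplier combination can be forecast in place, at several horizons, with plain SQL.

```python
# A sketch of a multi-horizon cost forecast queried in place.
# The database, model name, and horizon column are hypothetical assumptions.
import psycopg2

conn = psycopg2.connect(host="ingredients-db", dbname="catalog", user="analyst")
cur = conn.cursor()

# Ask the in-database forecasting model for predicted costs at each horizon,
# keyed by the many-to-many (ingredient, supplier) relationship.
for months_ahead in (1, 3, 6, 12):
    cur.execute(
        """
        SELECT ingredient_id, supplier_id, predicted_cost
        FROM ingredient_cost_forecaster   -- hypothetical in-database model
        WHERE horizon_months = %s
        """,
        (months_ahead,),
    )
    print(f"--- {months_ahead}-month forecast ---")
    for ingredient_id, supplier_id, predicted_cost in cur.fetchall():
        print(ingredient_id, supplier_id, predicted_cost)

conn.close()
```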
Increasing Business Agility and Innovation
There are many more industry-specific examples, but the common factors driving this rapidly growing open-source movement are speed and scale. In-database ML makes sophisticated predictive analytics available to any business with a database.
For example, at a recent open-source, in-database ML hackathon, Hacktoberfest, the growing community of in-database ML programmers aptly demonstrated the potential for innovation. Over the course of the event, teams submitted more than 20 new database handlers (including connections to Apache Impala and Solr, PlanetScale, and Teradata) and more than 10 new machine learning handlers (including PyCaret, Ray Server, and Salesforce).
It’s still early days for in-database analytics. Just like the wider AI industry, the ML segment is no stranger to hype. However, with quick answers to complex problems no longer just theoretically possible but achievable in the near term by businesses of all sizes and budgets, in-database ML deserves serious consideration. Cutting the time it takes to build models, and enabling those without a data science background to run projects, drastically reduces the costs associated with predictive analytics. Data-based decision-making offers businesses a viable alternative to traditional ML techniques: fully customizable predictive capabilities at speed and scale.