Why Should Databases Go Natural?
From search to CRM, applications are adopting natural language and intuitive interactions. Should databases follow? This article provides a strategic perspective.
Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured: Relational Database Systems (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with NoSQL databases.
Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, all modern databases offer a SQL-inspired query language and a set of APIs, and applications manage their data models and data through these DDL and DML statements.
The question now stands: will SQL continue to dominate, or will SQL++ take over? What does natural language processing mean for database query languages?
To answer these strategic questions, let’s visit the evolution of APIs in the database and other industries.
Communication Industry
For a technological revolution to occur, the underlying technology and the user interface (or API) must change. Every technology is limited by the difficulty of using or accessing it; making it accessible is an important dimension. This shift brings in new users who were previously non-consumers of the technology.
When the world transitioned from telegraphs to telephones, voice communication replaced Morse code, making long-distance communication accessible to a broader audience. Similarly, Oracle's and DB2's implementations of SQL freed programmers from the complexities of record pointer manipulation in hierarchical databases. Oracle was the first to market with SQL and has maintained its lead for over 45 years. In every technological revolution, the company that introduced a new, easy-to-use API supported by innovative technology has supplanted the incumbent and dominated the market. Examples include Western Union to AT&T, IBM to Oracle, and Nokia to Apple.
It's interesting that IBM scientists invented relational databases and SQL, yet Oracle and others benefited most from them. We need to remember the difference between "invention" and its cousin, "innovation," which brings ideas to market.
API | Technology | Product/Examples | Dominant Company |
Writing | Postal | Government Postal Services | Govt |
Morse code | Telegraph | Western Union | Western Union |
Voice | Telephone | AT&T | AT&T |
Text docs | Internet | | |
Touchscreen voice, text, photo, video | Smartphone | Apple, Android | Apple, Android |
Note 1: The telegraph did not replace postal services. It succeeded not because it improved the “interface” but because it improved the speed of communication by orders of magnitude. Before Twitter, sending effective telegrams with a few words was a learned skill.
Note 2: See the inspiring story of Morse code here.
Database Industry
The battle between NoSQL and RDBMS has centered on performance, availability, and scalability. NoSQL databases feature declarative query languages that extend SQL to varying degrees, with MongoDB's MQL mimicking many aspects of SQL. These innovations aim to address the limitations of traditional relational databases while leveraging NoSQL's strengths. Despite these advancements, SQL has proven to be unreasonably effective, and every new database, relational or not, attempts to incorporate or emulate its principles. This has given RDBMS a significant advantage. Even after 15 years of NoSQL developments, Oracle and other RDBMS still dominate, with SQL continuing to grow. While cloud databases and NoSQL solutions have drastically reduced transaction costs,
Problems and Opportunities
- Databases are still way too difficult to use. Each database operates with its own data model (both logical and physical) and data types, requiring users to meticulously design a suitable model and then convert and load data into the system. This intricate process has given rise to a substantial "data conversion" industry.
- SQL is unreasonably effective, but it is complex compared to English. While SQL dramatically simplifies querying compared to relational calculus expressions or the data pointer manipulation of hierarchical databases, it still presents challenges. Although SQL is easier than what came before, it hasn't become significantly simpler over the past 50 years. One can argue that to stay relevant to more use cases, SQL syntax and semantics have grown large and complicated.
- Administration is complex. It spans everything from sizing to physical schema design, indexes, data distribution, and tuning.
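To make the second point concrete, here is a minimal sketch contrasting record-at-a-time navigation with a declarative SQL query. It uses Python's standard `sqlite3` module; the `ORDERS` sample data and both function names are hypothetical, and the loop only loosely imitates the pointer handling of pre-relational systems.

```python
import sqlite3

# Hypothetical sample data: (order_id, customer, amount)
ORDERS = [
    (1, "acme", 120.0),
    (2, "globex", 75.5),
    (3, "acme", 30.0),
]


def total_navigational(records, customer):
    """Walk the records one at a time with an explicit position,
    loosely imitating record-pointer access in pre-relational systems."""
    total = 0.0
    position = 0  # the "record pointer" the programmer has to manage
    while position < len(records):
        _order_id, name, amount = records[position]
        if name == customer:
            total += amount
        position += 1
    return total


def total_declarative(records, customer):
    """State what is wanted in SQL and let the engine decide how to get it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


print(total_navigational(ORDERS, "acme"))
print(total_declarative(ORDERS, "acme"))
```

Both functions return the same total. The SQL version is shorter and leaves the access path to the engine, yet it still requires a schema, a load step, and knowledge of SQL syntax, which is the remaining barrier this section describes.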
A Technological Revolution
ChatGPT and Large Language Models (LLMs) represent a new paradigm in data management and data wrangling. They process all publicly available data, regardless of format, and use natural language Q&A as their primary interface. Users interact through both a browser-based shell (chat) and a natural language interface (API), ensuring accessibility and ease of use. For every question (query), the ChatGPT model provides an answer, accurate or not, captivating users with its broad capabilities. It processes and generates multi-modal data, showcasing its versatility and potential to redefine how we interact with and utilize data.
It's time to rethink the data model and the user API for databases to meet the evolving needs of humans, developers, and AI copilots. Each new generation of databases has introduced distinct data models and interfaces through new languages or APIs. Today, databases can handle any data, but only if it is first converted to JSON, which limits their reach. Imagine bringing the ease of use of ChatGPT into the enterprise and everyday use cases. This shift necessitates a new data model and query approach, which we call the Natural Data Model and Natural Queries.
By leveraging natural language interactions, we can simplify complex data tasks, making data management more intuitive and accessible for users at all levels. This innovative approach promises to streamline workflows, enhance productivity, and democratize data access across the enterprise, ultimately transforming how organizations harness and utilize their data assets.
Here’s the table with databases, data models, and APIs:
DB Type-> | Network | Hierarchical | RDBMS | NoSQL | What’s Next? |
Data Model | Network | Hierarchical | Relational | JSON, BSON, WideColumn, Multimodal | Natural Data: Structured, Semi-structured, Unstructured |
API | DBTG APIs | Custom APIs | SQL | MongoAPI, SQL++ | Natural Q&A and interactions for everything: appdev, analytics, admin |
Product | IDS, IDMS | IMS | Oracle, DB2 | MongoDB, Couchbase, Neo4J, Cassandra | OPEN |
Dominant Company | GE, Cullinet | IBM | Oracle | MongoDB | OPEN |
Note: NoSQL technologies have replaced relational databases in specific use cases and industries but have not summarily replaced SQL databases (RDBMS). The theory of disruptive innovation says you have to make the interface much easier to attract non-consumers of previous-generation technology to your technology.
Natural Interaction With Data and Systems
The Natural Data Model is a framework designed to accommodate and process any type of data users possess, regardless of format. Whether the data is structured, semi-structured, or unstructured (e.g., JSON, CSV, TSV, Avro, or simply text), the Natural Data Model accesses it, transforms it if necessary, and queries it. This model prioritizes flexibility and accessibility, allowing users to work with diverse datasets without extensive data preparation or conversion. By embracing a universal approach to data formats, the Natural Data Model lets users focus on deriving insights and making data-driven decisions rather than getting bogged down by the complexities of data management. This approach fosters a more intuitive and efficient interaction with data, empowering users to unlock the full potential of their information assets.
Natural Queries represent a new approach to interacting with data, allowing users to formulate queries in a natural language like English. By enabling completely natural interactions similar to ChatGPT's Q&A format, users can ask questions and receive answers without mastering complex query languages. This applies to both analysis and manipulation of data. The system also supports business-specific lingo, adapting to the unique terminologies and requirements of different industries and companies. Natural Queries can also be used in interactive applications, offering more predictable and structured interactions that closely align with traditional SQL but with the simplicity and ease of natural language. This approach makes data querying more intuitive and accessible.
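As a rough illustration of the idea of querying data in whatever format it arrives, the sketch below reads JSON, CSV, TSV, and plain text into one common record shape and runs the same filter over each. It uses only Python's standard library; `load_records`, `query`, and the sample data are hypothetical names invented for this example, not part of any product, and Avro is left out because it needs a third-party library.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Turn raw text in one of a few formats into a list of dict records."""
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if fmt == "text":
        # Unstructured text: one record per non-empty line.
        return [{"text": line} for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


def query(records, field, value):
    """Return records whose field equals value, comparing as strings
    so that the JSON number 19 and the CSV string "19" both match."""
    return [r for r in records if str(r.get(field)) == str(value)]


# Hypothetical sample data: the same two rows in three formats.
json_src = '[{"city": "Oslo", "temp": 4}, {"city": "Lima", "temp": 19}]'
csv_src = "city,temp\nOslo,4\nLima,19\n"
tsv_src = "city\ttemp\nOslo\t4\nLima\t19\n"

for fmt, src in [("json", json_src), ("csv", csv_src), ("tsv", tsv_src)]:
    print(fmt, query(load_records(src, fmt), "city", "Lima"))
```

The caller never converts or loads the data into a fixed schema first. A real system would also have to infer types, handle nested and binary formats, and scale beyond memory, none of which this sketch attempts.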
Yes, there has been a huge interest in adding layers on top of databases to convert natural language questions into SQL. That's a start, not the end state.
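The shape of such a layer can be sketched in a few lines. The example below is a deliberately tiny, pattern-based translator, not how production systems work (those typically rely on LLMs or semantic parsers). The `orders` schema, the supported question pattern, and the function names `to_sql` and `ask` are all assumptions made for illustration.

```python
import re
import sqlite3

# Hypothetical schema known to the translation layer.
TABLE = "orders"
COLUMNS = {"customer", "amount", "region"}
AGGREGATES = {"total": "SUM", "average": "AVG", "count": "COUNT"}

# Accepts questions such as "What is the total amount for customer acme?"
PATTERN = re.compile(
    r"^(?:what is the\s+)?(total|average|count)\s+(?:of\s+)?(\w+)"
    r"(?:\s+(?:for|where)\s+(\w+)\s+(?:is\s+)?(\w+))?\??$",
    re.IGNORECASE,
)


def to_sql(question):
    """Translate one narrow family of English questions into parameterized SQL."""
    match = PATTERN.match(question.strip())
    if not match:
        raise ValueError(f"cannot understand: {question!r}")
    agg_word, column, filter_col, filter_val = match.groups()
    column = column.lower()
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    sql = f"SELECT {AGGREGATES[agg_word.lower()]}({column}) FROM {TABLE}"
    params = ()
    if filter_col:
        filter_col = filter_col.lower()
        if filter_col not in COLUMNS:
            raise ValueError(f"unknown column: {filter_col}")
        sql += f" WHERE {filter_col} = ?"
        params = (filter_val,)  # the value is bound, never spliced into the SQL
    return sql, params


def ask(conn, question):
    """Run the translated query and return the single aggregate value."""
    sql, params = to_sql(question)
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("acme", 120.0, "west"), ("globex", 75.5, "east"), ("acme", 30.0, "east")],
)
print(ask(conn, "What is the total amount for customer acme?"))
```

Any question outside the single supported pattern is rejected, and the answer is always a bare number. That brittleness is a small-scale picture of why a translation layer on top of SQL is only a starting point.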
Natural answers (or results) should follow the principles championed by Edward Tufte: they should be presented in a form that directly addresses the question and facilitates clear understanding and analysis. These answers might take the form of structured data, text, charts, images, or any combination of these. It's similar to how you approach a school exam, where the question doesn't dictate the format of your response. You provide whatever is necessary to fully answer and explain the question, ensuring clarity and comprehension.
The technical initiatives around these three ideas will make databases easier to use, lower the barrier to adopting them, and expand the use of sophisticated databases.
Key Takeaways
- Many mantras are hyping up AI: "AI or Die" and "AI is going to be bigger than fire." You'll have to use strategic theories and apply them to your industries to see what's next.
- AI is going to transform all technologies, including databases and enterprise applications.