SQL Looks Like English Is a Well-Intentioned Error
SQL is a language with very strict syntax, and any non-compliant part of a statement will be rejected by the database. Learn more about its advantages here.
Join the DZone community and get the full member experience.
Join For FreeWe know that SQL is very similar to English, and simple SQL statements can be read directly as English. Except for SQL, other major programming languages do not have this feature, and even if there are English words in the syntax, they are only used as mnemonics for certain concepts or operations, and what is written is a formal program statement rather than an English sentence. But SQL is different. It will write the entire sentence in a form that conforms to English habits, and also add many unnecessary prepositions, such as FROM being the operation subject of the statement but having to be written in the behind, and after GROUP, an extra BY needs to be written.
Why is this happening? The easy reason to think of is to hope that non-programmers can also use it. Users can write SQL to query data as long as they can read and write English. This is clearly a good intention, but the result is not satisfactory. The vast majority of business personnel only know how to write very simple queries using SQL, and for such queries, there is now powerful BI software that can provide a more convenient and intuitive visual interface to assist, without the need for handwritten statements. This design intention loses its meaning. On the contrary, the vast majority of users that frequently use SQL for calculations are still programmers, and SQL is still a programming language, like or unlike English, it does not have much difference in understanding for programmers, on the contrary, it can bring considerable difficulties.
In fact, SQL is a language with very strict syntax, and any non-compliant part of a statement will be rejected by the database. Users must study and follow its syntax rules carefully, which is no different from other programming languages. The true advantage of natural language lies in its ambiguity, which allows it to accept less strict syntax to a certain extent. However, SQL does not support this feature, and it was not possible to achieve this feature in the era when SQL was invented.
The benefits of looking like English cannot be reflected, but there are many drawbacks. Designing grammar like natural language may seem easy to master, but in fact, the opposite is true.
The main downside of being like natural language is non-procedural. Program logic is generally executed step by step, using variables to record intermediate results for use in subsequent steps. But natural language is not like this, the citation relationship between two sentences is dependent on a small number of fixed pronouns, which is imprecise and inconvenient. Therefore, it will have to spell actions targeting the same subject into one sentence as much as possible, so there is no need to use pronouns. The corresponding representation in SQL is to have multiple actions in one statement, such as SELECT, WHERE, and GROUP, which were originally unrelated actions. In other programming languages, they are usually designed as multiple functions, but in SQL, they are all designed as clauses of one statement. Moreover, words like “WHERE” and “HAVING” have the same meaning, only targeting different objects, when spelled into one sentence, two words must be used to indicate the difference, which is confusing (many beginners may be confused about HAVING).
Complex situations that cannot be described in a single sentence can be described using clauses in natural language. This is manifested as subqueries in SQL, and there may also be multiple layers of nested subqueries, which is not common in other programming languages. Moreover, subqueries should also be like natural language, with a SELECT…FROM every time, it will make people feel very verbose and the code will become long.
Step-by-step is an effective way to reduce the difficulty of understanding and executing: something that is originally quite simple to do in a few steps, but if you don’t take them step by step, it will be very complicated to implement. It can be imagined that if the teacher asked elementary school pupils to solve practical problems only in one equation, the children would be very distressed (of course, some smart children can handle it).
For example, if we want to find customers whose sales exceed twice the average, the natural way of thinking is to first calculate the average sales, and then find customers whose sales exceed twice this value, implementing with two statements. The writing of SQL requires using subqueries to write one longer sentence. This example is relatively easy to understand, with only two layers. The difficulty of using natural language clauses to describe the relationship within two layers is generally acceptable, but in reality, complex queries involving three or five layers are common, which seriously increases the difficulty of understanding.
Not advocating for step-by-step can lead to a long single SQL statement. The complex SQL statements faced by programmers are rarely counted in lines, often in Ks. However, for the same 100 lines of code, whether it is divided into 100 statements or only 1 statement, its complexity is not at the same level at all. This type of code is very difficult to understand, and once it was finally written, after two months, the programmer himself couldn’t understand it. Moreover, the single long statement without steps is difficult to debug, and the development cycle is also longer.
About procedurality: there is a saying in the industry that SQL is a declarative language, and users only need to care about what they want without caring about how to do it; the database will automatically find a solution, and this language does not need to support procedurality. We have already criticized this statement earlier.
Database vendors may have also seen the lack of procedurality of SQL, so they later added CTE syntax to compensate, which is equivalent to providing intermediate variables that can be named. Stored procedures are also equivalent to being able to execute SQL in steps, with branching loops and even subroutines. The result is still to return to the old path of procedural language, so this is not as good as designing it like this from the beginning.
For programming languages, the ease of use brought by a good step-by-step computing mechanism far exceeds what looks like natural language.
Published at DZone with permission of Judy Liu. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments