Getting Started with Scala Parser Combinators
How to get started using the parser combinator library in Scala, which can be used to make your own programming language.
Join the DZone community and get the full member experience.
Join For FreeScala provides a very easy way to design your own programming language, using its parser library. This makes creating your own domain specific language (i.e. DSL) or interpreted language easier than you could ever imagine. As a primer, let's write a parser that parses simple mathematical expressions, such as "1+9*8" and "4*6/2-5".
For those of you who are familiar with language design, the EBNF grammar for this language would look something like this:
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
number ::= digit | digit number
operator ::= "+" | "-" | "*" | "/"
expr ::= number (operator expr)?
To start writing a parser with the Scala parsing library, we write a class that extends the Parsers trait. Here's an example of a class that extends RegexParsers, which is a subtrait of Parsers.
class ExprParser extends RegexParsers {
val number = "[1-9][0-9]+".r
def expr: Parser[Any] = number ~ opt(operator ~ expr )
def operator: Parser[Any] = "+" | "-" | "*" | "/"
}
The only differences between the Scala definition of the valid tokens and the definition within the EBNF grammar are the following:
- Scala utilizes a "~" between each token
- Instead of using a "?" like you would in an EBNF grammar, Scala uses the keyword "opt"
To execute our parser, we simply invoke the inherited parse method that's part of the Parsers trait.
def main(args : Array[String]) {
val parser = new ExprParser
val result = parser.parseAll(parser.expr, "9*8+21/7")
println(result.get)
}
The result of this println will be:
(9~Some((*~(8~Some((+~(21~Some((/~(7~None))))))))))
We're done! Well, not really. The output right now is the way that Scala sees the result of our parser operations. To make our language more meaningful, let's add some Scala code to compute the arithmetic operation and print the result to the output.
Let's begin our quest to calculate the result by examining what "(9~Some((*~(8~Some((+~(21~Some((/~(7~None))))))))))" really means in the world of Scala. Let's look at a subset of this string, "(9~Some((*~(8~None))))". This is the result of parsing "9*8". The first part that looks interesting is the "9~Some(...)". In our parser, we defined the following rule:
def expr: Parser[Any] = number ~ opt(operator ~ expr)
It's clear that "number" is evaluating to "9" and "~" is being printed out verbatim, which you should recall is used in Scala parsers to join parts of the grammar. However, what's going on with "Some(...)"? Well, whenever Scala parses an opt(x) statement, it will evaluate it as either Some(...) or None, both of which are subclasses of Option. That makes sense... the opt(x) statement evaluates to an Option.
Instead of having our parser return a bunch of ~ and options, let's look at transforming the parser results into something more useful. Again, looking at our current parser rule:
def expr: Parser[Any] = number ~ opt(operator ~ expr)
We need to modify this parser definition to have it return an Int instead of Any. We also need to compute the result of the arithmetic operation. Our grammar rule allows for either a single number or a number followed by an arithmetic operator and another number. If we're dealing with a single number, we need to tell the parser to convert the result to an Int. To do this, we make the following modification to our parser rule:
def expr: Parser[Int] = (number ^^ { _.toInt }) { }
The ^^ just tells the parser to execute the code that follows it, contained in {...}. All we're doing is converting it to an Int.
Next, we need to tell the parser what to do when it encounters a number, or when it encounters a number followed by an operator and another number. For this, we need to define the integer operation for each situation (single integer value, addition of two values, subtraction of two values, division of two values, and multiplication of two values).
def expr: Parser[Int] = (number ^^ { _.toInt }) ~ opt(operator ~ expr ) ^^ {
case a ~ None => a
case a ~ Some("*" ~ b) => a * b
case a ~ Some("/" ~ b) => a / b
case a ~ Some("+" ~ b) => a + b
case a ~ Some("-" ~ b) => a - b
}
There are five cases we're handling. The first is the situation where we have just a single integer (a ~ None). When we have an Int with None after it, we simply evaluate the integer value as-is. The second situation is when we have an integer being multiplied by another integer (a ~ Some("*" ~ b)). In this case, we simply perform a * b. We then proceed to define the rules for division, addition, and subtraction.
The key take-aways from this tutorial are:
- You define the type that your parser rule is returning within the brackets of the Parser[ ] definition. In this example, it's an Int.
- You can add custom Scala code to operate on the parser results with ^^ { ... }
Now that we've laid the groundwork for Scala parser combinators, we can build on these features to create a full-featured interpreted language that contains if-else conditions, loops, and even function calls.
Here's an article on how to create a full-featured interpreted language with this approach: https://dzone.com/articles/create-a-programming-language-with-scala-parser-co
Opinions expressed by DZone contributors are their own.
Comments