Custom Grammar to Query JSON With Antlr
Want to learn how to create a custom grammar to query JSON? Check out this tutorial to learn how to write queries with Antlr!
Antlr is a powerful tool that can be used to create formal languages. Vital to the formalization of a language are symbols and rules, also known as grammar. Defining custom grammar and generating the associated parsers and lexers is a straightforward process with Antlr. Antlr’s runtime enables tokenization of a given character stream and parsing of those tokens. It provides mechanisms to walk through the generated parse tree and apply custom logic. Let’s take this tool for a spin and create a custom grammar to query JSON. Our end goal is to be able to write queries like the one shown below:
bpi.current.code eq "USD" and bpi.current.rate gt 650.60
To create a new grammar, one has to define the rules of the grammar. Let’s do that by creating a file named “JsonQuery.g4.” We can then start composing the grammar rules that will allow us to query JSON. Here’s the snippet:
grammar JsonQuery;
query
    : SP? '(' query ')' #parenExp
    | query SP LOGICAL_OPERATOR SP query #logicalExp
    | attrPath SP 'pr' #presentExp
    | attrPath SP op=( 'eq' | 'ne' ) SP value #compareExp
    ;
LOGICAL_OPERATOR
    : 'and' | 'or'
    ;
EQ : 'eq' ;
NE : 'ne' ;
attrPath
    : ATTRNAME subAttr?
    ;
subAttr
    : '.' attrPath
    ;
ATTRNAME
    : ALPHA ATTR_NAME_CHAR* ;
fragment ATTR_NAME_CHAR
    : '-' | '_' | ':' | DIGIT | ALPHA
    ;
You can browse the complete set of rules here.
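The “attrPath” and “subAttr” rules above describe a dotted path such as “bpi.current.code”. To make that concrete, here is a minimal sketch of how such a path might be resolved and compared once it has been parsed. It is not the article’s implementation: the class AttrPathSketch and its methods are hypothetical names, nested Maps stand in for a real JSON object, and treating a missing attribute as “not equal” is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; nested Maps stand in for a JSON object.
class AttrPathSketch {

    // Follows a dotted path such as "bpi.current.code".
    // Returns null if any segment is missing or is not an object.
    static Object resolve(Map<String, Object> json, String attrPath) {
        Object current = json;
        for (String name : attrPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(name);
        }
        return current;
    }

    // Mirrors the 'pr' (present) alternative: true when the path resolves to a value.
    static boolean present(Map<String, Object> json, String attrPath) {
        return resolve(json, attrPath) != null;
    }

    // Mirrors the 'eq' / 'ne' alternatives of the compare expression.
    // Assumption: a missing attribute is never equal to the expected value.
    static boolean compare(Map<String, Object> json, String attrPath,
                           String op, Object expected) {
        Object actual = resolve(json, attrPath);
        boolean equal = actual != null && actual.equals(expected);
        switch (op) {
            case "eq": return equal;
            case "ne": return !equal;
            default: throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }

    // Builds {"bpi": {"current": {"code": "USD"}}} for the usage example.
    static Map<String, Object> sample() {
        Map<String, Object> current = new LinkedHashMap<>();
        current.put("code", "USD");
        Map<String, Object> bpi = new LinkedHashMap<>();
        bpi.put("current", current);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("bpi", bpi);
        return root;
    }
}
```

With the sample data, AttrPathSketch.compare(AttrPathSketch.sample(), "bpi.current.code", "eq", "USD") evaluates to true, which corresponds to the first half of the query shown at the start of the article.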
Antlr mandates that we follow certain conventions while creating grammars. For starters, the file should begin with a header that declares the grammar name, and that name must match the name of the file holding the grammar (here, “grammar JsonQuery;” in “JsonQuery.g4”). Antlr recognizes two types of rules: parser rules and lexer rules. Parser rules have to start with a lowercase letter, and lexer rules have to start with an uppercase letter. In the snippet above, “query” is a parser rule and “EQ” is a lexer rule. Rule alternatives, like the ones defined for the “query” parser rule, can be labeled by using the “#” operator (e.g., “#parenExp”). Labeling alternatives will trigger more precise events while we walk the parse tree. Antlr offers many more features than this snippet shows, including generated listeners and visitors, non-greedy sub-rules, and built-in handling of precedence and left recursion.
Antlr also provides IDE plugins that can be used to create and visualize a grammar. We can quickly test sample expressions against our grammar and preview the generated parse tree. Here’s a view of the generated parse tree based on the JSON query expression that we wrote earlier:
Now that we have a working grammar for querying JSON, let’s turn our attention to creating a Java program and implementing a query engine. The engine will walk the generated parse tree based on a given query expression, evaluate it against the specified JSON object, and return a boolean value to indicate if the query is a match or not. Let’s use Gradle to create our project. Here’s the relevant Gradle build file to enable the Antlr plugin and its dependencies:
plugins {
    id "antlr"
}
dependencies {
    antlr "org.antlr:antlr4:4.7"
}
generateGrammarSource {
    arguments += ["-visitor"]
}
Note that Antlr can be configured to generate a listener class or a visitor class, which are its two parse tree walking mechanisms. We will use the visitor mechanism to walk through the parse tree and evaluate the query expression. Antlr’s Gradle plugin will generate the source code that defines the lexer, parser, and visitor classes based on our grammar. We can simply extend the generated abstract classes and implement the relevant custom logic to evaluate a JSON query expression. Here’s a snippet from the JsonQueryEvaluator class:
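public class JsonQueryEvaluator
        extends JsonQueryBaseVisitor<Boolean> {
    @Override
    public Boolean visitParenExp(ParenExpContext ctx) {
        Boolean result = visit(ctx.query());
        // NOT is not part of the grammar excerpt above; this assumes the
        // complete grammar allows an optional NOT before the parenthesis.
        return ctx.NOT() != null ? !result : result;
    }
    @Override
    public Boolean visitLogicalExp(LogicalExpContext ctx) {
        Boolean leftExp = visit(ctx.query(0));
        if (OR.equals(ctx.LOGICAL_OPERATOR().getText())) {
            // Short-circuit "or": visit the right side only if the left is false
            return leftExp || visit(ctx.query(1));
        } else {
            // Short-circuit "and": visit the right side only if the left is true
            return leftExp && visit(ctx.query(1));
        }
    }
    ...
}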
Notice how the visitor method names were generated based on the labels that we specified in our grammar. This gives us the ability to evaluate the various alternatives of a parser rule against a given JSON object. Had we not used labels, we would have been forced to use numerous if-else or switch statements to implement the same functionality.
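To see why the visitor mechanism suits an evaluator that returns a boolean, it helps to contrast it with the listener style on a tiny hand-built tree. The sketch below does not use Antlr at all: Node, Leaf, And, Visitor, Listener, Walker, and WalkDemo are hypothetical stand-ins for the generated context, visitor, and listener types. It only illustrates the general difference: a visitor method returns a value and chooses which children to visit, while a listener receives void callbacks from a walker that reaches every node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-built tree; Antlr would generate the real context classes.
abstract class Node {
    abstract <T> T accept(Visitor<T> visitor);
}

class Leaf extends Node {
    final boolean value;
    Leaf(boolean value) { this.value = value; }
    <T> T accept(Visitor<T> visitor) { return visitor.visitLeaf(this); }
}

class And extends Node {
    final Node left;
    final Node right;
    And(Node left, Node right) { this.left = left; this.right = right; }
    <T> T accept(Visitor<T> visitor) { return visitor.visitAnd(this); }
}

// Visitor style: each method returns a value and decides which children to visit.
interface Visitor<T> {
    T visitLeaf(Leaf leaf);
    T visitAnd(And and);
}

// Listener style: a walker reaches every node and fires void callbacks.
interface Listener {
    void enterLeaf(Leaf leaf);
    void enterAnd(And and);
}

class Walker {
    static void walk(Listener listener, Node node) {
        if (node instanceof And) {
            And and = (And) node;
            listener.enterAnd(and);
            walk(listener, and.left);
            walk(listener, and.right);
        } else {
            listener.enterLeaf((Leaf) node);
        }
    }
}

class WalkDemo {
    // Evaluates the tree with a visitor; "and" skips the right child when the left is false.
    static boolean evaluate(Node root, List<String> trace) {
        return root.accept(new Visitor<Boolean>() {
            public Boolean visitLeaf(Leaf leaf) {
                trace.add("leaf:" + leaf.value);
                return leaf.value;
            }
            public Boolean visitAnd(And and) {
                trace.add("and");
                return and.left.accept(this) && and.right.accept(this);
            }
        });
    }

    // Records every node the walker reaches; nothing is returned or skipped.
    static List<String> listen(Node root) {
        List<String> trace = new ArrayList<>();
        Walker.walk(new Listener() {
            public void enterLeaf(Leaf leaf) { trace.add("leaf:" + leaf.value); }
            public void enterAnd(And and) { trace.add("and"); }
        }, root);
        return trace;
    }
}
```

For the tree new And(new Leaf(false), new Leaf(true)), evaluate never touches the right leaf, whereas listen records all three nodes. That control over traversal is what lets a visitor-based evaluator short-circuit logical expressions.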
Now that we have a custom evaluator, let’s create the query engine class. Its job is to stream an expression to the lexer, tokenize that stream, generate the corresponding parse tree, and then walk the parse tree to evaluate the expression against a JSON object. Here’s a snippet from the JsonQueryEngine class:
public class JsonQueryEngine {
    public boolean execute(String expression, JsonObject item) {
        if (StringUtils.isNotBlank(expression)) {
            CharStream stream = CharStreams
                    .fromString(expression.trim());
            // Antlr derives these class names from the grammar name "JsonQuery"
            JsonQueryLexer lexer = new JsonQueryLexer(stream);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            JsonQueryParser parser = new JsonQueryParser(tokens);
            ParseTree parseTree = parser.query();
            JsonQueryEvaluator evaluator =
                    new JsonQueryEvaluator(item);
            return evaluator.visit(parseTree);
        } else {
            ...
        }
    }
    ...
}
That’s it, folks. We now have a custom grammar that can be used to, for example, assert conditions within a JSON object while writing tests. Of course, there’s room for improvement in terms of optimizing the grammar and the parsing logic. Head over to GitHub to grab the source code and experiment with it.
Happy coding!
Published at DZone with permission of Uday Chandra, DZone MVB. See the original article here.