Parboiled Java patterns (errors inside Optional)

I’ve been using Parboiled Java on a current project and think it is fantastic: the best thing since sliced bread and regexps. But as I was finishing up the edge-condition testing, I ran into some fundamental pattern issues that are neither obvious nor covered in the examples. So here are some notes:

My problem domain’s syntax is essentially OData with some modifications: we’ve kept the basic ideas and added some necessary extensions. A user creates a URL representing a database search and sends it to my system; I parse that URL, convert it to a search object, perform the search, and reply with the data it returns.

It was easy to write code for the matching syntax. It was the error conditions, and catching those errors, that turned easy into hard and ugly. I don’t mind hard, but I do mind ugly, so perhaps this post will lead to a better approach. On the other hand, error handling and parsing tend to produce really ugly children, so a clean approach may not be possible.

Here is an example of a URL snippet:

Person($expand=location($current))

Without going much into the semantics, this query means: find all Persons and “expand” their “location” property, restricted to locations that are “current”. In other words, find all people and return the people together with their current locations.

Below is a Parboiled parser class that does the trick:

import static org.parboiled.errors.ErrorUtils.printParseErrors;

import org.parboiled.BaseParser;
import org.parboiled.Parboiled;
import org.parboiled.Rule;
import org.parboiled.annotations.BuildParseTree;
import org.parboiled.parserunners.ParseRunner;
import org.parboiled.parserunners.TracingParseRunner;
import org.parboiled.support.ParseTreeUtils;
import org.parboiled.support.ParsingResult;

@BuildParseTree
public class BlogTest extends BaseParser<Object> {
    public static void main(String[] args) {
        parse("Person($expand=location($current))");
    }
    private static BlogTest parser = Parboiled.createParser(BlogTest.class);
    public static boolean RESULT_TREE_ON = true;
    public static boolean STDOUT = true;

    public static void parse(String str) {
        // TracingParseRunner logs every rule attempt, which helps while debugging the grammar
        ParseRunner<Object> runner = new TracingParseRunner<Object>(parser.Query());
        ParsingResult<Object> result = runner.run(str);
        if (result.hasErrors()) {
            String errorMessage = printParseErrors(result);
            if (STDOUT) {
                System.out.println("\nParse Errors:\n" + errorMessage);
            }
        }
        if (RESULT_TREE_ON) {
            System.out.println(str + " ===>  " + ParseTreeUtils.printNodeTree(result));
        }
        if (STDOUT) {
            System.out.println("Parse String = " + str);
            // resultValue is the top of the value stack; it is null if nothing was pushed
            QueryAst query = (QueryAst) result.resultValue;
            if (query != null) {
                query.print();
            }
        }
    }
    Rule Query() {
        // This matches Person($expand=location($current))
        return Sequence(
            Word(),
            matchedQueryName(),
            Optional(
                Sequence(
                    "(",
                    Optional(Expand()), // "Person()" is legal, so the expansion is optional too
                    ")"
                )
            )
        );
    }
    protected boolean matchedQueryName() {
        QueryAst query = new QueryAst();
        query.name = match();
        return push(query);
    }
    Rule Word() {
        return OneOrMore(
            LetterOrDigit()
        );
    }
    Rule LetterOrDigit() {
        return FirstOf(CharRange('a', 'z'), CharRange('A', 'Z'), CharRange('0', '9'));
    }
    Rule Expand() {
        // This matches $expand=location($current)
        return Sequence(
            "$expand=",
            Word(),
            matchedExpand(),
            Optional(
                "(",
                Optional(
                    Current()
                ),
                ")"
            ),
            popExpandAst()
        );
    }
    protected boolean matchedExpand(){
        ExpandAst expand = new ExpandAst();
        expand.name = match();
        return push(expand);
    }
    protected boolean popExpandAst(){
        ExpandAst expand = (ExpandAst) pop();
        QueryAst query = (QueryAst) peek();
        query.expand = expand;
        return true;
    }
    Rule Current() {
        // This matches $current
        return Sequence(
            "$current",
            currentSucceeded()
        );
    }
    protected boolean currentSucceeded() {
        ExpandAst prop = (ExpandAst) peek();
        prop.current = true;
        return true;
    }
   
    static class QueryAst {
        public String name;
        public ExpandAst expand;
        public void print() {
            System.out.println("Query: name=" + name);
            if (expand == null) {
                System.out.println("expand == null");
            } else {
                expand.print();
            }
        }
    }
    static class ExpandAst {
        public String name;
        public boolean current;

        public void print() {
            System.out.println("Expand: name=" + name + " current=" + current);
        }
    }
}

The problem is all the optionals. The parentheses are optional, and the text inside the parentheses is optional. In other words, the following URLs are all legal:

Person($expand=location($current))
Person($expand=location())
Person($expand=location)
Person()
Person

But if there is text inside the parentheses, then it must be correct. In other words, the following URL is illegal:

Person($expand=location($currents))

When the Java class shown above parses this string, it succeeds, because it matches “$current” and doesn’t care that the next letter is an “s”. We need to correct this.

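One way to do this is to use the TestNot() rule, which succeeds only if its sub-rule does not match at the current position and never consumes any input. (If there are better ways, I’d love to know.) Change the Current() rule to the following: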

    Rule Current() {
        // This matches $current and does not match $currents:
        // TestNot() succeeds only if no letter or digit follows, and consumes nothing
        return Sequence(
            "$current",
            TestNot(LetterOrDigit()),
            currentSucceeded()
        );
    }
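For readers new to syntactic predicates, here is a plain-Java sketch of what the TestNot(LetterOrDigit()) step adds. It is not Parboiled code; the class and method names are hypothetical, and it assumes ASCII letters and digits are the only characters that can continue a keyword.

```java
// Hypothetical helper, not part of Parboiled: matching the keyword "$current"
// at a position, with and without a negative lookahead after it.
final class KeywordMatch {
    private KeywordMatch() {}

    // Plain string match: "$currents" passes, because only the prefix is checked.
    static boolean prefixOnly(String input, int pos, String keyword) {
        return input.startsWith(keyword, pos);
    }

    // Same match plus a negative lookahead, the analogue of TestNot(LetterOrDigit()):
    // peek at the next character without consuming it and fail if it continues the word.
    static boolean wholeWord(String input, int pos, String keyword) {
        if (!input.startsWith(keyword, pos)) {
            return false;
        }
        int next = pos + keyword.length();
        return next == input.length() || !isLetterOrDigit(input.charAt(next));
    }

    // Assumption: ASCII letters and digits are the only "word" characters.
    static boolean isLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }
}
```

On Person($expand=location($currents)), at the offset where $current starts, prefixOnly returns true while wholeWord returns false; on the legal URL both return true.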

So now we no longer match $currents. The problem is that the parser still succeeds, because the Current() rule is enclosed in an Optional(): Optional() treats a failed sub-rule as an empty match, so the failure inside Current() is silently ignored.
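Why Optional() hides the failure is easier to see in a toy PEG written in plain Java. This is a hypothetical mini-combinator library, not Parboiled’s implementation: each parser returns the end offset of its match or -1 on failure, seq() stands in for Sequence(), opt() for Optional(), and not() for TestNot().

```java
// Hypothetical mini-PEG in plain Java (not Parboiled). A parser maps a start
// offset to the end offset of its match, or -1 when it fails.
final class MiniPeg {
    private MiniPeg() {}

    interface P {
        int parse(String in, int pos);
    }

    static P lit(String s) {
        return (in, pos) -> in.startsWith(s, pos) ? pos + s.length() : -1;
    }

    // Like Sequence(): every part must match, one after another.
    static P seq(P... parts) {
        return (in, pos) -> {
            int cur = pos;
            for (P p : parts) {
                cur = p.parse(in, cur);
                if (cur < 0) {
                    return -1;
                }
            }
            return cur;
        };
    }

    // Like Optional(): a failing sub-parser becomes an empty match at pos,
    // so the caller cannot tell "absent" from "present but wrong".
    static P opt(P p) {
        return (in, pos) -> {
            int end = p.parse(in, pos);
            return end < 0 ? pos : end;
        };
    }

    // Like TestNot(): succeeds without consuming input iff p fails here.
    static P not(P p) {
        return (in, pos) -> p.parse(in, pos) < 0 ? pos : -1;
    }

    // Matches one ASCII letter or digit.
    static P letterOrDigit() {
        return (in, pos) -> {
            if (pos >= in.length()) {
                return -1;
            }
            char c = in.charAt(pos);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            return ok ? pos + 1 : -1;
        };
    }
}
```

Building the inner rules as current = seq(lit("$current"), not(letterOrDigit())) and parens = opt(seq(lit("("), opt(current), lit(")"))), parsing "($current)" returns 10, but "($currents)" returns 0: the outer opt() reports success with an empty match, and the bad text is simply left unconsumed.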

We want the parsing to fail with a useful error message where the error occurred. How do we do this?

I’ve come up with two approaches, neither of which I am that crazy about:

Option 1 is to throw an Exception when the illegal text is discovered. Something like:

    Rule Current() {
        // This matches $current and reports an error on $currents
        return Sequence(
            "$current",
            Optional(
                LetterOrDigit(),
                // match() here returns the text matched by LetterOrDigit()
                throwError("Illegal character found after $current: " + match())
            ),
            currentSucceeded()
        );
    }
    // requires: import org.parboiled.errors.ActionException;
    protected boolean throwError(String msg) {
        throw new ActionException(msg);
    }
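The idea behind Option 1 can also be sketched without Parboiled as a small hand-written recursive-descent parser: each optional part is entered only when its opening token is present, and from then on any mismatch throws an exception carrying the input offset. Everything here (StrictQueryParser, SyntaxError, Result) is hypothetical, it covers only the grammar in this post, and unlike the Parboiled class above it also insists that the whole input is consumed.

```java
// Hypothetical hand-written parser (not Parboiled) for the grammar in this post.
// Optional parts are entered only when their opening token is present; after
// that, any mismatch throws a SyntaxError carrying the input offset.
final class StrictQueryParser {
    static final class SyntaxError extends RuntimeException {
        final int index;

        SyntaxError(String message, int index) {
            super(message + " at index " + index);
            this.index = index;
        }
    }

    // Flattened stand-in for the post's QueryAst/ExpandAst.
    static final class Result {
        String name;
        String expand;    // null when there is no $expand
        boolean current;
    }

    private final String in;
    private int pos;

    private StrictQueryParser(String in) {
        this.in = in;
    }

    static Result parse(String input) {
        StrictQueryParser p = new StrictQueryParser(input);
        Result r = new Result();
        r.name = p.word();
        if (p.accept("(")) {                    // the parentheses are optional...
            if (p.accept("$expand=")) {         // ...and so is what is inside them
                r.expand = p.word();
                if (p.accept("(")) {
                    if (p.accept("$current")) {
                        // Committed: "$current" was seen, so a trailing letter is an error
                        if (p.pos < p.in.length() && isLetterOrDigit(p.in.charAt(p.pos))) {
                            throw new SyntaxError("Illegal character found after $current", p.pos);
                        }
                        r.current = true;
                    }
                    p.expect(")");
                }
            }
            p.expect(")");
        }
        if (p.pos != p.in.length()) {
            throw new SyntaxError("Unexpected trailing input", p.pos);
        }
        return r;
    }

    private String word() {
        int start = pos;
        while (pos < in.length() && isLetterOrDigit(in.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new SyntaxError("Expected a name", pos);
        }
        return in.substring(start, pos);
    }

    private boolean accept(String token) {
        if (in.startsWith(token, pos)) {
            pos += token.length();
            return true;
        }
        return false;
    }

    private void expect(String token) {
        if (!accept(token)) {
            throw new SyntaxError("Expected '" + token + "'", pos);
        }
    }

    private static boolean isLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }
}
```

With this sketch, Person($expand=location($currents)) fails with “Illegal character found after $current at index 32”, which points at the “s”, while all five legal URLs listed earlier parse.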

Option 2 is drastic: we define our parsing rules to accept “almost anything”, use the match() method to build up intermediate objects (i.e., abstract syntax trees, or ASTs), and then analyze those objects for correct data.
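A much-reduced sketch of Option 2, in plain Java rather than Parboiled: phase 1 uses a deliberately loose regular expression that accepts any text without parentheses in each slot, and phase 2 checks the captured pieces against the real rules, collecting every problem with its offset. The class name, the messages, and the set of allowed options are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-phase sketch (not Parboiled): lenient structural parse, then validation.
final class LenientThenValidate {
    private LenientThenValidate() {}

    // Phase 1: structure only. Each slot accepts any text without parentheses,
    // so "$currents" is captured here instead of being rejected.
    private static final Pattern LENIENT = Pattern.compile(
        "([^()]*)(?:\\((?:\\$expand=([^()]*)(?:\\(([^()]*)\\))?)?\\))?");

    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9]+");
    private static final Set<String> ALLOWED_OPTIONS = Set.of("", "$current");

    // Phase 2: check the captured pieces against the real syntax.
    // An empty list means the URL is valid.
    static List<String> validate(String in) {
        List<String> errors = new ArrayList<>();
        Matcher m = LENIENT.matcher(in);
        if (!m.matches()) {
            errors.add("Input does not have the Name($expand=name(options)) structure");
            return errors;
        }
        checkName(m, 1, errors);
        if (m.group(2) != null) {
            checkName(m, 2, errors);
        }
        String options = m.group(3);
        if (options != null && !ALLOWED_OPTIONS.contains(options)) {
            errors.add("Unknown option '" + options + "' at index " + m.start(3));
        }
        return errors;
    }

    private static void checkName(Matcher m, int group, List<String> errors) {
        if (!NAME.matcher(m.group(group)).matches()) {
            errors.add("Invalid name '" + m.group(group) + "' at index " + m.start(group));
        }
    }
}
```

Here validate("Person($expand=location($currents))") returns a single error, “Unknown option '$currents' at index 24”. Even in this tiny example, the loose pattern already looks rather different from the real grammar.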

An argument for Option 2 is that the parsing phase should be separate from the validation phase, and that one can build a generic, reusable AST system. I spent a day working this out, but ultimately abandoned it because the parsing rules had drifted so far from the “correct syntax” of my problem domain. I didn’t like the code smell.

I’m still working through this, but for now I’m forging ahead with Option 1.