Parboiled Java patterns (Errors inside Optional)

I’ve been using Parboiled Java on a current project and think it is fantastic. Best thing since sliced bread and regexp. But as I was finishing up the edge-case testing, I ran into some fundamental pattern issues that are not obvious and not covered in the examples. So here are some notes:

My problem domain syntax is essentially OData with some modifications: we’ve kept the basic ideas and added some necessary extensions. A user creates a URL representing a database search and sends it to my system. I parse that URL, convert it to a search object, perform the search, and reply with the resulting data.

It was easy to write code for the matching syntax. It was detecting and reporting the error conditions that turned easy into hard and ugly. I don’t mind hard but I do mind ugly, so perhaps this post will lead to a better approach. OTOH, error handling and parsing tend to produce really ugly children, so a clean approach may not be possible either.

Here is an example of a URL snippet:

Person($expand=location($current))

Without getting into the meaning much, this is a database query that means: find all Persons and “expand” their “location” properties that are “current”. In other words, find all people and return the people and their current locations.

Below is a parboiled parser class that does the trick:

import static org.parboiled.errors.ErrorUtils.printParseErrors;

import org.parboiled.BaseParser;
import org.parboiled.Parboiled;
import org.parboiled.Rule;
import org.parboiled.annotations.BuildParseTree;
import org.parboiled.parserunners.ParseRunner;
import org.parboiled.parserunners.TracingParseRunner;
import org.parboiled.support.ParseTreeUtils;
import org.parboiled.support.ParsingResult;

@BuildParseTree
public class BlogTest extends BaseParser<Object> {
    public static void main(String[] args) {
        parse("Person($expand=location($current))");
    }
    private static BlogTest parser = Parboiled.createParser(BlogTest.class);
    public static boolean RESULT_TREE_ON = true;
    public static boolean STDOUT = true;
   
    public static void parse(String str) {
        ParseRunner runner = new TracingParseRunner(parser.Query());
        ParsingResult result = runner.run(str);
        if (result.hasErrors()) {
            String errorMessage = printParseErrors(result);
            if (STDOUT) {
                System.out.println("\nParse Errors:\n" + errorMessage);
            }
        }
        if (RESULT_TREE_ON) {
            System.out.println(str + " ===>  " + ParseTreeUtils.printNodeTree(result));
        }
        if (STDOUT) {
            System.out.println("Parse String = " + str);
            QueryAst query = (QueryAst) result.resultValue;
            if (query != null) { // resultValue is null when the value stack is empty
                query.print();
            }
        }
       
    }
    Rule Query() {
        // This matches Person($expand=location($current))
        return Sequence(
            Word(),
            matchedQueryName(),
            Optional(
                Sequence(
                    "(",
                    Optional(Expand()), // optional so that "Person()" is legal too
                    ")"
                )
            )
        );
    }
    protected boolean matchedQueryName() {
        QueryAst query = new QueryAst();
        query.name = match();
        return push(query);
    }
    Rule Word() {
        return OneOrMore(
            LetterOrDigit()
        );
    }
    Rule LetterOrDigit() {
        // Letters and digits, as the rule name promises
        return FirstOf(CharRange('a', 'z'), CharRange('A', 'Z'), CharRange('0', '9'));
    }
    Rule Expand() {
        // This matches $expand=location($current)
        return Sequence(
            "$expand=",
            Word(),
            matchedExpand(),
            Optional(
                "(",
                Optional(
                    Current()
                ),
                ")"
            ),
            popExpandAst()
        );
    }
    protected boolean matchedExpand(){
        ExpandAst expand = new ExpandAst();
        expand.name = match();
        return push(expand);
    }
    protected boolean popExpandAst(){
        ExpandAst expand = (ExpandAst) pop();
        QueryAst query = (QueryAst) peek();
        query.expand = expand;
        return true;
    }
    Rule Current() {
        // This matches the $current
        return Sequence(
            "$current",
            currentSucceeded()
        );
    }
    protected boolean currentSucceeded() {
        ExpandAst prop = (ExpandAst) peek();
        prop.current = true;
        return true;
    }
   
    static class QueryAst {
        public String name;
        public ExpandAst expand;
        public void print() {
            System.out.println("Query: name=" + name);
            if (expand == null) {
                System.out.println("expand == null");
            } else {
                expand.print();
            }
        }
    }
    static class ExpandAst {
        public String name;
        public boolean current;
       
        public void print() {
            System.out.println("Expand: name=" + name + " current=" + current);
        }
    }
}

The problem is all the optionals. The parentheses are optional, and the text inside the parentheses is optional. In other words, the following URLs are all legal:

Person($expand=location($current))
Person($expand=location())
Person($expand=location)
Person()
Person

But if there is text inside the parentheses, then it must be correct. In other words, the following URL is illegal:

Person($expand=location($currents))

When the Java class shown above parses this string, it will succeed, because the Current() rule matched on “$current” and didn’t care that the following letter was an “s”. We need to correct this.

One way to do this is to use the TestNot() rule. (If there are better ways, I’d love to know.) Change the Current() rule to the following:

    Rule Current() {
        // This matches $current and does not match $currents
        return Sequence(
            "$current",
            TestNot(LetterOrDigit()),
            currentSucceeded()
        );
    }
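
For readers who have not met syntactic predicates before, here is a minimal plain-Java sketch (no Parboiled involved) of what the TestNot() check amounts to: match the literal, then peek at the next character without consuming it. The class and method names are hypothetical and exist only for this illustration.

```java
// Plain-Java illustration only; KeywordMatch and matchKeyword are hypothetical names.
final class KeywordMatch {
    /**
     * Returns the index just past the keyword if it matches at pos and is NOT
     * followed by an ASCII letter; returns -1 otherwise.
     */
    static int matchKeyword(String input, int pos, String keyword) {
        if (!input.startsWith(keyword, pos)) {
            return -1; // the literal itself does not match
        }
        int end = pos + keyword.length();
        // Negative lookahead: inspect the next character without consuming it.
        if (end < input.length() && isAsciiLetter(input.charAt(end))) {
            return -1; // e.g. "$currents": the keyword is only a prefix
        }
        return end;
    }

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

Called with "$current)" the method accepts the keyword; called with "$currents)" it rejects it, which is the behaviour the TestNot() version of Current() is after.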

So now we will no longer match on $currents. Problem is, the parser still succeeds because the Current() rule is enclosed by an Optional(). The error inside the Current() rule matching will be ignored.
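
The swallowing effect is easy to reproduce outside Parboiled. Below is a rough hand-written sketch, assuming a cut-down grammar of a word followed by an optional “(” [“$current”] “)” group; every name in it is hypothetical. It also shows one way to stop the failure from vanishing: insist that the whole input was consumed. In Parboiled the counterpart would, as far as I can tell, be ending the top-level Sequence with the built-in EOI rule, though that on its own yields a generic error rather than a tailored message.

```java
// Hand-rolled toy parser, not Parboiled; all names are hypothetical.
final class OptionalDemo {
    // Consumes ASCII letters starting at pos and returns the new position.
    static int word(String in, int pos) {
        while (pos < in.length() && Character.isLetter(in.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    // Matches "$current" not followed by a letter; returns new position or -1.
    static int current(String in, int pos) {
        String keyword = "$current";
        if (!in.startsWith(keyword, pos)) {
            return -1;
        }
        int end = pos + keyword.length();
        return (end < in.length() && Character.isLetter(in.charAt(end))) ? -1 : end;
    }

    // Optional group: on any failure it returns pos unchanged, hiding the failure.
    static int optionalParens(String in, int pos) {
        int p = pos;
        if (p >= in.length() || in.charAt(p) != '(') {
            return pos;
        }
        p++;
        int afterCurrent = current(in, p);
        if (afterCurrent >= 0) {
            p = afterCurrent; // the inner optional matched
        }
        if (p >= in.length() || in.charAt(p) != ')') {
            return pos; // the group failed, but an optional still "succeeds"
        }
        return p + 1;
    }

    // Succeeds as long as a leading word exists; leftover input is ignored.
    static boolean parsePrefix(String in) {
        int p = word(in, 0);
        if (p == 0) {
            return false;
        }
        optionalParens(in, p);
        return true;
    }

    // Same rules, but the whole input must be consumed.
    static boolean parseWhole(String in) {
        int p = word(in, 0);
        if (p == 0) {
            return false;
        }
        return optionalParens(in, p) == in.length();
    }
}
```

With "location($currents)", parsePrefix() reports success while parseWhole() reports failure; both accept "location($current)", "location()" and "location".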

We want the parsing to fail with a useful error message where the error occurred. How do we do this?

I’ve come up with two approaches, neither of which I am that crazy about:

Option 1 is to throw an Exception when the illegal text is discovered. Something like:

    Rule Current() {
        // This matches $current and throws on anything like $currents
        return Sequence(
            "$current",
            Optional(
                LetterOrDigit(),
                throwError("Illegal character found after $current: " + match())
            ),
            currentSucceeded()
        );
    }
    // ActionException lives in org.parboiled.errors and needs an import
    protected boolean throwError(String msg) {
        throw new ActionException(msg);
    }
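
To get a message that points at where the error occurred, the exception can carry the input position along with the text. The following is only a plain-Java sketch of that idea under my own assumptions: QuerySyntaxException, describe() and CurrentCheck are hypothetical names and are not part of Parboiled.

```java
// Hypothetical exception type that remembers where the problem was found.
final class QuerySyntaxException extends RuntimeException {
    final int index;

    QuerySyntaxException(String message, int index) {
        super(message);
        this.index = index;
    }

    /** Renders the message, the input, and a caret under the offending character. */
    String describe(String input) {
        StringBuilder caret = new StringBuilder();
        for (int i = 0; i < index; i++) {
            caret.append(' ');
        }
        caret.append('^');
        return getMessage() + " (index " + index + ")\n" + input + "\n" + caret;
    }
}

// Hypothetical throw site: complains when "$current" is followed by a letter.
final class CurrentCheck {
    static void checkCurrent(String input, int pos) {
        String keyword = "$current";
        if (!input.startsWith(keyword, pos)) {
            return; // nothing to check at this position
        }
        int end = pos + keyword.length();
        if (end < input.length() && Character.isLetter(input.charAt(end))) {
            throw new QuerySyntaxException(
                "Illegal character found after $current: " + input.charAt(end), end);
        }
    }
}
```

A caller would catch QuerySyntaxException around the parse and print e.describe(input), which puts a caret under the stray character.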

Option 2 is drastic: We define our parsing rules to accept “almost anything”, use the match() method to build up intermediate objects (i.e., abstract syntax trees or ASTs), then analyze these objects for correct data.

An argument for option 2 is that the parsing phase should be separate from the validation phase, and that one can build up a generic and reusable AST system. I spent a day working this out, but ultimately abandoned it because the parsing rules had become so unrelated to the “correct syntax” of my problem domain. I didn’t like the code smell.
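
For the record, the validation half of option 2 looks roughly like the sketch below. It assumes the permissive parser has already collected the raw text of each option inside the parentheses; LooseExpand, ExpandValidator and KNOWN_OPTIONS are hypothetical names, and this is not the code I abandoned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical loose AST node: the parser stores whatever text it found.
final class LooseExpand {
    final String name;
    final List<String> options = new ArrayList<>(); // raw text, e.g. "$currents"

    LooseExpand(String name, String... options) {
        this.name = name;
        this.options.addAll(Arrays.asList(options));
    }
}

// Separate validation pass that runs after parsing has finished.
final class ExpandValidator {
    private static final Set<String> KNOWN_OPTIONS =
        new HashSet<>(Arrays.asList("$current"));

    /** Returns one message per unknown option; an empty list means valid. */
    static List<String> validate(LooseExpand expand) {
        List<String> errors = new ArrayList<>();
        for (String option : expand.options) {
            if (!KNOWN_OPTIONS.contains(option)) {
                errors.add("Unknown option '" + option
                    + "' in expand '" + expand.name + "'");
            }
        }
        return errors;
    }
}
```

The validator itself is simple. The cost sits on the grammar side, where the rules have to accept nearly any text between the parentheses.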

I’m still working through this, but right now Option 1 is forging ahead.

Robyn’s room 2012

Daughter spent the day cleaning and re-arranging her room. The miracle of mess-removal, and once again being able to see the hardwood floors that her Mother and I discovered under crappy carpet and restored. We thought of taking a picture, then I remembered that Photosynth is available for her new iPod touch, so we played with that. Good, but not good enough for Dad, so I spent 2 hours taking the pictures and maybe 10 hours building the panorama with Hugin, then the computer did a few hours of cranking as well. This one came out quite nicely. The remaining problems are mostly due to the average camera lens.

Rob Gillaspie – Corporate Slave

Back in 1993, I was working a hellacious job at the refinery in El Dorado. (Hell Dorado, as I called it.) Monday through Friday in the plant, home for weekends. Lousy project. One bar in town, and these were the days when you needed a “membership” to enter. I wound up eating most of my meals in my motel room.

Anyway, I was back in Lawrence for Art in the Park, and I ran into a very talented kid (high school senior) who was displaying his work. His name was Rob Gillaspie. One image in particular caught my eye: Corporate Slave. Surprise 🙂

I told him I wanted to buy it and asked how much he would sell it for. Nice kid, he was stunned, and eventually came up with “five dollars”. I talked him up to ten dollars with a lecture about how it was going to cost me twice that to have it framed. Anyway, we parted company and I never saw nor heard from him again. Periodically I Google and FB his name, but nothing yet. So now at least his name and his artwork are in Google and perhaps he can find me.

 

Rob Gillaspie
Corporate Slave
1993

Update: My Google trick was successful and he finally showed up. You can find him at Mal Content google user