I’ve been using Parboiled Java on a current project and think it is fantastic: the best thing since sliced bread and regexps. But as I was finishing up testing the edge cases, I ran into some fundamental pattern issues that are neither obvious nor covered in the examples. So here are some notes:
My problem domain’s syntax is essentially OData with some modifications: we’ve kept the basic ideas and added some necessary extensions. A user creates a URL representing a database search and sends it to my system; I parse that URL, convert it to a search object, perform the search, and reply with the data the search returns.
It was easy to write the code for the matching syntax. It was the error conditions, and catching the errors, that turned easy into hard and ugly. I don’t mind hard, but I do mind ugly, so perhaps this post will lead to a better approach. On the other hand, error handling and parsing tend to produce really ugly children, so a clean approach may not be possible either.
Here is an example of a URL snippet:
Person($expand=location($current))
Without getting too deep into the semantics, this is a database query that means: find all Persons and “expand” their “location” property, keeping only the locations that are “current”. In other words, find all people and return them together with their current locations.
Below is a parboiled parser class that does the trick:
import org.parboiled.BaseParser;
import org.parboiled.Parboiled;
import org.parboiled.Rule;
import org.parboiled.annotations.BuildParseTree;
import org.parboiled.errors.ActionException;
import org.parboiled.parserunners.ParseRunner;
import org.parboiled.parserunners.TracingParseRunner;
import org.parboiled.support.ParseTreeUtils;
import org.parboiled.support.ParsingResult;

@BuildParseTree
public class BlogTest extends BaseParser<Object> {
    // ... rule methods, including the Current() rule discussed below, go here ...
}
The problem is all the optionals. The parentheses are optional, and the text inside the parentheses is optional. In other words, all of the following URLs are legal:
Person($expand=location($current))
Person($expand=location())
Person($expand=location)
Person()
Person
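To make the nesting of those optionals concrete, here is a minimal sketch in plain Java rather than parboiled, so it runs without the library. It is a hand-written backtracking matcher that assumes the grammar is exactly the five shapes above; the class and method names are hypothetical. Each optional part remembers the input position and restores it when its contents fail to match, which is roughly what parboiled’s `Optional()` does.

```java
// Hypothetical plain-Java sketch (not parboiled) of the nested optionals:
//   Person  := "Person" ( "(" Expand? ")" )?
//   Expand  := "$expand=location" ( "(" Current? ")" )?
//   Current := "$current"
// Each optional saves the position and restores it if its contents fail.
class OptionalGrammarSketch {
    private final String input;
    private int pos = 0;

    private OptionalGrammarSketch(String input) {
        this.input = input;
    }

    /** Returns how many characters were consumed, or -1 if "Person" itself is missing. */
    static int matchedLength(String input) {
        OptionalGrammarSketch p = new OptionalGrammarSketch(input);
        return p.person() ? p.pos : -1;
    }

    /** Matches a literal at the current position, advancing only on success. */
    private boolean lit(String s) {
        if (input.startsWith(s, pos)) {
            pos += s.length();
            return true;
        }
        return false;
    }

    private boolean person() {
        if (!lit("Person")) return false;
        int mark = pos; // Optional( "(" Expand? ")" )
        if (!(lit("(") && optionalExpand() && lit(")"))) pos = mark;
        return true;
    }

    private boolean optionalExpand() {
        if (!lit("$expand=location")) return true; // Optional: empty match
        int mark = pos; // Optional( "(" Current? ")" )
        if (!(lit("(") && optionalCurrent() && lit(")"))) pos = mark;
        return true;
    }

    private boolean optionalCurrent() {
        lit("$current"); // Optional: a failed match changes nothing
        return true;
    }

    public static void main(String[] args) {
        String[] legal = {
            "Person($expand=location($current))", "Person($expand=location())",
            "Person($expand=location)", "Person()", "Person"
        };
        for (String url : legal) {
            System.out.println(url + " consumed " + matchedLength(url) + " of " + url.length());
        }
    }
}
```

All five legal forms are consumed completely. Note that `matchedLength` only reports how far the matcher got; nothing here checks that the whole URL was consumed.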
But if there is text inside the parentheses, then it must be correct. In other words, the following URL is illegal:
Person($expand=location($currents))
When the Java class shown above parses this string, it succeeds, because it matched on “$current” and didn’t care that the next letter was an “s”. We need to correct this.
One way to do this is to use the TestNot() rule. (If there are better ways, I’d love to know.) Change the Current() rule to the following:
public Rule Current() {
    return Sequence(
        "$current",
        // This matches the $current and does not match $currents
        TestNot(LetterOrDigit()),
        currentSucceeded()
    );
}
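`TestNot()` is a negative lookahead: it succeeds only when its argument does not match at the current position, and it never consumes input. `java.util.regex` has the same construct, `(?!...)`, so the before-and-after behavior of the `Current()` change can be checked in plain Java. This is an analogy rather than parboiled code, it assumes `LetterOrDigit()` means ASCII `[A-Za-z0-9]`, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical comparison: a plain literal versus a literal followed by a negative
// lookahead, the regex counterpart of Sequence("$current", TestNot(LetterOrDigit())).
class TestNotSketch {
    // Assumption: LetterOrDigit() is treated as ASCII [A-Za-z0-9].
    static final Pattern PLAIN = Pattern.compile("\\$current");
    static final Pattern GUARDED = Pattern.compile("\\$current(?![A-Za-z0-9])");

    /** True if the pattern matches at the start of the input, ignoring what follows. */
    static boolean matchesAtStart(Pattern p, String input) {
        return p.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"$current)", "$currents)"}) {
            System.out.println(s + " plain=" + matchesAtStart(PLAIN, s)
                    + " guarded=" + matchesAtStart(GUARDED, s));
        }
    }
}
```

The plain pattern happily matches the first eight characters of `$currents)`, while the guarded one refuses it; the lookahead also succeeds at end of input, so a bare `$current` is still accepted.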
So now we no longer match on $currents. The problem is that the parser still succeeds, because the Current() rule is enclosed in an Optional(): the failed match inside Current() is simply ignored.
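The swallowing is easy to reproduce outside parboiled as well. The hypothetical plain-Java matcher below implements just `Location := "location" ( "(" Current? ")" )?`, with `Current` guarded as in the rule above (assuming `Character.isLetterOrDigit` as the letter-or-digit test). On `location($currents)` the guard rejects `$currents`, so the optional around it falls back to an empty match; the closing `)` is then missing, so the outer optional backs off too, and the matcher still reports success after consuming only `location`. Nothing records that the real problem sat at `$currents`.

```java
// Hypothetical plain-Java matcher showing how nested optionals hide an inner failure.
//   Location := "location" ( "(" Current? ")" )?
//   Current  := "$current" !LetterOrDigit
class SwallowedFailureSketch {
    private final String input;
    private int pos = 0;

    private SwallowedFailureSketch(String input) {
        this.input = input;
    }

    /** Returns the number of characters consumed, or -1 if Location itself fails. */
    static int matchedLength(String input) {
        SwallowedFailureSketch p = new SwallowedFailureSketch(input);
        return p.location() ? p.pos : -1;
    }

    private boolean lit(String s) {
        if (input.startsWith(s, pos)) {
            pos += s.length();
            return true;
        }
        return false;
    }

    private boolean location() {
        if (!lit("location")) return false;
        int mark = pos; // Optional( "(" Current? ")" )
        if (!(lit("(") && optionalCurrent() && lit(")"))) pos = mark;
        return true; // reports success even if the parenthesized part was rejected
    }

    private boolean optionalCurrent() {
        int mark = pos;
        boolean ok = lit("$current")
                && !(pos < input.length() && Character.isLetterOrDigit(input.charAt(pos)));
        if (!ok) pos = mark; // the failure is silently discarded here
        return true;
    }

    public static void main(String[] args) {
        String bad = "location($currents)";
        System.out.println("consumed " + matchedLength(bad) + " of " + bad.length());
    }
}
```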
We want the parsing to fail with a useful error message where the error occurred. How do we do this?
I’ve come up with two approaches, neither of which I am that crazy about:
Option 1 is to throw an Exception when the illegal text is discovered. Something like:
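public Rule Current() {
    return Sequence(
        "$current",
        // This matches the $current and does not match $currents
        FirstOf(TestNot(LetterOrDigit()), currentFailed()),
        currentSucceeded()
    );
}
boolean currentFailed() { // hypothetical name for the error action
    String msg = "Unexpected characters after $current";
    throw new ActionException(msg);
}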
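Stripped of parboiled, Option 1 boils down to turning “the guard failed” into an exception that carries the input position, so the enclosing optionals can no longer quietly back off. A minimal plain-Java sketch of that idea follows; `UrlSyntaxException`, `parseLocation` and the other names are hypothetical, and it assumes that any letter or digit (per `Character.isLetterOrDigit`) directly after `$current` is an error. In parboiled itself, the snippet above throws `ActionException` instead.

```java
// Hypothetical sketch of Option 1: an illegal suffix after "$current" raises an
// exception carrying the offending index instead of quietly failing the optional.
class FailFastSketch {
    /** Hypothetical exception type; records where in the input the error was found. */
    static final class UrlSyntaxException extends RuntimeException {
        final int index;

        UrlSyntaxException(String message, int index) {
            super(message + " at index " + index);
            this.index = index;
        }
    }

    private final String input;
    private int pos = 0;

    private FailFastSketch(String input) {
        this.input = input;
    }

    /** Returns the number of characters consumed; throws on illegal text after $current. */
    static int parseLocation(String input) {
        FailFastSketch p = new FailFastSketch(input);
        if (!p.lit("location")) throw new UrlSyntaxException("expected 'location'", 0);
        int mark = p.pos; // Optional( "(" Current? ")" )
        if (!(p.lit("(") && p.optionalCurrent() && p.lit(")"))) p.pos = mark;
        return p.pos;
    }

    private boolean lit(String s) {
        if (input.startsWith(s, pos)) {
            pos += s.length();
            return true;
        }
        return false;
    }

    private boolean optionalCurrent() {
        if (!lit("$current")) return true; // absent: fine, it is optional
        if (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            // Present but malformed: fail loudly instead of backtracking.
            throw new UrlSyntaxException("unexpected text after $current", pos);
        }
        return true;
    }

    public static void main(String[] args) {
        try {
            parseLocation("location($currents)");
        } catch (UrlSyntaxException e) {
            System.out.println(e.getMessage()); // prints: unexpected text after $current at index 17
        }
    }
}
```

The design trade-off is visible in `optionalCurrent()`: “absent” and “present but wrong” are now distinct outcomes, which is exactly the distinction a plain `Optional()` erases.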
Option 2 is drastic: we define our parsing rules to accept “almost anything”, use the match() method to build up intermediate objects (i.e., abstract syntax trees, or ASTs), and then analyze those objects for correct data.
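For comparison, a minimal plain-Java sketch of Option 2: phase 1 is a deliberately lenient parser that accepts any `name(name(...))` nesting, where a name is any run of characters other than `(` and `)`, and builds a tiny AST; phase 2 walks the tree and checks it against the real syntax. The `Node` class, the lenient grammar and the error messages are all assumptions made for illustration, not parboiled’s `match()`-based machinery.

```java
// Hypothetical sketch of Option 2: a lenient parse into a tiny AST, then validation.
//   Node := Name ( "(" Node? ")" )?   where Name is any run of chars except '(' and ')'
class ParseThenValidateSketch {
    /** Minimal AST node: a name plus an optional single child. */
    static final class Node {
        final String name;
        final Node child; // null when absent

        Node(String name, Node child) {
            this.name = name;
            this.child = child;
        }
    }

    private final String input;
    private int pos = 0;

    private ParseThenValidateSketch(String input) {
        this.input = input;
    }

    /** Phase 1: accept almost anything shaped like name(name(...)); null if not. */
    static Node parse(String input) {
        ParseThenValidateSketch p = new ParseThenValidateSketch(input);
        Node n = p.node();
        return (n != null && p.pos == input.length()) ? n : null;
    }

    private Node node() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '(' && input.charAt(pos) != ')') pos++;
        if (pos == start) return null; // empty name, e.g. the inside of "()"
        String name = input.substring(start, pos);
        Node child = null;
        if (pos < input.length() && input.charAt(pos) == '(') {
            pos++;
            child = node();
            if (pos >= input.length() || input.charAt(pos) != ')') return null;
            pos++;
        }
        return new Node(name, child);
    }

    /** Phase 2: check the tree against the real syntax; returns an error message or null. */
    static String validate(Node root) {
        if (root == null) return "malformed input";
        if (!root.name.equals("Person")) return "unknown entity '" + root.name + "'";
        Node expand = root.child;
        if (expand == null) return null;
        if (!expand.name.equals("$expand=location")) return "unknown option '" + expand.name + "'";
        Node current = expand.child;
        if (current == null || current.name.equals("$current")) return null;
        return "unknown qualifier '" + current.name + "'";
    }

    public static void main(String[] args) {
        // prints: unknown qualifier '$currents'
        System.out.println(validate(parse("Person($expand=location($currents))")));
    }
}
```

Notice that the phase 1 grammar no longer encodes the real syntax at all; every rule about what is legal has moved into `validate()`.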
An argument for Option 2 is that the parsing phase should be separate from the validation phase, and that one can build a generic, reusable AST system. I spent a day working this out, but ultimately abandoned it because the parsing rules had become so “unrelated” to the “correct syntax” of my problem domain. I didn’t like the code smell.
I’m still working through this, but for now I’m forging ahead with Option 1.