Archive for December, 2009

Using ANTLR to parse boolean queries

December 23rd, 2009

ANTLR is a well-known parser generator.  You supply it with a grammar and it builds a lexer and parser in the programming language of your choice.   I’ve just spent a few hours getting to grips with ANTLR basics, so I thought I’d document it for future reference.

My basic requirement is to transform an end-user boolean query to an XML document.   A simple example of a query I need to parse is:

john AND (joe OR sue)

I want the generated parser to produce the following syntax tree for this query:


It should then be trivial to walk the tree and write the XML document I need.

An ANTLR grammar is specified in a text file with a .g extension.   Here’s how my SimpleBoolean.g starts:

grammar SimpleBoolean;

  language = CSharp2;
  output = AST;

I’m specifying that the generated code should be C#, and that the parser should build an abstract syntax tree (AST).

Next I specify the lexer rules:

LPAREN : '(' ;
RPAREN : ')' ;
AND : 'AND';
OR : 'OR';
WS :  ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}  ;
WORD :  (~( ' ' | '\t' | '\r' | '\n' | '(' | ')' ))*;

These rules are used to turn the source text into a stream of tokens.  So parentheses and operators are special tokens, and anything else is either whitespace (which is skipped) or a word.

Finally I specify the parser rules:

expr : andexpr;
andexpr : orexpr (AND^ orexpr)*;
orexpr : atom (OR^ atom)*;
atom : WORD | LPAREN! expr RPAREN!;

This is the confusing bit.  The chain of expressions (expr –> andexpr –> orexpr) is used to signify precedence.  In this case I’ve made OR higher precedence than AND.   Also notice that andexpr (for example) does not require an AND – it is optional.  This is why the grammar supports an expr containing only an OR, even though expr is defined in terms of andexpr.

The symbol suffixes (^ on the operators and ! on the parentheses) are there to direct the AST generation.  ^ signifies a tree branch, whereas ! indicates something that should be omitted from the tree.   Anything else is a leaf.   (The parentheses are omitted because they are redundant – the structure of the tree gives the order of evaluation implicitly.)

So SimpleBoolean.g is now complete, and all that remains is to ask ANTLR to generate a C# parser from it.  I used the ANTLRWorks IDE for this, but you could use the command-line.   Once the C# files are generated, and the ANTLR .NET runtimes files added to a C# project, we’re ready to write some C#.   Here’s some code that parses a query and then walks the syntax tree and writes the nodes to the console:

class Program
    static void Main(string[] args)
        ANTLRStringStream expression = new ANTLRStringStream("john AND (joe OR sue)");
        var tokens = new CommonTokenStream(new SimpleBooleanLexer(expression));
        var parser = new SimpleBooleanParser(tokens);

        SimpleBooleanParser.expr_return ret = parser.expr();
        CommonTree ast = (CommonTree)ret.Tree;

    static void Print(CommonTree tree, int level)
        Console.WriteLine(new string('\t', level) + tree.Text);
        if (tree.Children != null)
            foreach (CommonTree child in tree.Children)

which prints the following:


For a more comprehensive ANTLR tutorial, see this series of blog posts.

Unity’s VirtualMethodInterceptor with internal classes

December 13th, 2009

Unity includes some basic AOP functionality.  One of the interception mechanisms it offers is VirtualMethodInterceptor, which works by building a derived class at runtime and overriding the virtual methods of the target class.  This approach has some obvious limitations, but it seemed like a good way to handle some simple logging & profiling requirements that I have with my current project.

The first problem I hit is that the current release of Unity (1.2) has a huge bug – VirtualMethodInterceptor doesn’t work with classes that have parameters in their constructors.  Fortunately downloading and building the latest source was enough to get around that problem.

The next problem is that VirtualMethodInterceptor requires the target class to be public.  This is a common problem with code that builds derived classes at runtime.  Moq has the same issue, for example.  The standard workaround is to use the InternalsVisibleTo assembly attribute to give the dynamic assembly access to internal classes in the target assembly.   So after a little digging in the source I found the name of the dynamic assembly, and added this to my assembly:

[assembly: InternalsVisibleTo("Unity_ILEmit_DynamicClasses")]

Unfortunately this is not sufficient.  The Unity code has some validation that insists on the target class being public.  The code is in VirtualMethodInterceptor.cs:

public bool CanIntercept(Type t)
    Guard.ArgumentNotNull(t, "t");
    return t.IsClass &&
        (t.IsPublic || t.IsNestedPublic) &&
        t.IsVisible &&

Removing the checks for IsPublic and IsVisible was enough to finally get everything to work correctly:

public bool CanIntercept(Type t)
    Guard.ArgumentNotNull(t, "t");
    return t.IsClass && !t.IsSealed;

Hopefully this will be fixed in time for the release of Unity 2.0.

Update: For internal interfaces with InterfaceInterceptor, you need this:

[assembly: InternalsVisibleTo("Unity_ILEmit_InterfaceProxies")]