Friday, June 19, 2015

Using a C++ Flex Lexer with a C++ Bison Parser

I recently found myself revisiting lexing and parsing as part of my research. It's one of those cases where I could get away with ad-hoc parsing using line splitting and regular-expression matching, but where the canonical alternative might ultimately be worth the extra effort, for a number of reasons.

Many years ago, I wrote a SQL DDL parser as part of some object-relational mapping research into a “mutual containment” object model for transparently representing junction tables in relational databases. If I do say so myself, it was a neat idea. I don't know whether anyone has since thought of it independently and implemented it.

So I decided to revisit flex and bison, the most popular implementations of the venerable compiler-construction tools lex and yacc. This time around, though, I anticipated a possible need for two parser/lexer subsystems in the same program. The “vanilla” C code both tools emit keeps its state in global variables, so I was interested in their C++ capabilities.
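To make the global-variable problem concrete, here is a minimal hand-written sketch. It is not flex output; it only imitates the property that flex's `%option c++` scanner class provides. The input stream and the current lexeme live in member variables instead of globals such as `yyin` and `yytext`, so two scans can proceed side by side. A default C flex scanner has a single global current input, so interleaving two scans would need explicit buffer switching. The names `TinyLexer`, `interleave_demo` and the token codes are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token codes; in a real project bison would generate these.
enum Token { END = 0, NUMBER = 1, IDENT = 2, OTHER = 3 };

// Hand-written stand-in for a reentrant scanner: the input stream and the
// current lexeme are per-object state, not globals like yyin and yytext.
class TinyLexer {
public:
    explicit TinyLexer(std::istream& in) : in_(in) {}

    // Returns the next token code and stores its text for text().
    int yylex() {
        text_.clear();
        int c = in_.get();
        while (c != EOF && std::isspace(c)) c = in_.get();  // skip whitespace
        if (c == EOF) return END;
        text_.push_back(static_cast<char>(c));
        if (std::isdigit(c)) {
            while (std::isdigit(in_.peek())) text_.push_back(static_cast<char>(in_.get()));
            return NUMBER;
        }
        if (std::isalpha(c)) {
            while (std::isalnum(in_.peek())) text_.push_back(static_cast<char>(in_.get()));
            return IDENT;
        }
        return OTHER;
    }

    const std::string& text() const { return text_; }

private:
    std::istream& in_;
    std::string text_;
};

// Alternates between two independent lexers, recording "<lexer><token>:<text>".
// Each lexer keeps its own position, so interleaving needs no buffer juggling.
std::vector<std::string> interleave_demo(const std::string& a, const std::string& b) {
    std::istringstream in_a(a), in_b(b);
    TinyLexer la(in_a), lb(in_b);
    std::vector<std::string> out;
    bool a_done = false, b_done = false;
    while (!a_done || !b_done) {
        if (!a_done) {
            int t = la.yylex();
            if (t == END) a_done = true;
            else out.push_back("a" + std::to_string(t) + ":" + la.text());
        }
        if (!b_done) {
            int t = lb.yylex();
            if (t == END) b_done = true;
            else out.push_back("b" + std::to_string(t) + ":" + lb.text());
        }
    }
    return out;
}
```

For example, `interleave_demo("alpha 42", "7 beta")` yields `a2:alpha`, `b1:7`, `a1:42`, `b2:beta`. Each lexer picks up exactly where it left off.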

The GNU Bison Manual has “A Complete C++ Example” that, unfortunately, rather narrowly interprets “complete C++ example” to mean “an example where the bison bits are in C++”. It uses the vanilla flex lexer, global variables and all.

I found a few examples of using flex and bison with C++, but even the best of them address only one tool or the other, go off on irrelevant tangents, are short on explanation, contain outright misleading comments, use deprecated constructs, or all of the above.

All I wanted was a minimal example of how to use a C++ flex lexer with a C++ bison parser. Some kind of “addendum” to the “complete” C++ example from the bison manual would have been perfect!

*Crickets*.

So now, ladies and gentlemen, for your enjoyment, I have added to my Bitbucket “miscellany” just the elucidation I had wanted: Modifying the Bison “Complete C++ Example” to Use a C++ Flex Lexer.

It's not as easy as it sounds.
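One source of friction is a signature mismatch. The scanner class that flex generates with `%option c++` exposes `int yylex()` with no arguments. A parser from bison's C++ skeleton (when `api.token.constructor` is not used) calls `yylex` with a pointer to the semantic value, plus a location pointer when `%locations` is enabled. A common workaround has three parts:

* Subclass `yyFlexLexer` and declare an overload with the parser's signature.
* Redefine `YY_DECL` so flex emits its rule actions as the body of that overload.
* Hand the scanner object to the parser, for example via `%parse-param`, so the parser's `yylex` calls reach it.

The sketch below imitates that shape in plain C++ so it compiles without flex or bison. It is not taken from the repository mentioned above. `LexerBase`, `Value`, `Scanner`, `SumParser` and the token codes are hypothetical stand-ins, not generated code.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>

// Stand-in for flex's generated scanner base: yylex() takes no arguments.
class LexerBase {
public:
    virtual ~LexerBase() = default;
    virtual int yylex() = 0;
};

// Stand-in for bison's semantic value type.
struct Value { long number = 0; };

// Hypothetical token codes that a bison grammar would normally define.
enum Token { END = 0, NUMBER = 258, PLUS = 259, BAD = 260 };

// The subclass adds the overload the parser wants; the zero-argument
// version required by the base simply forwards and discards the value.
class Scanner : public LexerBase {
public:
    explicit Scanner(std::istream& in) : in_(in) {}

    int yylex() override {
        Value ignored;
        return yylex(&ignored);
    }

    // Plays the role of the YY_DECL-declared function: returns a token
    // code and fills in the semantic value for NUMBER tokens.
    int yylex(Value* lval) {
        assert(lval != nullptr);
        int c = in_.get();
        while (c != EOF && std::isspace(c)) c = in_.get();
        if (c == EOF) return END;
        if (c == '+') return PLUS;
        if (std::isdigit(c)) {
            long n = c - '0';
            while (std::isdigit(in_.peek())) n = n * 10 + (in_.get() - '0');
            lval->number = n;
            return NUMBER;
        }
        return BAD;
    }

private:
    std::istream& in_;
};

// The parser holds a reference to its scanner (the role %parse-param plays)
// instead of calling a global yylex. Grammar: NUMBER ('+' NUMBER)*.
class SumParser {
public:
    explicit SumParser(Scanner& scanner) : scanner_(scanner) {}

    // Returns true and stores the sum in result; returns false on a syntax
    // error, leaving result untouched.
    bool parse(long& result) {
        Value v;
        if (scanner_.yylex(&v) != NUMBER) return false;
        long sum = v.number;
        for (;;) {
            int t = scanner_.yylex(&v);
            if (t == END) { result = sum; return true; }
            if (t != PLUS || scanner_.yylex(&v) != NUMBER) return false;
            sum += v.number;
        }
    }

private:
    Scanner& scanner_;
};
```

Because nothing here is global, two `Scanner`/`SumParser` pairs can run independently in one program, which was the reason for wanting the C++ mode of both tools.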
