[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser
    Copyright (C) 2009 Andreas Haberstroh?

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:indepth In Depth]

[section:parsers_indepth Parsers in Depth]

This section is not for the faint of heart. Here, we distill the inner
workings of __qi__ parsers, using real code from the __spirit__ library as
examples. That said, there is no reason to fear reading on: we tried to
explain things step by step while highlighting the important insights.

The `__parser_concept__` class is the base class for all parsers.

[import ../../../../boost/spirit/home/qi/parser.hpp]
[parser_base_parser]

The `__parser_concept__` class does not really know how to parse anything;
instead, it relies on the template parameter `Derived` to do the actual parsing.
This technique is known as the "Curiously Recurring Template Pattern" in template
meta-programming circles. This inheritance strategy gives us the power of
polymorphism without the virtual function overhead. In essence, it is a way to
implement compile-time polymorphism.

The derived parser concepts, `__primitive_parser_concept__`, `__unary_parser_concept__`,
`__binary_parser_concept__` and `__nary_parser_concept__`, provide the necessary
facilities for parser detection, introspection, transformation and visitation.
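
To make the pattern concrete, here is a minimal CRTP sketch in plain C++. The
names `parser`, `char_parser`, `describe` and `parse_with` are hypothetical and
much simpler than the real __spirit__ classes. The point is only that the base
class reaches the derived implementation through a `static_cast`, so every call
is resolved at compile time and can be inlined.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: mirrors the shape of parser<Derived>, not its definition.
template <typename Derived>
struct parser
{
    // Static downcast: the base knows its concrete type at compile time.
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }

    // Forwards to Derived::what(); no virtual call is involved.
    std::string describe() const
    {
        return "parser: " + derived().what();
    }
};

// A toy derived parser that recognizes a single character.
struct char_parser : parser<char_parser>
{
    explicit char_parser(char c) : ch(c) {}

    template <typename Iterator>
    bool parse(Iterator& first, Iterator last) const
    {
        if (first != last && *first == ch)
        {
            ++first;
            return true;
        }
        return false;
    }

    std::string what() const { return std::string("char '") + ch + "'"; }

    char ch;
};

// Accepts any parser<Derived>; the concrete parse is chosen statically.
template <typename Derived, typename Iterator>
bool parse_with(parser<Derived> const& p, Iterator& first, Iterator last)
{
    return p.derived().parse(first, last);
}
```

Calling `parse_with` on a `char_parser` resolves directly to `char_parser::parse`;
there is no vtable and no virtual dispatch anywhere in the call chain.
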

Derived parsers must support the following:

[variablelist bool parse(f, l, context, skip, attr)
    [[`f`, `l`]     [first/last iterator pair]]
    [[`context`]    [enclosing rule context (can be `unused_type`)]]
    [[`skip`]       [skipper (can be `unused_type`)]]
    [[`attr`]       [attribute (can be `unused_type`)]]
]

/parse/ is the main parser entry point. The /skipper/ can be an `unused_type`,
a type used everywhere in __spirit__ to signify "don't care". There is an
overload of /skip/ for `unused_type` that is simply a no-op. That way, we do
not have to write separate parse functions for phrase-level and
character-level parsing.

Here are the basic rules for parsing:

* The parser returns `true` if successful, `false` otherwise.
* If successful, `first` is incremented N times, where N is the number of
  characters parsed. N can be zero: an empty (epsilon) match.
* If successful, the parsed attribute is assigned to /attr/.
* If unsuccessful, `first` is reset to its position before entering the
  parser function, and /attr/ is left untouched.

[variablelist void what(context)
    [[`context`]    [enclosing rule context (can be `unused_type`)]]
]

The /what/ function should be obvious. It provides some information
about ["what] the parser is. It is used, for example, as a debugging aid.

[variablelist P::template attribute<context>::type
    [[`P`]          [a parser type]]
    [[`context`]    [a context type (can be `unused_type`)]]
]

The /attribute/ metafunction returns the expected attribute type
of the parser. In some cases, this is context-dependent.

In this section, we will dissect two parser types:

[variablelist Parsers
    [[`__primitive_parser_concept__`]   [A parser for primitive data (e.g. integer parsing).]]
    [[`__unary_parser_concept__`]       [A parser that has a single subject (e.g. kleene star).]]
]

[/------------------------------------------------------------------------------]
[heading Primitive Parsers]

For our dissection study, we will use a __spirit__ primitive, the `any_int_parser`
in the `boost::spirit::qi` namespace.

[import ../../../../boost/spirit/home/qi/numeric/int.hpp]
[primitive_parsers_any_int_parser]

The `any_int_parser` is derived from `__primitive_parser_concept__<Derived>`,
which in turn derives from `parser<Derived>`. Therefore, it provides the
following:

* The `parse` member function
* The `what` member function
* The nested `attribute` metafunction

/parse/ is the main entry point. For primitive parsers, the first thing to do
is call:

``
qi::skip(first, last, skipper);
``

to do a pre-skip. After pre-skipping, the parser proceeds to do its thing. The
actual parsing code is placed in
`extract_int<T, Radix, MinDigits, MaxDigits>::call(first, last, attr);`

This simple, no-frills protocol is one of the reasons why __spirit__ is
fast. If you know the internals of __classic__, and perhaps even wrote some
parsers with it, you will find this simple __spirit__ mechanism a joy to work
with. There are no scanners and none of the baggage that comes with them.

The /what/ function just tells us that it is an integer parser. Simple.

The /attribute/ metafunction returns the `T` template parameter. We associate
`any_int_parser` with placeholders for the `short_`, `int_`, `long_` and
`long_long` types. But first, we enable these placeholders in namespace
`boost::spirit`:

[primitive_parsers_enable_short]
[primitive_parsers_enable_int]
[primitive_parsers_enable_long]
[primitive_parsers_enable_long_long]

Notice that `any_int_parser` is placed in namespace `boost::spirit::qi`
while these /enablers/ are in namespace `boost::spirit`. The reason is
that these placeholders are shared by other __spirit__ /domains/.
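
Before moving on to the generators, the parse rules listed earlier can be
condensed into a stripped-down primitive. This is a sketch under simplifying
assumptions, not the `any_int_parser` code: `unused_type`, `skip`, `assign` and
`uint_parser_sketch` are hypothetical stand-ins, the skipper is any character
predicate, the parser accepts unsigned decimal digits only without an overflow
check, and, following the rules as stated, `first` is restored to its entry
position (including any pre-skipped input) on failure.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for Spirit's "don't care" type.
struct unused_type {};

// Pre-skip: advance past every character the skipper predicate accepts.
template <typename Iterator, typename Skipper>
void skip(Iterator& first, Iterator last, Skipper const& skipper)
{
    while (first != last && skipper(*first))
        ++first;
}

// No skipper (character-level parsing): the pre-skip is a no-op.
template <typename Iterator>
void skip(Iterator&, Iterator, unused_type const&) {}

// Attribute assignment, with a no-op overload for an unused attribute.
template <typename Attribute, typename T>
void assign(Attribute& attr, T const& value) { attr = value; }

template <typename T>
void assign(unused_type&, T const&) {}

// Toy primitive: unsigned decimal integer (no overflow check).
struct uint_parser_sketch
{
    template <typename Iterator, typename Context,
              typename Skipper, typename Attribute>
    bool parse(Iterator& first, Iterator last, Context const& /*context*/,
               Skipper const& skipper, Attribute& attr) const
    {
        Iterator const save = first;      // entry position
        skip(first, last, skipper);       // primitives do their own pre-skip

        if (first == last || !std::isdigit(static_cast<unsigned char>(*first)))
        {
            first = save;                 // failure: restore first...
            return false;                 // ...and leave attr untouched
        }

        unsigned value = 0;
        while (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        {
            value = value * 10 + static_cast<unsigned>(*first - '0');
            ++first;                      // one increment per character parsed
        }
        assign(attr, value);              // success: hand over the attribute
        return true;
    }

    std::string what() const { return "unsigned-integer"; }
};
```

Passing `unused_type` as the skipper selects the no-op `skip` overload, and
passing it as the attribute selects the no-op `assign` overload, so a single
`parse` serves both character-level and phrase-level parsing.
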
__qi__, the parser, is one domain; __karma__, the generator, is another.
Other parser technologies may be developed and placed in yet another domain.
All of these can potentially share the same placeholders for
interoperability, while the interpretation of the placeholders is
domain-specific.

Now that we have enabled the placeholders, we have to write generators
for them: the `make_xxx` function objects (in namespace `boost::spirit::qi`):

[primitive_parsers_make_int]

The one above is our main generator. It is a simple function object
with two (unused) arguments:

# The actual terminal value obtained by Proto; in this case, one of
  `short_`, `int_`, `long_` or `long_long`. We don't care about this.

# Modifiers. We don't care about these either. They allow directives
  such as `no_case[p]` to pass information to inner parser nodes.
  We'll see how that works later.

Next come the specializations:

[primitive_parsers_short_primitive]
[primitive_parsers_int_primitive]
[primitive_parsers_long_primitive]
[primitive_parsers_long_long_primitive]

These specialize `qi::make_primitive` for specific tags. They all
inherit from `make_int`, which does the actual work.

[heading Composite Parsers]

Let me present the kleene star (also in namespace `spirit::qi`):

[import ../../../../boost/spirit/home/qi/operator/kleene.hpp]
[composite_parsers_kleene]

It looks similar in form to its primitive cousin, the `any_int_parser`, and it
has the same basic ingredients required of `Derived`:

* The nested `attribute` metafunction
* The `parse` member function
* The `what` member function

`kleene` is a composite parser: a parser that composes another
parser, its ["subject]. It is a `__unary_parser_concept__` and derives from it.
Like `__primitive_parser_concept__`, `__unary_parser_concept__<Derived>` derives
from `parser<Derived>`.
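
A minimal sketch of how a unary composite sits in the CRTP hierarchy, assuming
hypothetical `primitive_parser`, `unary_parser`, `digit_parser` and
`kleene_sketch` types that model only structure and `what` (no parsing). The
composite stores its subject by value, exposes `subject_type`, and builds its
own description from the subject's.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical CRTP bases that mirror the hierarchy described above.
template <typename Derived>
struct parser
{
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

template <typename Derived>
struct primitive_parser : parser<Derived> {};

template <typename Derived>
struct unary_parser : parser<Derived> {};

// A primitive: no subject of its own.
struct digit_parser : primitive_parser<digit_parser>
{
    std::string what() const { return "digit"; }
};

// A unary composite: stores its subject by value and exposes its type.
template <typename Subject>
struct kleene_sketch : unary_parser<kleene_sketch<Subject> >
{
    typedef Subject subject_type;

    explicit kleene_sketch(Subject const& s) : subject(s) {}

    // The composite's description wraps the subject's description.
    std::string what() const { return "kleene[" + subject.what() + "]"; }

    Subject subject;
};

// Generic code reaches the subject of any unary parser via derived().
template <typename Derived>
std::string subject_what(unary_parser<Derived> const& p)
{
    return p.derived().subject.what();
}
```

Nesting composes naturally: `kleene_sketch<kleene_sketch<digit_parser> >`
describes itself as `kleene[kleene[digit]]`.
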

`unary_parser<Derived>` has these expression requirements on `Derived`:

* `p.subject` -> subject parser (['p] is a __unary_parser_concept__ parser.)
* `P::subject_type` -> subject parser type (['P] is a __unary_parser_concept__ type.)

/parse/ is the main parser entry point. Since this is not a primitive
parser, we do not need to call `qi::skip(first, last, skipper)`. The
['subject], if it is a primitive, will do the pre-skip. If it is
another composite parser, it will eventually call a primitive parser
somewhere down the line, which will do the pre-skip. This makes it a
lot more efficient than __classic__, which puts the skipping business
into the so-called "scanner" that blindly attempts a pre-skip
every time we increment the iterator.

What is the /attribute/ of the kleene? In general, it is a `std::vector<T>`
where `T` is the attribute of the subject. There is a special case, though:
if `T` is an `unused_type`, then the attribute of kleene is also `unused_type`.
`traits::build_std_vector` takes care of that minor detail.

So, let's parse. First, we need to provide a local attribute for
the subject:

``
typedef typename traits::attribute_of<Subject, Context>::type value_type;
value_type val = value_type();
``

`traits::attribute_of<Subject, Context>` simply invokes the subject's nested
`attribute<Context>` metafunction.

/val/ starts out value-initialized (for example, zero for an `int`). This is
the value we pass to the subject's parse function.

The kleene repeats as long as the subject parser succeeds. On each successful
parse, we `push_back` the parsed attribute to the kleene's attribute, which is
expected to be, at the very least, compatible with a `std::vector`. In other
words, although we say that we want our attribute to be a `std::vector`, we
try to be more lenient than that: the caller of kleene's parse may pass a
different attribute type.
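
The attribute computation can be sketched with a few small metafunctions. The
names `attribute_of_sketch`, `build_vector_sketch` and `kleene_attribute_sketch`
are hypothetical stand-ins for the `traits::attribute_of` and
`traits::build_std_vector` roles described above; the real __spirit__
definitions differ. The only behavior modeled is the one stated in the text:
`std::vector<T>` in general, and `unused_type` when the subject's attribute is
`unused_type`.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for Spirit's "don't care" type.
struct unused_type {};

// build_std_vector-like rule: std::vector<T> in general...
template <typename T>
struct build_vector_sketch
{
    typedef std::vector<T> type;
};

// ...but an unused subject attribute makes the kleene attribute unused too.
template <>
struct build_vector_sketch<unused_type>
{
    typedef unused_type type;
};

// attribute_of-like rule: ask the subject's nested attribute<Context>.
template <typename Subject, typename Context>
struct attribute_of_sketch
{
    typedef typename Subject::template attribute<Context>::type type;
};

// Two toy subjects: one exposes an int, one exposes nothing.
struct int_subject
{
    template <typename Context>
    struct attribute { typedef int type; };
};

struct literal_subject
{
    template <typename Context>
    struct attribute { typedef unused_type type; };
};

// The kleene's attribute, computed from its subject's attribute.
template <typename Subject, typename Context>
struct kleene_attribute_sketch
{
    typedef typename build_vector_sketch<
        typename attribute_of_sketch<Subject, Context>::type>::type type;
};
```

Because the special case is a plain template specialization, the choice costs
nothing at run time: a kleene over an attribute-less subject simply has no
container to fill.
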
As long as the caller's attribute is a conforming STL container with
`push_back`, we are OK. Here is the kleene loop:

``
while (subject.parse(first, last, context, skipper, val))
{
    // push the parsed value into our attribute
    traits::push_back(attr, val);
    traits::clear(val);
}
return true;
``

Note that we did not call `attr.push_back(val)`. Instead, we
called a Spirit-provided function:

``
traits::push_back(attr, val);
``

This is a recurring pattern. We do it this way because /attr/ [*can] be
`unused_type`, and `traits::push_back` takes care of that detail: the
overload for `unused_type` is a no-op. Now you can imagine why __spirit__
is fast! The parsers are simple, and the generated code is as efficient as
a hand-rolled loop. All these parser compositions and recursive parse
invocations are extensively inlined by a modern C++ compiler. In the end,
you get a tight loop when you use the kleene, with no excess baggage. If
the attribute is unused, no code is generated for it at all. That's how
__spirit__ is designed.

The /what/ function simply wraps the output of the subject in
`"kleene["` ... `"]"`.

Now, as with the `any_int_parser`, we have to hook our parser into the
__qi__ engine. Here's how we do it.

First, we enable the prefix star operator. In Proto, it's called
the "dereference":

[composite_parsers_kleene_enable_]

This is done in namespace `boost::spirit`, like its friend, the `use_terminal`
specialization for our `any_int_parser`. This time, we use /use_operator/ to
enable the dereference for `qi::domain`.

Then, we need to write our generator (in namespace `qi`):

[composite_parsers_kleene_generator]

This essentially says: for all expressions of the form `*p`, build a kleene
parser. `Elements` is a __fusion__ sequence.
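
Pulling the kleene pieces together, here is a self-contained sketch of the
loop, the `unused_type` no-op overload, and a generator for `*p`. All names are
hypothetical: `std::tuple` stands in for the __fusion__ `Elements` sequence, the
Modifiers argument is accepted and ignored as in the text, and the context and
skipper parameters are dropped for brevity. The `clear_sketch` helper simply
resets the value to `T()`, which is an assumption rather than a description of
`traits::clear`.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for Spirit's "don't care" type.
struct unused_type {};

// push_back in the style described above: real containers get a real call...
template <typename Container, typename T>
void push_back_sketch(Container& c, T const& val) { c.push_back(val); }

// ...and an unused attribute gets a no-op.
template <typename T>
void push_back_sketch(unused_type&, T const&) {}

// Assumed reset rule for the local value: back to a value-initialized T.
template <typename T>
void clear_sketch(T& val) { val = T(); }

// Toy subject: a single decimal digit, with an int attribute.
struct digit_sketch
{
    typedef int attribute_type;

    template <typename Iterator>
    bool parse(Iterator& first, Iterator last, int& attr) const
    {
        if (first == last || !std::isdigit(static_cast<unsigned char>(*first)))
            return false;
        attr = *first - '0';
        ++first;
        return true;
    }
};

// The kleene loop: keep parsing the subject while it succeeds.
template <typename Subject>
struct kleene_sketch
{
    template <typename Iterator, typename Attribute>
    bool parse(Iterator& first, Iterator last, Attribute& attr) const
    {
        typename Subject::attribute_type val = typename Subject::attribute_type();
        while (subject.parse(first, last, val))
        {
            push_back_sketch(attr, val);   // no-op when attr is unused_type
            clear_sketch(val);
        }
        return true;                       // zero repetitions is still a match
    }

    Subject subject;
};

// Generator for *p: Elements holds exactly one element, the subject.
// std::tuple stands in for the Fusion sequence; Modifiers are ignored.
struct make_kleene_sketch
{
    template <typename Subject, typename Modifiers>
    kleene_sketch<Subject> operator()(std::tuple<Subject> const& elements,
                                      Modifiers const&) const
    {
        kleene_sketch<Subject> result = { std::get<0>(elements) };
        return result;
    }
};
```

With a `std::vector<int>` attribute, parsing `123x` collects 1, 2 and 3 and
stops before `x`. With an `unused_type` attribute, the same loop runs, but the
`push_back_sketch` call selects the empty overload and does nothing.
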
Because kleene is a unary operator, we expect exactly one element in the
`Elements` sequence: the subject of the kleene.

We still don't care about the Modifiers. We'll see what the modifiers are
all about when we look at directives in depth.

[endsect]

[endsect]