|
@@ -5,38 +5,39 @@
|
|
|
// file, You can obtain one at http://mozilla.org/MPL/2.0/.
|
|
|
|
|
|
/**
|
|
|
- @page parser Flex/Bison parsers
|
|
|
+@page parser Flex/Bison Parsers
|
|
|
|
|
|
@section parserIntro Parser background
|
|
|
|
|
|
-Kea's data format of choice is JSON (https://tools.ietf.org/html/rfc7159), which
|
|
|
+Kea's data format of choice is JSON (defined in https://tools.ietf.org/html/rfc7159), which
|
|
|
is used in configuration files, in the command channel and also when
|
|
|
-communicating between DHCP servers and DHCP-DDNS component. It is almost certain
|
|
|
-it will be used as the data format for any new features.
|
|
|
-
|
|
|
-Historically, Kea used @ref isc::data::Element::fromJSON and @ref
|
|
|
-isc::data::Element::fromJSONFile methods to parse received data that is expected
|
|
|
-to be in JSON syntax. This in-house parser was developed back in the early BIND10
|
|
|
-days. Its two main advantages were that it didn't have any external dependencies
|
|
|
-and that it was already available in the source tree when the Kea project
|
|
|
+communicating between the DHCP servers and the DHCP-DDNS component. It is almost certain
|
|
|
+to be used as the data format for any new features.
|
|
|
+
|
|
|
+Historically, Kea used the @ref isc::data::Element::fromJSON and @ref
|
|
|
+isc::data::Element::fromJSONFile methods to parse data expected
|
|
|
+to be in JSON syntax. This in-house parser was developed back in the early days of
|
|
|
+Kea when it was part of BIND 10.
|
|
|
+Its main advantages were that it didn't have any external dependencies
|
|
|
+and that it was already available in the source tree when Kea development
|
|
|
started. On the other hand, it was very difficult to modify (several attempts to
|
|
|
-implement more robust comments had failed) and not well implemented. Also, it
|
|
|
-was pure JSON parser, so it accepted anything as long as the content was correct
|
|
|
-JSON. This has led to other problems - the syntactic checks were conducted much
|
|
|
-later, when some of the information (e.g. line numbers) was no longer
|
|
|
-available. To print meaningful error messages for example, we had to develop a
|
|
|
-way to store filename, line and column information. This on the other hand, led
|
|
|
-to duplication. Anyway, this part of the processing is something we can refer to
|
|
|
-as phase 1: get input string, parse it and generate a tree of @ref
|
|
|
-isc::data::Element objects using shared pointers.
|
|
|
-
|
|
|
-That Element tree was then processed by set of dedicated parsers. Each parser
|
|
|
+implement more robust comments had failed) and lacked a number of features. Also, it
|
|
|
+was a pure JSON parser, so it accepted anything as long as the content was correct
|
|
|
+JSON. (This caused some problems: for example, the syntactic checks were conducted late in the
|
|
|
+parsing process, by which time some of the information, e.g. line numbers, was no longer
|
|
|
+available. To print meaningful error messages, the Kea team had to develop a
|
|
|
+way to store filename, line and column information. Unfortunately this gave rise to other problems
|
|
|
+such as data duplication.) The output from these parsers was a tree of @ref
|
|
|
+isc::data::Element objects using shared pointers. This part of the processing we
|
|
|
+can refer to as phase 1.
|
|
|
+
|
|
|
+The Element tree was then processed by a set of dedicated parsers. Each parser
|
|
|
was able to handle its own context, e.g. global, subnet list, subnet, pool
|
|
|
-etc. This step took the tree generated in the earlier step, parsed it and
|
|
|
-generated output configuration (e.g. @ref isc::dhcp::SrvConfig) or dynamic
|
|
|
-structures (e.g. isc::data::Host). There were a large number of parser objects
|
|
|
-derived from @ref isc::dhcp::DhcpConfigParser) instantiated for each scope and
|
|
|
-instance of data (e.g. to parse 1000 host reservation entries a thousand of
|
|
|
+etc. This step took the tree generated in phase 1, parsed it and
|
|
|
+generated an output configuration (e.g. @ref isc::dhcp::SrvConfig) or dynamic
|
|
|
+structures (e.g. isc::data::Host). During this stage, a large number of parser objects
|
|
|
+derived from @ref isc::dhcp::DhcpConfigParser could be instantiated for each scope and
|
|
|
+instance of data (e.g. to parse 1000 host reservation entries, a thousand
|
|
|
dedicated parsers were created). For convenience, this step is called phase 2.
|
|
|
|
|
|
Other issues with the old parsers are discussed here: @ref dhcpv6ConfigParserBison
|
|
@@ -44,20 +45,20 @@ Other issues with the old parsers are discussed here: @ref dhcpv6ConfigParserBis
|
|
|
and here: http://kea.isc.org/wiki/SimpleParser.
|
|
|
|
|
|
|
|
|
-@section parserBisonIntro Flex/Bison based parser
|
|
|
+@section parserBisonIntro Flex/Bison Based Parser
|
|
|
|
|
|
To solve the issue of phase 1 mentioned earlier, a new parser has been developed
|
|
|
-that is based on flex and bison tools. The following text uses DHCPv6 as an
|
|
|
-example, but the same principle applies to DHCPv4 and D2 and CA will likely to
|
|
|
-follow. The new parser consists of two core elements with a wrapper around them
|
|
|
-(the following description is slightly oversimplified to convey the intent, more
|
|
|
-detailed description is available in the following sections):
|
|
|
-
|
|
|
--# Flex lexer (src/bin/dhcp6/dhcp6_lexer.ll) that is essentially a set of
|
|
|
- regular expressions with C++ code that creates new tokens that represent whatever
|
|
|
- was just parsed. This lexer will be called iteratively by bison until the whole
|
|
|
+that is based on the "flex and "bison" tools. The following text uses DHCPv6 as an
|
|
|
+example, but the same principle applies to DHCPv4 and D2; CA will likely
|
|
|
+follow. The new parser consists of two core elements with a wrapper around them.
|
|
|
+The following descriptions are slightly oversimplified in order to convey the intent;
|
|
|
+a more detailed description is available in subsequent sections.
|
|
|
+
|
|
|
+-# Flex lexical analyzer (src/bin/dhcp6/dhcp6_lexer.ll): this is essentially a set of
|
|
|
+ regular expressions and C++ code that creates new tokens that represent whatever
|
|
|
+ was just parsed. This lexical analyzer (lexer) will be called iteratively by bison until the whole
|
|
|
input text is parsed or an error is encountered. For example, a snippet of the
|
|
|
- code could look like this:
|
|
|
+ code might look like this:
|
|
|
@code
|
|
|
\"socket-type\" {
|
|
|
return isc::dhcp::Dhcp6Parser::make_SOCKET_TYPE(driver.loc_);
|
|
@@ -67,7 +68,7 @@ detailed description is available in the following sections):
|
|
|
create a token SOCKET_TYPE and pass it the current location (that's the
|
|
|
file name, line and column numbers).
|
|
|
|
|
|
--# Bison grammar (src/bin/dhcp6/dhcp6_parser.yy) that defines the syntax.
|
|
|
+-# Bison grammar (src/bin/dhcp6/dhcp6_parser.yy): the module that defines the syntax.
|
|
|
Grammar and syntax are perhaps fancy words, but they simply define what is
|
|
|
allowed and where. Bison grammar starts with a list of tokens. Those tokens
|
|
|
are defined only by name ("here's the list of possible tokens that could
|
|
@@ -82,28 +83,33 @@ detailed description is available in the following sections):
|
|
|
}
|
|
|
}
|
|
|
@endcode
|
|
|
- this code would return the following sentence of tokens: LCURLY_BRACKET,
|
|
|
+ The lexer would generate the following sequence of tokens: LCURLY_BRACKET,
|
|
|
DHCP6, COLON, LCURLY_BRACKET, RENEW_TIMER, COLON, INTEGER
|
|
|
- (a token with a value of 100), RCURLY_BRACKET, RCURLY_BRACKET, END
|
|
|
+ (a token with a value of 100), RCURLY_BRACKET, RCURLY_BRACKET, END. The
|
|
|
+  bison grammar recognizes that the sequence forms a valid sentence and that
|
|
|
+  there are no errors, and acts upon it. (Whereas if the left and
|
|
|
+ right braces in the above example were exchanged, the bison
|
|
|
+ module would identify the sequence as syntactically incorrect.)
|
|
|
|
|
|
-# Parser context. As there is some information that needs to be passed between
|
|
|
parser and lexer, @ref isc::dhcp::Parser6Context is a convenience wrapper
|
|
|
around those two bundled together. It also works as a nice encapsulation,
|
|
|
hiding all the flex/bison details underneath.
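
To make the token stream described above concrete, here is a hypothetical,
hand-rolled tokenizer sketch. It is illustrative only: the real lexer is
generated by flex from dhcp6_lexer.ll, and keywords such as "Dhcp6" or
"renew-timer" come back as dedicated DHCP6 or RENEW_TIMER tokens rather than
generic STRINGs.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token names mirroring those in the text; the real tokens are
// generated by bison and live in dhcp6_parser.h.
enum class Tok { LCURLY, RCURLY, COLON, STRING, INTEGER, END };

// A minimal hand-rolled tokenizer, for illustration only.
std::vector<Tok> tokenize(const std::string& in) {
    std::vector<Tok> out;
    for (size_t i = 0; i < in.size();) {
        char c = in[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; }
        else if (c == '{') { out.push_back(Tok::LCURLY); ++i; }
        else if (c == '}') { out.push_back(Tok::RCURLY); ++i; }
        else if (c == ':') { out.push_back(Tok::COLON); ++i; }
        else if (c == '"') {               // quoted string, e.g. "renew-timer"
            ++i;
            while (i < in.size() && in[i] != '"') { ++i; }
            ++i;                           // skip the closing quote
            out.push_back(Tok::STRING);
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            while (i < in.size() &&
                   std::isdigit(static_cast<unsigned char>(in[i]))) { ++i; }
            out.push_back(Tok::INTEGER);
        } else { ++i; }                    // skip anything unrecognized
    }
    out.push_back(Tok::END);               // end-of-input marker
    return out;
}
```

Feeding the example snippet through this sketch yields the same shape of
token sequence as described above, ending with the END token.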
|
|
|
|
|
|
-@section parserBuild Building flex/bison code
|
|
|
+@section parserBuild Building Flex/Bison Code
|
|
|
|
|
|
-The only input file used by flex is the .ll file. The only input file used by
|
|
|
+The only input file used by flex is the .ll file and the only input file used by
|
|
|
bison is the .yy file. When making changes to the lexer or parser, only those
|
|
|
-two files are edited. When processed, those two tools will generate a number of
|
|
|
-.h, .hh and .cc files. The major ones are named the same as their .ll and .yy
|
|
|
-counterparts (e.g. dhcp6_lexer.cc, dhcp6_parser.cc and dhcp6_parser.h), but
|
|
|
-there's a number of additional files created: location.hh, position.hh and
|
|
|
+two files are edited. When processed, the two tools generate a number of
|
|
|
+.h, .hh and .cc files. The major ones have the same name as their .ll and .yy
|
|
|
+counterparts (e.g. dhcp6_lexer.cc, dhcp6_parser.cc and dhcp6_parser.h), but
|
|
|
+a number of additional files are also created: location.hh, position.hh and
|
|
|
stack.hh. Those are internal bison headers that are needed for compilation.
|
|
|
|
|
|
-To avoid every user to have flex and bison installed, we chose to generate the
|
|
|
-files and add them to the Kea repository. To generate those files, do the
|
|
|
-following:
|
|
|
+To avoid the need for every user to have flex and bison installed, the output files
|
|
|
+are generated when the .ll or .yy files are altered and are stored in the
|
|
|
+Kea repository. To generate those files, issue the following sequence of
|
|
|
+commands from the top-level Kea directory:
|
|
|
|
|
|
@code
|
|
|
./configure --enable-generate-parser
|
|
@@ -111,46 +117,37 @@ cd src/bin/dhcp6
|
|
|
make parser
|
|
|
@endcode
|
|
|
|
|
|
-Strictly speaking, make parser is not necessary. If you updated .ll or .yy file,
|
|
|
-regular make command should pick those changes up. However, since one source
|
|
|
-file generates multiple output files and you are likely using multi-process
|
|
|
-build (make -j), there may be odd side effects, so I found it more convenient
|
|
|
-to explicitly rebuild the files manually by using "make parser".
|
|
|
-
|
|
|
-One problem flex/bison brings is the tool version dependency. If one developer
|
|
|
-uses version A of those tools and another developer uses B, then the files
|
|
|
-generated may be different and cause unnecessarily large diffs, may cause
|
|
|
-coverity/cpp-check issues appear and disappear and cause general unhappiness.
|
|
|
-To avoid those problems, we will introduce a requirement to generate flex/bison
|
|
|
-files on one dedicated machine. This machine will likely be docs. Currently Ops
|
|
|
-is working on installing the necessary versions of flex/bison required, but
|
|
|
-for the time being we can use the versions installed in Francis' home directory
|
|
|
-(export PATH=/home/fdupont/bin:$PATH).
|
|
|
-
|
|
|
-Note: the above applies only to the code being merged on master. It is probably
|
|
|
-ok to generate the files on your development branch with whatever version you
|
|
|
-have as long as it is not too old. In particular, the bison version needs to be
|
|
|
-at least 3.0.0 and Mac OS has 2.x version installed by default. When reviewing
|
|
|
-tickets that have flex/bison changes, please review .ll and .yy files and ignore
|
|
|
-the files generated from them. If you really insist, you're welcome to review
|
|
|
-them, but in most cases that will be an exercise in futility.
|
|
|
-
|
|
|
-@section parserFlex Flex detailed
|
|
|
-
|
|
|
-Earlier sections described the lexer in a bit over-simplified way. The .ll file
|
|
|
-contains a number of additional elements in addition to the regular expressions
|
|
|
-and they're not as simple as described.
|
|
|
-
|
|
|
-First, there's a number of sections separated by percent (%) signs. Depending
|
|
|
-on which section the code is written in, it may be interpreted by flex, copied
|
|
|
-verbatim to output .cc file, copied to output .h file or copied to both.
|
|
|
+Strictly speaking, the command "make parser" is not necessary. If you updated
|
|
|
+the .ll or .yy file, the
|
|
|
+regular "make" command should pick those changes up. However, since one source
|
|
|
+file generates multiple output files and you are likely to be using a multi-process
|
|
|
+build (by specifying the "-j" switch on the "make" command), there may be odd side effects:
|
|
|
+explicitly rebuilding the files manually by using "make parser" avoids any trouble.
|
|
|
+
|
|
|
+One problem brought on by the use of flex/bison is tool version dependency. If one developer
|
|
|
+uses version A of those tools and another developer uses B, the files
|
|
|
+generated by the different versions may be significantly different. This causes
|
|
|
+all sorts of problems, e.g. coverity/cpp-check issues may appear and disappear:
|
|
|
+in short, general unhappiness.
|
|
|
+To avoid those problems, the Kea team generates the flex/bison
|
|
|
+files on a dedicated machine.
|
|
|
+
|
|
|
+@section parserFlex Flex Detailed
|
|
|
+
|
|
|
+Earlier sections described the lexer in a bit of an over-simplified way. The .ll file
|
|
|
+contains a number of elements in addition to the regular expressions
|
|
|
+and they're not as simple as previously described.
|
|
|
+
|
|
|
+The file starts with a number of sections separated by percent (%) signs. Depending
|
|
|
+on which section the code is written in, it may be interpreted by flex, copied
|
|
|
+verbatim to the output .cc file, copied to the output .h file or copied to both.
|
|
|
|
|
|
There is an initial section that defines flex options. These are somewhat
|
|
|
-documented, but the docs for it may be a bit cryptic. When developing new
|
|
|
+documented, but the documentation for it may be a bit cryptic. When developing new
|
|
|
parsers, it's best to start by copying whatever we have for DHCPv6 and tweak as
|
|
|
needed.
|
|
|
|
|
|
-Second addition are flex conditions. They're defined with %%x and they define a
|
|
|
+Next come the flex conditions. They are defined with %%x and they define a
|
|
|
state of the lexer. A good example of a state is a comment. Once the lexer
|
|
|
detects a comment's beginning, it switches to a certain condition (by calling
|
|
|
BEGIN(COMMENT) for example) and the code then ignores whatever follows
|
|
@@ -159,32 +156,32 @@ BEGIN(COMMENT) for example) and the code then ignores whatever follows
|
|
|
something that is not frequently used and the only use cases for it are the
|
|
|
aforementioned comments and file inclusions.
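
A sketch of what such a comment condition might look like in a .ll file is
shown below. The rule names are illustrative; the real dhcp6_lexer.ll handles
comments somewhat differently.

```
%x COMMENT
%%
"/*"            { BEGIN(COMMENT); }   /* enter the COMMENT condition   */
<COMMENT>"*/"   { BEGIN(INITIAL); }   /* comment ended, back to normal */
<COMMENT>.|\n   { /* ignore everything inside the comment */ }
```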
|
|
|
|
|
|
-Second addition are syntactic contexts. Let's assume we have a parser that uses
|
|
|
-"ip-address" regexp that would return IP_ADDRESS token. Whenever we want to
|
|
|
-allow "ip-address", the grammar allows IP_ADDRESS token to appear. When the
|
|
|
-lexer is called, it will match the regexp, will generate the IP_ADDRESS token and
|
|
|
+After this come the syntactic contexts. Let's assume we have a parser that uses an
|
|
|
+"ip-address" regular expression (regexp) that would return the IP_ADDRESS token. Whenever we want to
|
|
|
+allow "ip-address", the grammar allows the IP_ADDRESS token to appear. When the
|
|
|
+lexer is called, it will match the regexp, generate the IP_ADDRESS token and
|
|
|
the parser will carry out its duty. This works fine as long as you have very
|
|
|
specific grammar that defines everything. Sadly, that's not the case in DHCP as
|
|
|
we have hooks. Hook libraries can have parameters that are defined by third
|
|
|
party developers and they can pick whatever parameter names they want, including
|
|
|
-"ip-address". Another example may be Dhcp4 and Dhcp6 configurations defined in a
|
|
|
-single file. When parsed by Dhcp6 server, its grammar has a clause that says
|
|
|
-"Dhcp4" may contain any generic JSON. However, the lexer will likely find the
|
|
|
-"ip-address" string and will say that it's not a part of generic JSON, but a
|
|
|
-dedicated IP_ADDRESS token. The parser would then complain and the whole thing
|
|
|
-would end up in failure. To solve this problem syntactic contexts were introduced.
|
|
|
+"ip-address". Another example could be Dhcp4 and Dhcp6 configurations defined in a
|
|
|
+single file. The grammar defining "Dhcp6" main contain a clause that says
|
|
|
+"Dhcp4" may contain any generic JSON. However, the lexer may find the
|
|
|
+"ip-address" string in the "Dhcp4" configuration and will say that it's not a part of generic JSON, but a
|
|
|
+dedicated IP_ADDRESS token instead. The parser will then complain and the whole thing
|
|
|
+will end up in failure. It was to solve this problem that syntactic contexts were introduced.
|
|
|
They tell the lexer whether input strings have specific or generic meaning.
|
|
|
-For example, when detecting "ip-address" string when parsing host reservation,
|
|
|
-the lexer is expected to report IP_ADDRESS token. However, when parsing generic
|
|
|
-JSON, it should return STRING with a value of "ip-address". The list of all
|
|
|
+For example, when parsing host reservations,
|
|
|
+the lexer is expected to report the IP_ADDRESS token if "ip-address" is detected. However, when parsing generic
|
|
|
+JSON, upon encountering "ip-address" it should return a STRING with a value of "ip-address". The list of all
|
|
|
contexts is enumerated in @ref isc::dhcp::Parser6Context::ParserContext.
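
The decision the lexer makes can be condensed into a small sketch. The enum
values and function below are hypothetical; the actual contexts are the ones
enumerated in @ref isc::dhcp::Parser6Context::ParserContext.

```cpp
#include <string>

// Hypothetical condensed contexts: in a host reservation "ip-address" is a
// keyword, while in generic JSON (e.g. hook parameters) nothing is.
enum class Ctx { NO_KEYWORD, HOST_RESERVATION };
enum class Tok { IP_ADDRESS, STRING };

// Classify a word depending on the current syntactic context: the same
// input string maps to different tokens in different contexts.
Tok classify(const std::string& word, Ctx ctx) {
    if (ctx == Ctx::HOST_RESERVATION && word == "ip-address") {
        return Tok::IP_ADDRESS;          // specific meaning: keyword token
    }
    return Tok::STRING;                  // generic meaning: plain string
}
```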
|
|
|
|
|
|
-For DHCPv6-specific description of the conflict avoidance, see @ref dhcp6ParserConflicts.
|
|
|
+For a DHCPv6-specific description of the conflict avoidance, see @ref dhcp6ParserConflicts.
|
|
|
|
|
|
-@section parserGrammar Bison grammar
|
|
|
+@section parserGrammar Bison Grammar
|
|
|
|
|
|
Bison has much better documentation than flex. Its latest version seems to be
|
|
|
-available here: https://www.gnu.org/software/bison/manual/ Bison is a LALR(1)
|
|
|
+available here: https://www.gnu.org/software/bison/manual. Bison is a LALR(1)
|
|
|
parser, which essentially means that it is able to parse (separate and analyze)
|
|
|
any text that is described by a set of rules. You can see the more formal
|
|
|
description here: https://en.wikipedia.org/wiki/LALR_parser, but the plain
|
|
@@ -192,8 +189,8 @@ English explanation is that you define a set of rules and bison will walk
|
|
|
through input text trying to match the content to those rules. While doing
|
|
|
so, it will be allowed to peek at most one symbol (token) ahead.
|
|
|
|
|
|
-Let's take a closer look at the bison grammar we have for DHCPv6. It is defined
|
|
|
-in src/bin/dhcp6/dhcp6_parser.yy. Here's a simplified excerpt of it:
|
|
|
+As an example, let's take a closer look at the bison grammar we have for DHCPv6. It is defined
|
|
|
+in src/bin/dhcp6/dhcp6_parser.yy. Here's a simplified excerpt:
|
|
|
|
|
|
@code
|
|
|
// This defines a global Dhcp6 object.
|
|
@@ -236,25 +233,25 @@ renew_timer: RENEW_TIMER COLON INTEGER;
|
|
|
@endcode
|
|
|
|
|
|
The code above defines parameters that may appear in the Dhcp6 object
|
|
|
-declaration. One important trick to understand is to get the way to handle
|
|
|
+declaration. One important trick to understand is the way to handle a
|
|
|
variable number of parameters. In bison it is most convenient to present them as
|
|
|
-recursive lists (global_params in this example) and allow any number of
|
|
|
-global_param instances. This way the grammar is very easily extensible. If one
|
|
|
-needs to add a new global parameter, he or she just needs to add it to the
|
|
|
+recursive lists: in this example, global_params defined in a way that allows any number of
|
|
|
+global_param instances allowing the grammar to be easily extensible. If one
|
|
|
+needs to add a new global parameter, just add it to the
|
|
|
global_param list.
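
The recursive-list idiom can be sketched as the following grammar fragment.
This is illustrative only: the real rules in dhcp6_parser.yy carry C++
actions and many more alternatives.

```
global_params: global_param
             | global_params COMMA global_param
             ;

global_param: renew_timer
            | rebind_timer
            // ... a new global parameter is added as one more alternative
            ;
```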
|
|
|
|
|
|
-This type of definitions has several levels, each representing logical
|
|
|
+This type of definition has several levels, each representing logical
|
|
|
structure of the configuration data. We start with global scope, then step
|
|
|
-into Dhcp6 object that has Subnet6 list, which has Subnet6 instances,
|
|
|
-which has pools list and so on. Each of those is represented as a separate
|
|
|
+into a Dhcp6 object that has a Subnet6 list, which in turn has Subnet6 instances,
|
|
|
+each of which has a pools list and so on. Each level is represented as a separate
|
|
|
rule.
|
|
|
|
|
|
-The "leaf" rules that don't contain any other rules, must be defined by a
|
|
|
-series of tokens. An example of such a rule is renew_timer above. It is defined
|
|
|
+The "leaf" rules (that don't contain any other rules) must be defined by a
|
|
|
+series of tokens. An example of such a rule is renew_timer, above. It is defined
|
|
|
as a series of 3 tokens: RENEW_TIMER, COLON and INTEGER.
|
|
|
|
|
|
Speaking of integers, it is worth noting that some tokens can have values. Those
|
|
|
-values are defined using %token clause. For example, dhcp6_parser.yy has the
|
|
|
+values are defined using the %token clause. For example, dhcp6_parser.yy contains the
|
|
|
following:
|
|
|
|
|
|
@code
|
|
@@ -273,8 +270,8 @@ C++ code to it. Bison will go through the whole input text, match the
|
|
|
rules and will either say the input adhered to the rules (parsing successful)
|
|
|
or not (parsing failed). This may be a useful step when developing a new parser,
|
|
|
but it has no practical value. To perform specific actions, bison allows
|
|
|
-injecting C++ code at almost any moment. For example we could augment the
|
|
|
-renew_timer with some extra code:
|
|
|
+the injection of C++ code at almost any point. For example, we could augment the
|
|
|
+parsing of renew_timer with some extra code:
|
|
|
|
|
|
@code
|
|
|
renew_timer: RENEW_TIMER {
|
|
@@ -283,7 +280,7 @@ renew_timer: RENEW_TIMER {
|
|
|
cout << "colon detected!" << endl;
|
|
|
} INTEGER {
|
|
|
uint32_t timer = $3;
|
|
|
- cout << "Got the renew-timer value: " << time << endl;
|
|
|
+ cout << "Got the renew-timer value: " << timer << endl;
|
|
|
ElementPtr prf(new IntElement($3, ctx.loc2pos(@3)));
|
|
|
ctx.stack_.back()->set("renew-timer", prf);
|
|
|
};
|
|
@@ -292,14 +289,15 @@ renew_timer: RENEW_TIMER {
|
|
|
This example showcases several important things. First, the ability to insert
|
|
|
code at almost any step is very useful. It's also a powerful debugging tool.
|
|
|
|
|
|
-Second, some tokens are valueless (e.g. "renew-timer" when represented as
|
|
|
-RENEW_TIMER token has no value), but some have values. In particular, INTEGER
|
|
|
+Second, some tokens are valueless (e.g. "renew-timer" when represented as the
|
|
|
+RENEW_TIMER token has no value), but some have values. In particular, the INTEGER
|
|
|
token has a value which can be extracted by a $ followed by a number that
|
|
|
represents its order, so $3 means "the value of the third token or action
|
|
|
in this rule".
|
|
|
|
|
|
Also, some rules may have values. This is not used often, but there are specific
|
|
|
-cases when it's convenient. Let's take a look at the following excerpt:
|
|
|
+cases when it's convenient. Let's take a look at the following excerpt from
|
|
|
+dhcp6_parser.yy:
|
|
|
|
|
|
@code
|
|
|
ncr_protocol: NCR_PROTOCOL {
|
|
@@ -315,14 +313,16 @@ ncr_protocol_value:
|
|
|
;
|
|
|
@endcode
|
|
|
|
|
|
+(The numbers in brackets at the end of some lines do not appear in the code;
|
|
|
+they are used to identify the statements in the following discussion.)
|
|
|
|
|
|
-There's a "ncr-protocol" parameter that accepts one of two values: either tcp or
|
|
|
+The "ncr-protocol" parameter accepts one of two values: either tcp or
|
|
|
udp. To handle such a case, we first enter the NCR_PROTOCOL context to tell the
|
|
|
-lexer that we're in this scope. Lexer will then know that any incoming string of
|
|
|
-text that is either "UDP" or "TCP" should be represented as one of TCP or UDP
|
|
|
-tokens. Parser knows that after NCR_PROTOCOL there will be a colon followed
|
|
|
-by ncr_protocol_value. The rule for ncr_protocol_value says it can be either
|
|
|
-TCP token or UDP token. Let's assume the input text has the following:
|
|
|
+lexer that we're in this scope. The lexer will then know that any incoming string of
|
|
|
+text that is either "UDP" or "TCP" should be represented as one of the TCP or UDP
|
|
|
+tokens. The parser knows that after NCR_PROTOCOL there will be a colon followed
|
|
|
+by an ncr_protocol_value. The rule for ncr_protocol_value says it can be either the
|
|
|
+TCP token or the UDP token. Let's assume the input text is:
|
|
|
@code
|
|
|
"ncr-protocol": "TCP"
|
|
|
@endcode
|
|
@@ -330,23 +330,23 @@ TCP token or UDP token. Let's assume the input text has the following:
|
|
|
Here's how the parser will handle it. First, it will attempt to match the rule
|
|
|
for ncr_protocol. It will discover the first token is NCR_PROTOCOL. As a result,
|
|
|
it will run the code (1), which will tell the lexer to parse incoming tokens
|
|
|
-as ncr protocol values. The next token will be COLON. The next one expected
|
|
|
-after that is ncr_protocol_value. Lexer is already switched into NCR_PROTOCOL
|
|
|
-context, so it will recognize "TCP" as TCP token, not as a string of value of "TCP".
|
|
|
-Parser will receive that token and match the line (2). It will create appropriate
|
|
|
-representation that will be used a the rule's value ($$). Finally, parser
|
|
|
-will unroll back to ncr_protocol rule and execute the code in line (3) and (4).
|
|
|
-Line (3) will pick the value set up in line 2 and add it to the stack of
|
|
|
-values. Finally, line (4) will tell the lexer that we finished the NCR protocol
|
|
|
+as ncr protocol values. The next token is expected to be COLON and the one
|
|
|
+after that the ncr_protocol_value. The lexer has already been switched into the NCR_PROTOCOL
|
|
|
+context, so it will recognize "TCP" as the TCP token, not as a string with a value of "TCP".
|
|
|
+The parser will receive that token and match line (2), which creates an appropriate
|
|
|
+representation that will be used as the rule's value ($$). Finally, the parser
|
|
|
+will unroll back to the ncr_protocol rule and execute the code in lines (3) and (4).
|
|
|
+Line (3) picks the value set up in line (2) and adds it to the stack of
|
|
|
+values. Finally, line (4) tells the lexer that we finished the NCR protocol
|
|
|
parsing and it can go back to whatever state it was before.
|
|
|
|
|
|
-@section parserBisonStack Generating Element tree in Bison
|
|
|
+@section parserBisonStack Generating the Element Tree in Bison
|
|
|
|
|
|
-Bison parser keeps matching rules until it reaches the end of input file. During
|
|
|
-that process the code needs to build a hierarchy (a tree) of inter-connected
|
|
|
-Element objects that represents parsed text. @ref isc::data::Element has a
|
|
|
+The bison parser keeps matching rules until it reaches the end of the input file. During
|
|
|
+that process, the code needs to build a hierarchy (a tree) of inter-connected
|
|
|
+Element objects that represents the parsed text. @ref isc::data::Element has a
|
|
|
complex structure that defines parent-child relation differently depending on
|
|
|
-the type of parent (maps refer to its children differently than lists). This
|
|
|
+the type of parent (e.g. a map and a list refer to their children in different ways). This
|
|
|
requires the code to be aware of the parent content. In general, every time a
|
|
|
new scope (an opening curly bracket in input text) is encountered, the code
|
|
|
pushes a new Element onto the stack (see @ref isc::dhcp::Parser6Context::stack_)
|
|
@@ -357,20 +357,20 @@ parsing preferred-lifetime, the code does the following:
|
|
|
|
|
|
@code
|
|
|
preferred_lifetime: PREFERRED_LIFETIME COLON INTEGER {
|
|
|
- ElementPtr prf(new IntElement($3, ctx.loc2pos(@3))); (1)
|
|
|
- ctx.stack_.back()->set("preferred-lifetime", prf); (2)
|
|
|
+ ElementPtr prf(new IntElement($3, ctx.loc2pos(@3)));
|
|
|
+ ctx.stack_.back()->set("preferred-lifetime", prf);
|
|
|
}
|
|
|
@endcode
|
|
|
|
|
|
The first line creates an instance of IntElement with a value of the token. The
|
|
|
second line adds it to the current map (current = the last on the stack). This
|
|
|
approach has a very nice property of being generic. This rule can be referenced
|
|
|
-from global and subnet scope (and possibly other scopes as well) and the code
|
|
|
+from both global and subnet scope (and possibly other scopes as well) and the code
|
|
|
will add the IntElement object to whatever is last on the stack, be it global,
|
|
|
subnet or perhaps even something else (maybe one day we will allow preferred
|
|
|
lifetime to be defined on a per pool or per host basis?).
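
The stack-of-scopes idiom can be sketched in a few lines. The Element type
below is a deliberately tiny stand-in, for illustration only: the real
isc::data::Element supports many more types and carries position information.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// A minimal stand-in for isc::data::Element: either an integer leaf or a map.
struct Element;
using ElementPtr = std::shared_ptr<Element>;
struct Element {
    int value = 0;                               // used by integer leaves
    std::map<std::string, ElementPtr> children;  // used by map nodes
};

// Mimics the ctx.stack_ idiom: whichever map is on top of the stack
// receives the new leaf, so the same rule works in any scope.
void setInteger(std::vector<ElementPtr>& stack, const std::string& name, int v) {
    auto leaf = std::make_shared<Element>();
    leaf->value = v;
    stack.back()->children[name] = leaf;
}
```

Because setInteger only ever touches the top of the stack, the rule that
calls it neither knows nor cares whether that map is the global scope, a
subnet, or something else.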
|
|
|
|
|
|
-@section parserSubgrammar Parsing partial configuration
|
|
|
+@section parserSubgrammar Parsing a Partial Configuration
|
|
|
|
|
|
All the explanations so far assumed that we're operating in a default case of
|
|
|
receiving the configuration as a whole. That is the case during startup and
|
|
@@ -378,45 +378,45 @@ reconfiguration. However, both DHCPv4 and DHCPv6 support certain cases when the
|
|
|
input text is not the whole configuration, but rather certain parts of it. There
|
|
|
are several examples of such cases. The most common are unit-tests. They
|
|
|
typically don't have the outermost { } or Dhcp6 object, but simply define
|
|
|
-whatever parameters are being tested. Second, we have command channel that will
|
|
|
-in the near future contain parts of the configuration, depending on the
|
|
|
-command. For example, add-reservation will contain host reservation only.
|
|
|
+whatever parameters are being tested. Second, we have the command channel that will,
|
|
|
+in the near future, contain parts of the configuration, depending on the
|
|
|
+command. For example, "add-reservation" will contain a host reservation only.
|
|
|
|
|
|
Bison by default does not support multiple start rules, but there's a trick
|
|
|
-that can provide such capability. The trick assumes that the starting
|
|
|
-rule may allow one of artificial tokens that represent the scope that is
|
|
|
-expected. For example, when called from add-reservation command, the
|
|
|
+that can provide such a capability. The trick assumes that the starting
|
|
|
+rule may allow one of the artificial tokens that represent the scope
|
|
|
+expected. For example, when called from the "add-reservation" command, the
|
|
|
artificial token will be SUB_RESERVATION and it will trigger the parser
|
|
|
-to bypass the global { }, Dhcp6 and jump immediately to sub_reservation.
|
|
|
+to bypass the global braces { and } and the "Dhcp6" token and jump immediately to the sub_reservation rule.
|
|
|
|
|
|
-This trick is also implemented in the lexer. There's a flag called start_token_flag.
|
|
|
-When initially set to true, it will cause the lexer to emit an artificial
|
|
|
+This trick is also implemented in the lexer. A flag called start_token_flag,
|
|
|
+when initially set to true, will cause the lexer to emit an artificial
|
|
|
token once, before parsing any input whatsoever.
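
The start-token trick can be sketched as follows. The class and member names
here are illustrative, not the real Kea identifiers: the point is only that
the first call emits the artificial token, after which normal lexing resumes.

```cpp
#include <deque>
#include <string>

// Sketch of the start-token trick: the first call returns an artificial
// token selecting the start rule; subsequent calls return real input tokens.
class LexerSketch {
public:
    explicit LexerSketch(const std::string& start_token)
        : start_token_(start_token), start_token_flag_(true) {}

    // In the real lexer this check happens before any input is matched.
    std::string nextToken(std::deque<std::string>& input) {
        if (start_token_flag_) {
            start_token_flag_ = false;   // emit the artificial token only once
            return start_token_;
        }
        std::string t = input.front();
        input.pop_front();
        return t;
    }

private:
    std::string start_token_;
    bool start_token_flag_;
};
```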
|
|
|
|
|
|
This optional feature can be skipped altogether if you don't plan to parse parts
|
|
|
of the configuration.
|
|
|
|
|
|
-@section parserBisonExtend Extending grammar
|
|
|
+@section parserBisonExtend Extending the Grammar
|
|
|
|
|
|
Adding new parameters to existing parsers is very easy once you get hold of the
|
|
|
concept of what the grammar rules represent. The first step is to understand
|
|
|
where the parameter is to be allowed. Typically a new parameter is allowed
|
|
|
-in one scope and only over time it is added in other scopes. Recently a support
|
|
|
-for 4o6-interface-id parameter has been added. That's parameter that can
|
|
|
+in one scope and only over time is it added to other scopes. Recently support
|
|
|
+for a 4o6-interface-id parameter has been added. That is a parameter that can
|
|
|
be defined in a subnet and takes a string argument. You can see the actual
|
|
|
change conducted in this commit:
|
|
|
(https://github.com/isc-projects/kea/commit/9fccdbf54c4611dc10111ad8ff96d36cad59e1d6).
|
|
|
|
|
|
-Here's the complete set of necessary changes.
|
|
|
+Here's the complete set of changes that were necessary.
|
|
|
|
|
|
1. Define a new token in dhcp6_parser.yy:
|
|
|
@code
|
|
|
SUBNET_4O6_INTERFACE_ID "4o6-interface-id"
|
|
|
@endcode
|
|
|
- This defines a token called SUBNET_4O6_INTERFACE_ID that, when needed to
|
|
|
+ This defines a token called SUBNET_4O6_INTERFACE_ID that, when it needs to
|
|
|
be printed, e.g. in an error message, will be represented as "4o6-interface-id".
|
|
|
|
|
|
-2. Tell lexer how to recognize the new parameter:
|
|
|
+2. Tell the lexer how to recognize the new parameter:
|
|
|
@code
|
|
|
\"4o6-interface-id\" {
|
|
|
switch(driver.ctx_) {
|
|
@@ -427,13 +427,13 @@ Here's the complete set of necessary changes.
|
|
|
}
|
|
|
}
|
|
|
@endcode
|
|
|
- It tells the parser that when in Subnet4 context, incoming "4o6-interface-id" string
|
|
|
- should be represented as SUBNET_4O6_INTERFACE_ID token. In any other context,
|
|
|
+  This tells the lexer that, when in the Subnet4 context, an incoming "4o6-interface-id" string
|
|
|
+ should be represented as the SUBNET_4O6_INTERFACE_ID token. In any other context,
|
|
|
it should be represented as a string.
|
|
|
|
|
|
3. Add the rule that will define the value. A user is expected to add something like
|
|
|
@code
|
|
|
- "4o6-interface-id": "whatevah"
|
|
|
+ "4o6-interface-id": "whatever"
|
|
|
@endcode
|
|
|
The rule to match this and similar statements looks as follows:
|
|
|
@code
|
|
@@ -452,7 +452,7 @@ Here's the complete set of necessary changes.
|
|
|
integer or float.
|
|
|
|
|
|
4. Finally, extend the existing subnet4_param that defines all allowed parameters
|
|
|
- in Subnet4 scope to also cover our new parameter (the new line marked with *):
|
|
|
+ in the Subnet4 scope to also cover our new parameter (the new line marked with *):
|
|
|
@code
|
|
|
subnet4_param: valid_lifetime
|
|
|
| renew_timer
|
|
@@ -477,8 +477,8 @@ Here's the complete set of necessary changes.
|
|
|
;
|
|
|
@endcode
|
|
|
|
|
|
-5. Regenerate flex/bison files by typing make parser.
|
|
|
+5. Regenerate the flex/bison files by typing "make parser".
|
|
|
|
|
|
-6. Run unit-tests that you wrote before touch any bison stuff. You did write them
|
|
|
+6. Run the unit-tests that you wrote before you touched any of the bison stuff. You did write them
|
|
|
in advance, right?
|
|
|
*/
|