@@ -9,14 +9,14 @@

@section parserIntro Parser background

-Kea's format of choice is JSON, which is used in configuration files, in the
-command channel and also when communicating between DHCP servers and DHCP-DDNS
-component. It is almost certain that it will be used as the syntax for any
-upcoming features.
+Kea's data format of choice is JSON (https://tools.ietf.org/html/rfc7159), which
+is used in configuration files, in the command channel and also when
+communicating between DHCP servers and the DHCP-DDNS component. It is almost
+certain that it will be used as the data format for any new features.

Historically, Kea used @ref isc::data::Element::fromJSON and @ref
isc::data::Element::fromJSONFile methods to parse received data that is expected
-to be in JSON syntax. This in-house parser was developed back in early BIND10
+to be in JSON syntax. This in-house parser was developed back in the early BIND10
days. Its two main advantages were that it didn't have any external dependencies
and that it was already available in the source tree when the Kea project
started. On the other hand, it was very difficult to modify (several attempts to
@@ -49,9 +49,9 @@ and here: http://kea.isc.org/wiki/SimpleParser.
To solve the issue of phase 1 mentioned earlier, a new parser has been developed
that is based on flex and bison tools. The following text uses DHCPv6 as an
example, but the same principle applies to DHCPv4 and D2, and CA will likely
-follow. The new parser consists of two core elements (the following description
-is slightly oversimplified to convey the intent, more detailed description
-is available in the following sections):
+follow. The new parser consists of two core elements with a wrapper around them
+(the following description is slightly oversimplified to convey the intent; a
+more detailed description is available in the following sections):

-# Flex lexer (src/bin/dhcp6/dhcp6_lexer.ll) that is essentially a set of
regular expressions with C++ code that creates new tokens that represent whatever
@@ -87,20 +87,23 @@ is available in the following sections):
(a token with a value of 100), RCURLY_BRACKET, RCURLY_BRACKET, END

-# Parser context. As there is some information that needs to be passed between
- parser and lexer, @ref isc::dhcp::Parser6Context is a convenient to wrapper
+ parser and lexer, @ref isc::dhcp::Parser6Context is a convenience wrapper
around those two bundled together. It also works as a nice encapsulation,
hiding all the flex/bison details underneath.

@section parserBuild Building flex/bison code

-The only input file used by flex is the .ll file. The only input file used
-by bison is the .yy file. When processed, those two tools will generate
-a number of .hh and .cc files. The major ones are names the same as their
-.ll and .yy counterparts (e.g. dhcp6_lexer.cc, dhcp6_parser.cc and dhcp6_parser.h),
-but there's a number of additional files created: location.hh, position.hh
-and stack.hh. Those are internal bison headers that are needed. To avoid every
-user to have flex and bison installed, we chose to generate the files and
-add them to the Kea repository. To generate those files, do the following:
+The only input file used by flex is the .ll file. The only input file used by
+bison is the .yy file. When making changes to the lexer or parser, only those
+two files are edited. When processed, those two tools will generate a number of
+.hh and .cc files. The major ones are named the same as their .ll and .yy
+counterparts (e.g. dhcp6_lexer.cc, dhcp6_parser.cc and dhcp6_parser.h), but
+there are a number of additional files created: location.hh, position.hh and
+stack.hh. Those are internal bison headers that are needed for compilation.
+
+To avoid requiring every user to have flex and bison installed, we chose to
+generate the files and add them to the Kea repository. To generate those files,
+do the following:

@code
./configure --enable-generate-parser
@@ -120,7 +123,9 @@ generated may be different and cause unnecessarily large diffs, may cause
coverity/cpp-check issues appear and disappear and cause general unhappiness.
To avoid those problems, we will introduce a requirement to generate flex/bison
files on one dedicated machine. This machine will likely be docs. Currently Ops
-is working on installing the necessary versions of flex/bison required
+is working on installing the necessary versions of flex/bison required, but
+for the time being we can use the versions installed in Francis' home directory
+(export PATH=/home/fdupont/bin:$PATH).

Note: the above applies only to the code being merged on master. It is probably
ok to generate the files on your development branch with whatever version you
@@ -145,10 +150,10 @@ documented, but the docs for it may be a bit cryptic. When developing new
parsers, it's best to start by copying whatever we have for DHCPv6 and tweak as
needed.

-Second addition are flex conditions. They're defined with %x and they define a
+The second addition is flex conditions. They're defined with %%x and they define a
state of the lexer. A good example of a state may be comment. Once the lexer
-detects that a comment has started, it switches to certain condition (by calling
-BEGIN(COMMENT) for example) and the code should ignore whatever follows
+detects that a comment has begun, it switches to a certain condition (by calling
+BEGIN(COMMENT) for example) and the code then ignores whatever follows
(especially strings that look like valid tokens) until the comment is closed
(when it returns to the default condition by calling BEGIN(INITIAL)). This is
something that is not frequently used and the only use cases for it are the
@@ -157,7 +162,7 @@ forementioned comments and file inclusions.
The third addition is parser contexts. Let's assume we have a parser that uses
"ip-address" regexp that would return IP_ADDRESS token. Whenever we want to
allow "ip-address", the grammar allows IP_ADDRESS token to appear. When the
-lexer is called, it will match the regexp, will generate IP_ADDRESS token and
+lexer is called, it will match the regexp, will generate the IP_ADDRESS token and
the parser will carry out its duty. This works fine as long as you have very
specific grammar that defines everything. Sadly, that's not the case in DHCP as
we have hooks. Hook libraries can have parameters that are defined by third
@@ -193,7 +198,7 @@ in src/bin/dhcp6/dhcp6_parser.yy. Here's a simplified excerpt of it:
dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;

// This defines all parameters that may appear in the Dhcp6 object.
-// It can either contain a global_param (defined below) or a 
+// It can either contain a global_param (defined below) or a
// global_params list, followed by a comma followed by a global_param.
// Note this definition is recursive and can expand to a single
// instance of global_param or multiple instances separated by commas.
@@ -201,7 +206,7 @@ dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;
global_params: global_param
| global_params COMMA global_param
;
-
+
// These are the parameters that are allowed in the top-level for
// Dhcp6.
global_param: preferred_lifetime
@@ -222,9 +227,9 @@ global_param: preferred_lifetime
| server_id
| dhcp4o6_port
;
-
+
renew_timer: RENEW_TIMER COLON INTEGER;
-
+
// Many other definitions follow.
@endcode

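To make the recursion concrete, here is a hypothetical derivation (an illustration, not part of the grammar file) showing how the global_params rule matches a three-element list:

@code
// Deriving "A, B, C" with the recursive rule, innermost match last:
global_params
  -> global_params COMMA global_param                      // "..., C"
  -> global_params COMMA global_param COMMA global_param   // "..., B, C"
  -> global_param COMMA global_param COMMA global_param    // "A, B, C"
@endcode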
@@ -244,7 +249,7 @@ rule.

The "leaf" rules, which don't contain any other rules, must be defined by a
series of tokens. An example of such a rule is renew_timer above. It is defined
-as a series of 3 tokens: RENEW_TIMER, COLON and INTEGER. 
+as a series of 3 tokens: RENEW_TIMER, COLON and INTEGER.

Speaking of integers, it is worth noting that some tokens can have values. Those
values are defined using %token clause. For example, dhcp6_parser.yy has the
|
@@ -272,7 +277,7 @@ renew_timer with some extra code:
|
|
|
@code
|
|
|
renew_timer: RENEW_TIMER {
|
|
|
cout << "renew-timer token detected, so far so good" << endl;
|
|
|
-} COLON {
|
|
|
+} COLON {
|
|
|
cout << "colon detected!" << endl;
|
|
|
} INTEGER {
|
|
|
uint32_t timer = $3;
|
|
@@ -298,11 +303,11 @@ ncr_protocol: NCR_PROTOCOL {
ctx.enter(ctx.NCR_PROTOCOL); (1)
} COLON ncr_protocol_value {
ctx.stack_.back()->set("ncr-protocol", $4); (3)
- ctx.leave();
+ ctx.leave(); (4)
};

ncr_protocol_value:
- UDP { $$ = ElementPtr(new StringElement("UDP", ctx.loc2pos(@1))); } 
+ UDP { $$ = ElementPtr(new StringElement("UDP", ctx.loc2pos(@1))); }
| TCP { $$ = ElementPtr(new StringElement("TCP", ctx.loc2pos(@1))); } (2)
;
@endcode
@@ -358,8 +363,8 @@ The first line creates an instance of IntElement with a value of the token. The
second line adds it to the current map (current = the last on the stack). This
approach has a very nice property of being generic. This rule can be referenced
from global and subnet scope (and possibly other scopes as well) and the code
-will add the IntElement object to whatever is last on the stack, be it
-global, subnet or perhaps even something else (maybe we will allow preferred
+will add the IntElement object to whatever is last on the stack, be it global,
+subnet or perhaps even something else (maybe one day we will allow preferred
lifetime to be defined on a per pool or per host basis?).

@section parserSubgrammar Parsing partial grammar
@@ -385,6 +390,9 @@ This trick is also implemented in the lexer. There's a flag called start_token_f
When initially set to true, it will cause the lexer to emit an artificial
token once, before parsing any input whatsoever.

+This optional feature can be skipped altogether if you don't plan to parse parts
+of the configuration.
+
@section parserBisonExtend Extending grammar

Adding new parameters to existing parsers is very easy once you get hold of the
@@ -402,7 +410,7 @@ Here's the complete set of necessary changes.
@code
SUBNET_4O6_INTERFACE_ID "4o6-interface-id"
@endcode
- This defines a token called SUBNET_4O6_INTERFACE_ID that, when needed to 
+ This defines a token called SUBNET_4O6_INTERFACE_ID that, when needed to
be printed, will be represented as "4o6-interface-id".

2. Tell lexer how to recognize the new parameter:
@@ -439,7 +447,7 @@ Here's the complete set of necessary changes.
weird that happens to match our reserved keywords. Therefore we switch to
no keyword context. This tells the lexer to interpret everything as string,
integer or float.
-
+
4. Finally, extend the existing subnet4_param that defines all allowed parameters
in Subnet4 scope to also cover our new parameter (the new line marked with *):
@code