  1. // Copyright (C) 2015-2016 Internet Systems Consortium, Inc. ("ISC")
  2. //
  3. // This Source Code Form is subject to the terms of the Mozilla Public
  4. // License, v. 2.0. If a copy of the MPL was not distributed with this
  5. // file, You can obtain one at http://mozilla.org/MPL/2.0/.
  6. /**
  7. @page dhcpEval libeval - Expression evaluation and client classification
  8. @section dhcpEvalIntroduction Introduction
  9. The core of the libeval library is a parser that is able to parse an
  10. expression (e.g. option[123].text == 'APC'). This is currently used for
  11. client classification, but in the future it may also be used for other
  12. applications.
  13. The external interface to the library is the @ref isc::eval::EvalContext
  14. class. Once instantiated, it offers one main method,
  15. @ref isc::eval::EvalContext::parseString, which parses the specified
  16. string. Once the expression is parsed, it is converted to a collection of
  17. tokens that are stored in Reverse Polish Notation in
  18. EvalContext::expression.
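To illustrate why Reverse Polish Notation is a convenient storage format, here is a minimal, self-contained sketch of stack-based evaluation. It does not use the actual libeval classes: `SketchToken`, `Kind` and `evaluateRpn` are hypothetical names invented for this illustration, and representing boolean results as the strings "true" and "false" is an assumption of the sketch.

```cpp
#include <cassert>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token type, invented for this sketch (not the libeval Token).
enum class Kind { Constant, Equal };

struct SketchToken {
    Kind kind;
    std::string value;  // used only by Kind::Constant
};

// Evaluates tokens stored in Reverse Polish Notation. Operands are pushed
// onto a stack; an operator pops its operands and pushes its result.
// Boolean results are represented here as the strings "true" and "false".
std::string evaluateRpn(const std::vector<SketchToken>& expression) {
    std::stack<std::string> values;
    for (const SketchToken& token : expression) {
        if (token.kind == Kind::Constant) {
            values.push(token.value);
        } else {  // Kind::Equal
            if (values.size() < 2) {
                throw std::runtime_error("== needs two operands");
            }
            const std::string right = values.top();
            values.pop();
            const std::string left = values.top();
            values.pop();
            values.push(left == right ? "true" : "false");
        }
    }
    if (values.size() != 1) {
        throw std::runtime_error("malformed expression");
    }
    return values.top();
}
```

In this sketch the expression 'APC' == 'APC' would be stored as three tokens (constant, constant, equal), and evaluating them leaves the single value "true" on the stack. Because operators always follow their operands, no parentheses or precedence rules are needed at evaluation time.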
  19. Internally, the parser code is generated by flex and bison. These two
  20. tools convert lexer.ll and parser.yy files into a number of .cc and .hh files.
  21. To avoid a build of Kea depending on the presence of flex and bison, the
  22. result of the generation is checked into the GitHub repository and is
  23. distributed in the tarballs.
  24. @section dhcpEvalLexer Lexer generation using flex
  25. Flex is used to generate the lexer, a piece of code that converts input
  26. data into a series of tokens. It contains a small number of directives,
  27. but the majority of the code consists of the definitions of tokens. These
  28. definitions are regular expressions that define various tokens, e.g. strings,
  29. numbers, parentheses, etc. Once the expression is matched, the associated
  30. action is executed. In the majority of the cases a generator method from
  31. @ref isc::eval::EvalParser is called, which returns a newly created
  32. bison token. The purpose of the lexer is to generate a stream
  33. of tokens that are consumed by the parser.
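The pattern/action idea can be approximated by hand with `std::regex`. The following is only a sketch: the rule names and patterns are simplified assumptions and are not copied from lexer.ll, and `LexedToken` and `tokenize` are hypothetical names. Unlike flex, which prefers the longest match, this sketch simply takes the first rule that matches at the current position.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A lexed token: a hypothetical (name, matched text) pair for this sketch.
using LexedToken = std::pair<std::string, std::string>;

// Converts the input into a stream of tokens. Each rule pairs a token name
// with a regular expression, mimicking the pattern/action layout of a flex
// file. The rules are simplified and are not copied from lexer.ll.
std::vector<LexedToken> tokenize(const std::string& input) {
    static const std::vector<std::pair<std::string, std::regex>> rules = {
        {"SPACE", std::regex("[ \\t]+")},
        {"STRING", std::regex("'[^']*'")},
        {"HEXSTRING", std::regex("0[xX][0-9a-fA-F]+")},
        {"INTEGER", std::regex("[0-9]+")},
        {"EQUAL", std::regex("==")},
        {"OPTION", std::regex("option")},
        {"TEXT", std::regex("text")},
        {"HEX", std::regex("hex")},
        {"DOT", std::regex("\\.")},
        {"[", std::regex("\\[")},
        {"]", std::regex("\\]")},
    };

    std::vector<LexedToken> tokens;
    auto pos = input.cbegin();
    while (pos != input.cend()) {
        bool matched = false;
        for (const auto& rule : rules) {
            std::smatch m;
            // match_continuous anchors the match at the current position.
            if (std::regex_search(pos, input.cend(), m, rule.second,
                                  std::regex_constants::match_continuous)) {
                if (rule.first != "SPACE") {  // whitespace is skipped
                    tokens.emplace_back(rule.first, m.str());
                }
                pos += m.length();
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw std::runtime_error("unexpected character in input");
        }
    }
    return tokens;
}
```

Calling `tokenize("option[123].text == 'APC'")` produces eight tokens in this sketch: OPTION, [, INTEGER, ], DOT, TEXT, EQUAL and STRING. The HEXSTRING rule is listed before INTEGER so that a literal such as 0x666f6f is not split into an integer 0 followed by unmatched text.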
  34. lexer.cc and lexer.hh must not be edited. If there is a need
  35. to introduce changes, lexer.ll must be updated and the .cc and .hh files
  36. regenerated.
  37. @section dhcpEvalParser Parser generation using bison
  38. Bison is used to generate the parser, a piece of code that consumes a
  39. stream of tokens and attempts to match it against a defined grammar.
  40. The bison parser is created from parser.yy. It contains
  41. a number of directives, but the two most important sections are:
  42. a list of tokens (for each token defined here, bison will generate the
  43. make_NAMEOFTOKEN method in the @ref isc::eval::EvalParser class) and
  44. the grammar. The grammar is a tree-like structure with possible loops.
  45. Here is an over-simplified version of the grammar:
  46. @code
  47. 01. %start expression;
  48. 02.
  49. 03. expression : token EQUAL token
  50. 04.            | token
  51. 05.            ;
  52. 06.
  53. 07. token : STRING
  54. 08.           {
  55. 09.               TokenPtr str(new TokenString($1));
  56. 10.               ctx.expression.push_back(str);
  57. 11.           }
  58. 12.       | HEXSTRING
  59. 13.           {
  60. 14.               TokenPtr hex(new TokenHexString($1));
  61. 15.               ctx.expression.push_back(hex);
  62. 16.           }
  63. 17.       | OPTION '[' INTEGER ']' DOT TEXT
  64. 18.           {
  65. 19.               TokenPtr opt(new TokenOption($3, TokenOption::TEXTUAL));
  66. 20.               ctx.expression.push_back(opt);
  67. 21.           }
  68. 22.       | OPTION '[' INTEGER ']' DOT HEX
  69. 23.           {
  70. 24.               TokenPtr opt(new TokenOption($3, TokenOption::HEXADECIMAL));
  71. 25.               ctx.expression.push_back(opt);
  72. 26.           }
  73. 27.       ;
  74. @endcode
  75. This code determines that the grammar starts from expression (line 1).
  76. The actual definition of expression (lines 3-5) may either be a
  77. single token or an expression "token == token" (EQUAL has been defined as
  78. "==" elsewhere). Token is further
  79. defined in lines 7-27: it may either be a string (lines 7-11),
  80. a hex string (lines 12-16), an option in textual format (lines 17-21)
  81. or an option in hexadecimal format (lines 22-26).
  82. When the actual case is determined, the respective C++ action
  83. is executed. For example, if the token is a string, the TokenString class is
  84. instantiated with the appropriate value and put onto the expression vector.
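The way grammar rules and their actions cooperate can be sketched with a small hand-written recursive-descent parser for the same over-simplified grammar. This is not how bison works internally (bison generates a table-driven parser), and `InToken` and `SketchParser` are hypothetical names. The simplified grammar shows no action for EQUAL; the sketch assumes that an equality entry is appended after both operands, as Reverse Polish Notation requires.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Input token: hypothetical (name, text) pair, e.g. {"INTEGER", "123"}.
using InToken = std::pair<std::string, std::string>;

class SketchParser {
public:
    explicit SketchParser(const std::vector<InToken>& tokens)
        : tokens_(tokens), pos_(0) {}

    // expression : token EQUAL token | token
    std::vector<std::string> parseExpression() {
        parseToken();
        if (pos_ < tokens_.size() && tokens_[pos_].first == "EQUAL") {
            ++pos_;
            parseToken();
            rpn_.push_back("==");  // operator follows both of its operands
        }
        if (pos_ != tokens_.size()) {
            throw std::runtime_error("unexpected trailing input");
        }
        return rpn_;
    }

private:
    // Consumes the next token if it has the expected name.
    std::string expect(const std::string& name) {
        if (pos_ >= tokens_.size() || tokens_[pos_].first != name) {
            throw std::runtime_error("expected " + name);
        }
        return tokens_[pos_++].second;
    }

    // token : STRING | HEXSTRING | OPTION '[' INTEGER ']' DOT (TEXT | HEX)
    void parseToken() {
        if (pos_ >= tokens_.size()) {
            throw std::runtime_error("unexpected end of input");
        }
        const std::string& name = tokens_[pos_].first;
        if (name == "STRING" || name == "HEXSTRING") {
            rpn_.push_back(tokens_[pos_++].second);
            return;
        }
        expect("OPTION");
        expect("[");
        const std::string code = expect("INTEGER");
        expect("]");
        expect("DOT");
        const bool text =
            pos_ < tokens_.size() && tokens_[pos_].first == "TEXT";
        if (text) {
            expect("TEXT");
        } else {
            expect("HEX");
        }
        rpn_.push_back("option[" + code + "]." + (text ? "text" : "hex"));
    }

    std::vector<InToken> tokens_;   // copy of the input token stream
    std::size_t pos_;               // index of the next unread token
    std::vector<std::string> rpn_;  // output in Reverse Polish Notation
};
```

Given the token stream for option[123].text == 'APC', `parseExpression()` returns the three entries option[123].text, 'APC' and ==, in that order. Each `push_back` in the sketch plays the role of the `ctx.expression.push_back(...)` call in the corresponding grammar action.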
  85. @section dhcpEvalMakefile Generating parser files
  86. In the general case, we want to avoid generating parser files, so that an
  87. average user interested in just compiling Kea does not need flex or
  88. bison. Therefore the generated files are already included in the
  89. git repository and will be included in the tarball releases.
  90. However, there will be cases when a developer wants
  91. to tweak the lexer.ll and parser.yy files and then regenerate
  92. the code. For this purpose, two makefile targets are defined:
  93. @code
  94. make parser
  95. @endcode
  96. will generate the parsers and
  97. @code
  98. make parser-clean
  99. @endcode
  100. will remove the files. Removal of the generated files is also hooked
  101. into the maintainer-clean target.
  102. @section dhcpEvalConfigure Configure options
  103. Since the flex/bison tools are not necessary for a regular compilation,
  104. the configure script checks for them, but the lack of flex or
  105. bison tools does not stop the process. There is a flag
  106. (--enable-generate-parser) that tells the configure script that the
  107. parser will be generated. With this flag, the checks for flex/bison
  108. are mandatory. If either tool is missing or its version is too old, the
  109. configure process will terminate with an error.
  110. @section dhcpEvalToken Supported tokens
  111. There are a number of tokens implemented. Each token is derived from the
  112. isc::dhcp::Token class and represents a certain expression primitive.
  113. Currently supported tokens are:
  114. - isc::dhcp::TokenString -- represents a constant string, e.g. "MSFT";
  115. - isc::dhcp::TokenHexString -- represents a constant string, encoded as
  116. hex string, e.g. 0x666f6f which is actually "foo";
  117. - isc::dhcp::TokenIpAddress -- represents a constant IP address, encoded as
  118. a 4 or 16 byte binary string, e.g., 10.0.0.1 is 0x0a000001;
  119. - isc::dhcp::TokenOption -- represents an option in a packet, e.g.
  120. option[123].text;
  121. - isc::dhcp::TokenRelay4Option -- represents a sub-option inserted by the
  122. DHCPv4 relay, e.g. relay[123].text or relay[123].hex
  123. - isc::dhcp::TokenRelay6Option -- represents a sub-option inserted by
  124. a DHCPv6 relay
  125. - isc::dhcp::TokenPkt -- represents DHCP packet metadata (incoming
  126. interface name, source/remote or destination/local IP address, length).
  127. - isc::dhcp::TokenPkt4 -- represents a DHCPv4 packet field.
  128. - isc::dhcp::TokenPkt6 -- represents a DHCPv6 packet field (message type
  129. or transaction id).
  130. - isc::dhcp::TokenRelay6Field -- represents a DHCPv6 relay information field.
  131. - isc::dhcp::TokenEqual -- represents the equal (==) operator;
  132. - isc::dhcp::TokenSubstring -- represents the substring(text, start, length) operator;
  133. - isc::dhcp::TokenConcat -- represents the concat operator, which
  134. concatenates two other tokens.
  135. - isc::dhcp::TokenNot -- the logical not operator.
  136. - isc::dhcp::TokenAnd -- the logical and (strict) operator.
  137. - isc::dhcp::TokenOr -- the logical or (strict) operator (strict means
  138. it always evaluates its operands).
  139. - isc::dhcp::TokenVendor -- represents vendor information option's existence,
  140. enterprise-id field and possible sub-options. (e.g. vendor[1234].exists,
  141. vendor[*].enterprise-id, vendor[1234].option[1].exists, vendor[1234].option[1].hex)
  142. - isc::dhcp::TokenVendorClass -- represents vendor class option's existence,
  143. enterprise-id and included data chunks. (e.g. vendor-class[1234].exists,
  144. vendor-class[*].enterprise-id, vendor-class[*].data[3])
  145. More operators are expected to be implemented in upcoming releases.
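The two constant encodings mentioned in the list (a hex string and an IP address stored as a binary string) can be illustrated with a short sketch. `decodeHex` and `encodeIpv4` are hypothetical helpers written for this page, not part of libeval, and the sketch covers IPv4 only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Decodes a hex string literal such as "0x666f6f" into its raw bytes.
// This sketch requires an even number of hex digits after the "0x" prefix.
std::string decodeHex(const std::string& literal) {
    if (literal.size() < 4 || literal.size() % 2 != 0 ||
        literal.compare(0, 2, "0x") != 0) {
        throw std::invalid_argument("expected 0x followed by hex digit pairs");
    }
    std::string bytes;
    for (std::size_t i = 2; i < literal.size(); i += 2) {
        if (!std::isxdigit(static_cast<unsigned char>(literal[i])) ||
            !std::isxdigit(static_cast<unsigned char>(literal[i + 1]))) {
            throw std::invalid_argument("invalid hex digit");
        }
        bytes.push_back(static_cast<char>(
            std::stoi(literal.substr(i, 2), nullptr, 16)));
    }
    return bytes;
}

// Encodes a dotted-quad IPv4 address as a 4-byte binary string.
std::string encodeIpv4(const std::string& address) {
    std::istringstream in(address);
    std::string bytes;
    for (int i = 0; i < 4; ++i) {
        int octet = -1;
        in >> octet;
        if (in.fail() || octet < 0 || octet > 255) {
            throw std::invalid_argument("invalid IPv4 octet");
        }
        bytes.push_back(static_cast<char>(octet));
        if (i < 3) {
            char dot = 0;
            in >> dot;
            if (dot != '.') {
                throw std::invalid_argument("expected '.' between octets");
            }
        }
    }
    if (in.peek() != std::istringstream::traits_type::eof()) {
        throw std::invalid_argument("trailing characters after address");
    }
    return bytes;
}
```

With these helpers, `decodeHex("0x666f6f")` yields the three-byte string "foo", and `encodeIpv4("10.0.0.1")` yields the four bytes 0x0a, 0x00, 0x00, 0x01. Storing both kinds of constants as plain byte strings means the equality operator can compare them with option contents byte for byte.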
  146. */