DHCP Performance Guide

2012 Internet Systems Consortium, Inc. ("ISC")

Tomasz Mrugalski
Marcin Siodelski

BIND 10 is a framework that features a Domain Name System (DNS) suite and Dynamic Host Configuration Protocol (DHCP) servers, with development managed by Internet Systems Consortium (ISC). This document describes various aspects of DHCP performance, measurements and tuning. It covers BIND 10 DHCP (codename Kea), existing ISC DHCP4 software, perfdhcp (a DHCP performance measurement tool) and other related topics.

This is a companion document for BIND 10 version &__VERSION__;.

Preface

Acknowledgements

ISC would like to acknowledge generous support for BIND 10 development of DHCPv4 and DHCPv6 components provided by Comcast.

Introduction

This document is in its early stages of development. It is expected to grow significantly in the near future. It will cover topics like database backend performance measurements and the pros and cons of various optimization techniques and tools.

ISC DHCP 4.x

TODO: Write something about ISC DHCP4 here.

Kea

Backend performance evaluation

Kea will support several different database backends, using both popular databases (like MySQL or SQLite) and custom-developed solutions (like an in-memory database). The BIND 10 source code features a set of performance microbenchmarks. These are small tools written in C/C++ that simulate expected DHCP server behaviour and evaluate the performance of the considered databases. As the implemented benchmarks do not really simulate DHCP operation, but rather use a set of primitives that a real server could use, they are called micro-benchmarks.

Although there are many operations and data types that a server could store in a database, the most frequently used data type is lease information. Lease information for IPv4 and IPv6 differs slightly, but the performance differences between IPv4 and IPv6 lease operations are expected to be minimal. Therefore each test uses the lease4 table for performance measurements.

All benchmarks are implemented as single-threaded applications that use a single database connection.

The benchmarks are stored in the tests/tools/dhcp-ubench directory. This directory contains simplified prototypes for various DB backends that are planned or considered as a backend engine for BIND 10 DHCP. Although trivial now, they are expected to evolve into useful tools that will allow users to measure performance in their specific environment.

Currently the following benchmarks are implemented:

- in memory+flat file
- SQLite
- MySQL

As they require additional (sometimes heavy) dependencies, they are not built by default. In fact, their build system is completely separate. It will eventually be merged with the main BIND 10 makefile system, but that is a low priority for now.

All benchmarks follow the same pattern:

1. Prepare operation (connect to a database, create a file etc.)
2. Measure timestamp 0
3. Commit new lease4 (repeated X times)
4. Measure timestamp 1
5. Search for random lease4 (repeated X times)
6. Measure timestamp 2
7. Update existing lease4 (repeated X times)
8. Measure timestamp 3
9. Delete existing lease4 (repeated X times)
10. Measure timestamp 4
11. Print out statistics, based on X and measured timestamps.

Although this approach does not attempt to simulate actual DHCP server operation, in which all of these steps are interleaved, it answers questions about basic database strengths and weak points. In particular it can show the impact of specific DB optimizations, like changing the engine, optimizing for writes or reads etc. The framework attempts to perform the same number of operations for every backend, thus allowing a fair comparison between them.
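The pattern above can be sketched as follows. This is only an illustration in Python, not the C/C++ code of the actual benchmarks; the dictionary standing in for a backend, the lease fields and the run_benchmark name are assumptions made for this example.

```python
import random
import time


def run_benchmark(num):
    """Run create/search/update/delete phases, each repeated num times."""
    leases = {}                      # hypothetical stand-in for a real backend
    addresses = list(range(num))     # integers stand in for IPv4 addresses
    stamps = [time.perf_counter()]   # timestamp 0

    for addr in addresses:           # commit new lease4
        leases[addr] = {"hwaddr": "00:0c:01:02:03:04", "valid_lft": 3600}
    stamps.append(time.perf_counter())   # timestamp 1

    for _ in range(num):             # search for random lease4
        leases.get(random.choice(addresses))
    stamps.append(time.perf_counter())   # timestamp 2

    for addr in addresses:           # update existing lease4
        leases[addr]["valid_lft"] = 7200
    stamps.append(time.perf_counter())   # timestamp 3

    for addr in addresses:           # delete existing lease4
        del leases[addr]
    stamps.append(time.perf_counter())   # timestamp 4

    phases = ("create", "search", "update", "delete")
    durations = {name: stamps[i + 1] - stamps[i] for i, name in enumerate(phases)}
    return durations, len(leases)


durations, remaining = run_benchmark(1000)
for name, seconds in durations.items():
    print(f"{name}: {seconds:.6f}s for 1000 operations")
```

Keeping the four phases separate, rather than interleaving them, is what makes each timestamp difference attributable to a single kind of operation.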

MySQL backend

The MySQL backend requires the MySQL client development libraries. It uses the mysql_config tool (which works similarly to pkg-config) to discover the required compilation and linking options. To install the required packages on Ubuntu, use the following command:

$ sudo apt-get install mysql-client mysql-server libmysqlclient-dev

Make sure that the MySQL server is running and that your setup has a user who is able to modify the database being used.

Before running tests, you need to initialize your database. You can use the mysql.schema script for that purpose. WARNING: It will drop the existing Kea database. Do not run this on your production server. Assuming your MySQL user is kea, you can initialize your test database with:

$ mysql -u kea -p < mysql.schema

After the database is initialized, you are ready to run the test:

$ ./mysql_ubench

or

$ ./mysql_ubench > results-mysql.txt

Redirecting output to a file is important, because a single character is printed for each operation to show progress. If you have a slow terminal, this may considerably affect test performance. On the other hand, printing something after each operation is required, as poor DB settings may slow operations down to around 20 per second. An observant user is expected to notice that the initial dots are printed too slowly and abort the test.

Currently all default parameters are hardcoded. Default values can be overridden using command line switches. Although all benchmarks take the same list of parameters, some of them are specific to a given backend type. To get a list of supported parameters, run your benchmark with the -h option:

$ ./mysql_ubench -h
This is a benchmark designed to measure expected performance of several backends.
This particular version identifies itself as following:
MySQL client version is 5.5.24

Possible command-line parameters:
 -h - help (you are reading this)
 -m hostname - specifies MySQL server to connect (MySQL backend only)
 -u username - specifies MySQL user name (MySQL backend only)
 -p password - specifies MySQL password (MySQL backend only)
 -f name - database or filename (MySQL, SQLite and memfile)
 -n integer - number of test repetitions (MySQL, SQLite and memfile)
 -s yes|no - synchronous/asynchronous operation (MySQL, SQLite and memfile)
 -v yes|no - verbose mode (MySQL, SQLite and memfile)
 -c yes|no - should compiled statements be used (MySQL only)

MySQL tweaks

One parameter that has a huge impact on performance is the backend engine. You can get a list of the engines supported by your MySQL installation by running

> show engines;

in your mysql client. Two notable engines are MyISAM and InnoDB. mysql_ubench uses MyISAM for synchronous mode and InnoDB for asynchronous mode.

SQLite-ubench

The SQLite backend requires both the sqlite3 development and run-time packages. Their names may vary from system to system, but on Ubuntu 12.04 they are called sqlite3 and libsqlite3-dev. To install them, use the following command:

> sudo apt-get install sqlite3 libsqlite3-dev

Before running the test, the database has to be created. Use the following command for that:

> cat sqlite.schema | sqlite3 sqlite.db

A new database called sqlite.db will be created. That is the default name used by the sqlite_ubench test. If you prefer another name, make sure you update sqlite_ubench.cc accordingly. Once the database is created, you can run tests:

> ./sqlite_ubench

or

> ./sqlite_ubench > results-sqlite.txt

SQLite tweaks

To modify the default sqlite_ubench parameters, command line switches can be used. Currently supported parameters are (default values specified in brackets):

-f filename - name of the database file ("sqlite.db")
-n num - number of iterations (100)
-s yes|no - should the operations be performed in a synchronous (yes) or asynchronous (no) manner (yes)
-v yes|no - verbose mode. Should the test print out progress? (yes)
-c yes|no - compiled statements. Should the SQL statements be precompiled?

SQLite can run in asynchronous or synchronous mode. This mode is controlled by the sync parameter, which is applied using PRAGMA synchronous = ON or OFF.

Another tweakable feature is the journal mode, which can be set to one of several modes of operation. Its value can be modified in SQLite_uBenchmark::connect(). See http://www.sqlite.org/pragma.html#pragma_journal_mode for a detailed explanation.
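As a rough illustration of the two pragmas discussed above, the following sketch uses Python's standard sqlite3 module rather than the C/C++ benchmark code. The temporary database file and the choice of the MEMORY journal mode are assumptions made for this example; they are not settings taken from sqlite_ubench.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway database file keeps the sketch self-contained.
    conn = sqlite3.connect(os.path.join(tmp, "sqlite.db"))

    # Asynchronous mode: do not wait for data to reach the disk.
    conn.execute("PRAGMA synchronous = OFF")
    sync_mode = conn.execute("PRAGMA synchronous").fetchone()[0]

    # Setting journal_mode returns the mode that is now in effect.
    journal_mode = conn.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]

    conn.close()

print("synchronous:", sync_mode)
print("journal_mode:", journal_mode)
```

Reading each pragma back after setting it is a cheap way to confirm that the database accepted the requested mode before a long benchmark run starts.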

memfile-ubench

The memfile backend is a custom-developed prototype backend that somewhat mimics the operation of ISC DHCP4. It uses in-memory storage built on standard C++ and boost mechanisms (std::map and boost::shared_ptr<>). All database changes are also written to a lease file. That file is strictly write-only. This approach takes advantage of the fact that a simple append is faster than editing in place, which may require relocating the whole file.

memfile tweaks

To modify the default memfile_ubench parameters, command line switches can be used. Currently supported parameters are (default values specified in brackets):

-f filename - name of the database file ("dhcpd.leases")
-n num - number of iterations (100)
-s yes|no - should the operations be performed in a synchronous (yes) or asynchronous (no) manner (yes)
-v yes|no - verbose mode. Should the test print out progress? (yes)

memfile can run in asynchronous or synchronous mode. This mode is controlled by the sync parameter. In synchronous mode it uses fflush() and fsync() to make sure that data is not buffered and is physically stored on disk.
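The append-only approach with optional synchronization can be sketched as below. This is a Python illustration, not the memfile_ubench code: the record format, the append_lease helper and the sample address are hypothetical, and flush() plus os.fsync() play the role that fflush() and fsync() play in C.

```python
import os
import tempfile


def append_lease(log, record, sync):
    """Append one change record; in synchronous mode force it to disk."""
    log.write(record + "\n")
    if sync:
        log.flush()              # empty the user-space buffer
        os.fsync(log.fileno())   # ask the OS to write the data to disk


leases = {}                      # in-memory storage (hypothetical layout)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dhcpd.leases")
    with open(path, "a") as log:
        leases["192.0.2.1"] = 3600
        append_lease(log, "add 192.0.2.1 3600", sync=True)
        leases["192.0.2.1"] = 7200
        append_lease(log, "update 192.0.2.1 7200", sync=True)
        del leases["192.0.2.1"]
        append_lease(log, "delete 192.0.2.1", sync=False)
    with open(path) as log:
        history = log.read().splitlines()

print(history)
```

Every change, including updates and deletes, becomes a new line at the end of the file, so the file is never rewritten in place.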

Performance measurements

This section contains sample results for backend performance measurements, taken using the microbenchmarks. Tests were conducted on a reasonably powerful machine:

CPU: Quad-core Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (8 logical cores)
HDD: 1.5TB Seagate Barracuda ST31500341AS 7200rpm (used only one of them), ext4 partition
OS: Ubuntu 12.04, running kernel 3.2.0-26-generic SMP x86_64
compiler: g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
MySQL version: 5.5.24
SQLite version: 3.7.9, sourceid version is 2011-11-01 00:52:41 c7c6050ef060877ebe77b41d959e9df13f8c9b5e

Benchmarks were run in two series: synchronous and asynchronous. As those modes offer radically different performance, the synchronous series was run for 1000 (one thousand) repetitions and the asynchronous series for 100000 (one hundred thousand) repetitions.

Synchronous results

Backend  Operations  Create      Search     Update      Delete      Average
MySQL    1000        31.603978s  0.116612s  27.964191s  27.695209s  21.844998s
SQLite   1000        61.421356s  0.033283s  59.476638s  56.034150s  44.241357s
memfile  1000        41.711886s  0.000724s  42.267578s  42.169679s  31.537467s

The following results were measured in asynchronous mode. MySQL and SQLite were run with 100 thousand repetitions. Memfile was run for 1 million repetitions due to its much higher performance.

Asynchronous results

Backend  Operations      Create [s]  Search [s]  Update [s]  Delete [s]  Average [s]
MySQL    100000          10.584842   10.386402   10.062384   8.890197    9.980956
SQLite   100000          3.710356    3.159129    2.865354    2.439406    3.043561
memfile  1000000 (sic!)  6.084131    0.862667    6.018585    5.146704    4.528022

The presented results can be converted into an operations per second metric. It should be noted that due to large differences between various operations (sometimes over 3 orders of magnitude), it is difficult to create a simple, readable chart with that data.

Estimated performance

Backend          Create [oper/s]  Search [oper/s]  Update [oper/s]  Delete [oper/s]  Average [oper/s]
MySQL (async)    9447.47          9627.97          9938.00          11248.34         10065.45
SQLite (async)   26951.59         31654.29         34899.70         40993.59         33624.79
memfile (async)  164362.01        1159195.84       166152.01        194299.11        421002.24
MySQL (sync)     31.64            8575.45          35.76            36.11            2169.74
SQLite (sync)    16.28            30045.37         16.81            17.85            7524.08
memfile (sync)   23.97            1381215.47       23.66            23.71            345321.70
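The conversion from measured time to operations per second is a simple division. The sketch below reproduces it in Python for the MySQL asynchronous timings reported earlier (100000 operations per phase); the ops_per_second helper is a name chosen for this example. The average is taken as the arithmetic mean of the four per-operation rates, which appears to be how the Average column was obtained.

```python
def ops_per_second(operations, seconds):
    """Convert a phase duration into an operations-per-second rate."""
    return operations / seconds


# MySQL asynchronous timings in seconds, 100000 operations per phase.
mysql_async = {
    "create": 10.584842,
    "search": 10.386402,
    "update": 10.062384,
    "delete": 8.890197,
}

rates = {name: ops_per_second(100000, t) for name, t in mysql_async.items()}
average = sum(rates.values()) / len(rates)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} oper/s")
print(f"average: {average:.2f} oper/s")
```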

Figure: Performance measurements. Graphical representation of the performance results presented in the tables above.

Possible further optimizations

For debugging purposes the code was compiled with the -g -O0 flags. While the majority of the time was spent in backend functions (which were probably compiled with -O2), the benchmark code could run faster when compiled with -O2 rather than -O0. That is expected to affect the memfile benchmark.

Currently all operations are conducted one by one, and each operation is treated as a separate transaction. Grouping X operations together could bring an almost X-fold increase in the performance of synchronous operations. Extending the benchmark in this regard should be considered. That affects only write operations (insert, update and delete). Read operations (search) are expected to be barely affected.

A multi-threaded or multi-process benchmark may be considered in the future. It may be somewhat difficult, as only some backends support concurrent access.
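The idea of grouping operations into one transaction can be sketched with Python's standard sqlite3 module. This is not an extension of the actual benchmark: the two-column lease4 table, the insert_leases helper and the batch sizes are assumptions made for this example. The sketch only counts commits; it makes no claim about the speed-up that would be measured.

```python
import sqlite3


def insert_leases(conn, count, batch_size):
    """Insert count rows, committing once per batch; return the commit count."""
    commits = 0
    for start in range(0, count, batch_size):
        stop = min(start + batch_size, count)
        rows = [(addr, 3600) for addr in range(start, stop)]
        conn.executemany(
            "INSERT INTO lease4 (address, valid_lft) VALUES (?, ?)", rows)
        conn.commit()            # one transaction per batch
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
# Simplified, hypothetical lease table used only for this sketch.
conn.execute("CREATE TABLE lease4 (address INTEGER PRIMARY KEY, valid_lft INTEGER)")

one_by_one = insert_leases(conn, 100, 1)     # every insert is its own transaction
conn.execute("DELETE FROM lease4")
conn.commit()
grouped = insert_leases(conn, 100, 25)       # 25 inserts share a transaction

stored = conn.execute("SELECT COUNT(*) FROM lease4").fetchone()[0]
print(one_by_one, grouped, stored)
conn.close()
```

In synchronous mode each commit is the point where data has to reach the disk, so fewer commits for the same number of rows is where the expected gain would come from.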

perfdhcp

Purpose

There is a growing need to evaluate the performance of DHCP servers under different traffic conditions in order to understand their bottlenecks. This helps to eliminate bugs in existing DHCP software and to make informed decisions about new DHCP software designs that significantly improve performance. The perfdhcp tool was created mostly to fill the gap in performance measurement capabilities. However, the number of implemented features and parameters exposed to the user make this tool useful for functional testing as well.

Key features

perfdhcp exposes a number of command line parameters to control DHCP message exchanges. Currently they fall into the following categories:

- Rate control - control how many DHCP exchanges are initiated within a period of time. The tool can also simulate best effort conditions, where it attempts to start as many exchanges as possible.
- Test exit specifiers - control the conditions under which a test completes, including the number of initiated exchanges, the test period or the maximum number of dropped packets.
- Packet templates - specify files containing packet templates that perfdhcp uses to create custom DHCP messages instead of the default ones. The tool also allows a number of values to be specified that indicate the offsets of variable values within a packet, which the tool modifies in flight.
- Reporting - for each test, produce a set of performance data including the achieved packet exchange rate (server performance). There are also a number of diagnostic selectors available that enable periodic (intermediate) reporting, printing of packet timestamps and detailed information about perfdhcp internal states (for debugging).
- Different modes of operation - specify the DHCP version used (v4 or v6), 2-way or 4-way exchanges, and use of the Rapid Commit option for DHCPv6.
- IP layer options - specify the local/remote address, local interface and local port to be used for communication with the DHCP server.

Command line options

The following command line options may be used with the perfdhcp tool. This summary also presents its possible exit codes as well as the error counters printed when the test is complete:

$ ./perfdhcp -h
] [-t] [-R] [-b] [-n] [-p] [-d] [-D] [-l] [-P] [-a] [-L] [-s] [-i] [-B] [-c] [-1] [-T] [-X] [-O] [-S] [-I] [-x] [-w] [server]

The [server] argument is the name/address of the DHCP server to contact. For DHCPv4 operation, exchanges are initiated by transmitting a DHCP DISCOVER to this address. For DHCPv6 operation, exchanges are initiated by transmitting a DHCP SOLICIT to this address. In the DHCPv6 case, the special name 'all' can be used to refer to All_DHCP_Relay_Agents_and_Servers (the multicast address FF02::1:2), or the special name 'servers' to refer to All_DHCP_Servers (the multicast address FF05::1:3). The [server] argument is optional only in the case that -l is used to specify an interface, in which case [server] defaults to 'all'.

The default is to perform a single 4-way exchange, effectively pinging the server. The -r option is used to set up a performance test; without it, exchanges are initiated as fast as possible.

Options:
-1: Take the server-ID option from the first received message.
-4: DHCPv4 operation (default). This is incompatible with the -6 option.
-6: DHCPv6 operation. This is incompatible with the -4 option.
-a: When the target sending rate is not yet reached, control how many exchanges are initiated before the next pause.
-b: The base mac, duid, IP, etc, used to simulate different clients. This can be specified multiple times; each instance takes the form of a type and a value joined by '=', for instance (and default): mac=00:0c:01:02:03:04.
-d: Specify the time after which a request is treated as having been lost. The value is given in seconds and may contain a fractional component. The default is 1 second.
-E: Offset of the (DHCPv4) secs field / (DHCPv6) elapsed-time option in the (second/request) template. The value 0 disables it.
-h: Print this help.
-i: Do only the initial part of an exchange: DO or SA, depending on whether -6 is given.
-I: Offset of the (DHCPv4) IP address in the requested-IP option / (DHCPv6) IA_NA option in the (second/request) template.
-l: For DHCPv4 operation, specify the local hostname/address to use when communicating with the server. By default, the interface address through which traffic would normally be routed to the server is used. For DHCPv6 operation, specify the name of the network interface via which exchanges are initiated.
-L: Specify the local port to use (the value 0 means to use the default).
-O: Offset of the last octet to randomize in the template.
-P: Initiate first exchanges back to back at startup.
-r: Initiate the given number of DORA/SARR (or if -i is given, DO/SA) exchanges per second. A periodic report is generated showing the number of exchanges which were not completed, as well as the average response latency. The program continues until interrupted, at which point a final report is generated.
-R: Specify how many different clients are used. With 1 (the default), all requests seem to come from the same client.
-s: Specify the seed for randomization, making it repeatable.
-S: Offset of the server-ID option in the (second/request) template.
-T: The name of a file containing the template to use as a stream of hexadecimal digits.
-v: Report the version number of this program.
-w: Command to call with start/stop at the beginning/end of the program.
-x: Include extended diagnostics in the output. The argument is a string of single-letter keywords specifying the operations for which verbose output is desired. The selector keyletters are:
   * 'a': print the decoded command line arguments
   * 'e': print the exit reason
   * 'i': print rate processing details
   * 'r': print randomization details
   * 's': print first server-id
   * 't': when finished, print timers of all successful exchanges
   * 'T': when finished, print templates
-X: Transaction ID (aka. xid) offset in the template.

DHCPv4 only options:
-B: Force broadcast handling.

DHCPv6 only options:
-c: Add a rapid commit option (exchanges will be SA).

The remaining options are used only in conjunction with -r:
-D: Abort the test if more than the given number of requests have been dropped. Use -D0 to abort if even a single request has been dropped. If the value includes the suffix '%', it specifies a maximum percentage of requests that may be dropped before abort. In this case, testing of the threshold begins after 10 requests have been expected to be received.
-n: Initiate the given number of transactions. No report is generated until all transactions have been initiated/waited-for, after which a report is generated and the program terminates.
-p: Send requests for the given test period, which is specified in the same manner as -d. This can be used as an alternative to -n, or both options can be given, in which case the testing is completed when either limit is reached.
-t: Delay in seconds between two periodic reports.

Errors:
- tooshort: received a too short message
- orphans: received a message which doesn't match an exchange (duplicate, late or not related)
- locallimit: reached local system limits when sending a message.

Exit status:
The exit status is:
0 on complete success.
1 for a general error.
2 if an error is found in the command line arguments.
3 if there are no general failures in operation, but one or more exchanges are not successfully completed.
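The handling of the special DHCPv6 server names described in the help text can be illustrated with a short Python sketch. It is not taken from the perfdhcp sources: the resolve_server function and its parameters are hypothetical, and for simplicity the sketch accepts only the two special names or a literal IPv6 address, not a host name.

```python
import ipaddress

# Special names and multicast addresses as listed in the help text.
SPECIAL_NAMES = {
    "all": "ff02::1:2",       # All_DHCP_Relay_Agents_and_Servers
    "servers": "ff05::1:3",   # All_DHCP_Servers
}


def resolve_server(server=None, interface=None):
    """Return the DHCPv6 destination address for a [server] argument."""
    if server is None:
        if interface is None:
            raise ValueError("server may be omitted only when -l gives an interface")
        server = "all"        # default when only an interface is specified
    return ipaddress.IPv6Address(SPECIAL_NAMES.get(server, server))


print(resolve_server("all"))
print(resolve_server("servers"))
print(resolve_server(interface="eth3"))
```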

Running a test

In order to run a performance test, at least two separate systems have to be set up: a client and a server. The first one has to have the perfdhcp tool installed; the latter has to have a DHCP server running (v4 or v6). If only a single system is available, the client and server can be run on virtual machines (running on the same physical system), but in this case the performance data may be heavily impacted by the performance of the VMs.

DHCP operates on low port numbers (67 for DHCPv4 relays and 547 for DHCPv6), so running perfdhcp with non-root privileges will usually result in an error message similar to this:

$./perfdhcp -4 -l eth3 -r 100 all
Error running perfdhcp: Failed to bind socket 3 to 172.16.1.2/port=67

perfdhcp exposes the '-L' command line option, which imposes the use of a custom local port, so the following command line will work:

$./perfdhcp -4 -l eth3 -r 100 -L 10067 all

but in the standard configuration no responses will be received from the ISC DHCP server, because the server responds to the default relay port 67. An alternative way to overcome the lack of privileges to use the default DHCP port number is to run perfdhcp as root.

In this section, examples of perfdhcp command line options are presented as a quick start guide for new users. For the detailed list of command line options refer to the Command line options section.