
Merge #2776

Resolver research document about mixed resolver/authoritative mode
Michal 'vorner' Vaner 12 years ago
parent
commit
0fdc68bb68

+ 21 - 0
doc/design/resolver/01-scaling-across-cores

@@ -0,0 +1,21 @@
+01-scaling-across-cores
+
+Introduction
+------------
+The general issue is how to ensure that the resolver scales.
+
+Currently resolvers are CPU bound, and it seems likely that neither
+instructions-per-cycle nor CPU frequency will increase radically, so
+scaling will need to be across multiple cores.
+
+How can we best scale a recursive resolver across multiple cores?
+
+Some possible solutions:
+
+a. Multiple processes with independent caches
+b. Multiple processes with shared cache
+c. A mix of independent/shared cache
+d. Thread variations of the above
+
+All of these may be complicated by NUMA architectures (with
+faster/slower access to specific RAM).
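+
+For illustration, one way to realize option (a) -- multiple processes with
+independent caches -- is to let every worker process bind the same UDP port
+with SO_REUSEPORT (available on Linux 3.9+), so the kernel spreads queries
+between them. This is only a sketch under that assumption, not a proposed
+implementation:
+
+  // Sketch only: several independent worker processes share one UDP port
+  // via SO_REUSEPORT; each keeps its own private cache.
+  #include <arpa/inet.h>
+  #include <netinet/in.h>
+  #include <sys/socket.h>
+  #include <unistd.h>
+  #include <cstdint>
+  #include <cstring>
+
+  int bind_worker_socket(uint16_t port) {
+      int fd = socket(AF_INET, SOCK_DGRAM, 0);
+      if (fd < 0) return -1;
+      int on = 1;
+      // Lets every worker bind the same address/port; the kernel then
+      // load-balances incoming queries between the workers.
+      setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
+      sockaddr_in addr;
+      std::memset(&addr, 0, sizeof(addr));
+      addr.sin_family = AF_INET;
+      addr.sin_addr.s_addr = htonl(INADDR_ANY);
+      addr.sin_port = htons(port);
+      if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
+          close(fd);
+          return -1;
+      }
+      return fd;
+  }
+
+  int main() {
+      const int workers = 4;                 // e.g. one per core
+      for (int i = 0; i < workers; ++i) {
+          if (fork() == 0) {                 // child: own process, own cache
+              int fd = bind_worker_socket(53);
+              if (fd < 0) return 1;
+              char buf[512];
+              for (;;) {                     // trivial receive loop; real
+                  recvfrom(fd, buf, sizeof(buf), 0, nullptr, nullptr);
+              }                              // query processing omitted
+          }
+      }
+      for (;;) pause();                      // parent just waits
+  }
+
+The shared-cache variants (b, c) would additionally need some locking or
+lock-free scheme for the cache, which is part of what needs to be researched.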

+ 150 - 0
doc/design/resolver/02-mixed-recursive-authority-setup

@@ -0,0 +1,150 @@
+Mixed recursive & authoritative setup
+=====================================
+
+Ideally we will run the authoritative server independently of the
+recursive resolver.
+
+We need a way to run both an authoritative and a recursive resolver on
+the same machine, listening on the same IP/port. But we need a way to
+run only one of them as well.
+
+This is mostly the same problem as we have with DDNS packets and xfr-out
+requests, but those aren't as performance sensitive as auth & resolver.
+
+There are a number of possible approaches to this:
+
+One fat module
+--------------
+
+With some build system or dynamic linker tricks, we create three modules:
+
+ * Stand-alone auth
+ * Stand-alone resolver
+ * Compound module containing both
+
+The user then chooses either one stand-alone module, or the compound one,
+depending on the requirements.
+
+Advantages
+~~~~~~~~~~
+
+ * It is easier to switch between the two kinds of processing and to ask
+   authoritative questions from within the resolver processing.
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * The code is not separated (one bug takes down both, and the admin can't
+   see which one takes how much CPU).
+ * BIND 9 does this and its code is a jungle. Maybe it's not just a
+   coincidence.
+ * Limits flexibility -- for example, we can't then decide to make the resolver
+   threaded (or we would have to make sure the auth processing doesn't break
+   with threads, which will be hard).
+
+There's also the idea of putting the auth into a loadable library which the
+resolver could load and use somehow. But the advantages and disadvantages
+are probably the same.
+
+Auth first
+----------
+
+We do the same as with xfrout and ddns. When a query comes, it is examined and
+if the `RD` bit is set, it is forwarded to the resolver.
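+
+The check itself is cheap; the `RD` flag sits in the third byte of the DNS
+header, so the dispatch decision needs no full message parsing. A minimal
+sketch (the helper name is made up, this is not existing code):
+
+  #include <cstdint>
+  #include <cstddef>
+
+  // Decide whether a raw DNS query should be handed to the resolver
+  // (RD set) or processed by auth (RD clear). RD is bit 0 of the third
+  // header byte (RFC 1035, section 4.1.1).
+  bool wants_recursion(const uint8_t* msg, size_t len) {
+      if (len < 12) {       // shorter than a DNS header: let auth reject it
+          return false;
+      }
+      return (msg[2] & 0x01) != 0;
+  }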
+
+Advantages
+~~~~~~~~~~
+
+ * Separate auth and resolver modules
+ * Minimal changes to auth
+ * No slowdown on the auth side
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Counter-intuitive asymmetric design
+ * Possible slowdown on the resolver side
+ * Resolver needs to know both modes (for running stand-alone too)
+
+There's also the possibility of the reverse -- resolver first. It may make
+more sense for performance (the more usual scenario would probably be a
+high-load resolver with just a few low-volume authoritative zones). On the
+other hand, auth already has some forwarding tricks.
+
+Auth with cache
+---------------
+
+This is mostly the same as ``Auth first'', except that the cache is in the
+auth server. If a query can be answered from the cache, it is answered right
+away. If not, it is forwarded to the resolver. The resolver then updates the
+cache too.
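+
+A rough sketch of the auth-side decision for a recursive query (the types and
+names below are made-up stand-ins, not existing interfaces):
+
+  #include <string>
+  #include <unordered_map>
+
+  struct Query { std::string qname; };
+
+  struct Cache {        // the cache living inside the auth server
+      std::unordered_map<std::string, std::string> entries;
+      bool lookup(const Query& q) const { return entries.count(q.qname) != 0; }
+  };
+
+  // Hand-off over IPC; the resolver answers the client and also writes
+  // the result back into the cache.
+  void forward_to_resolver(const Query&) {}
+
+  enum class Answered { FromCache, Forwarded };
+
+  Answered handle_recursive_query(const Query& q, const Cache& cache) {
+      if (cache.lookup(q)) {      // hit: auth answers right away
+          return Answered::FromCache;
+      }
+      forward_to_resolver(q);     // miss: resolver answers and updates the cache
+      return Answered::Forwarded;
+  }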
+
+Advantages
+~~~~~~~~~~
+
+ * Probably good performance
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Cache duplication (with several auth modules; it doesn't feel like it
+   would work with shared memory without locking).
+ * The cache is probably very different from the authoritative zones, so it
+   would complicate auth processing.
+ * The resolver needs its own copy of the cache (to be able to get partial
+   results), probably a different one than the auth server's.
+
+Receptionist
+------------
+
+One module does only the listening. It doesn't process the queries itself;
+it only looks into them and forwards them to the processing modules.
+
+Advantages
+~~~~~~~~~~
+
+ * Clean design with separated modules
+ * Easy to run modules stand-alone
+ * Allows for solving the xfrout & ddns forwarding without auth running
+ * Allows for views (different auths with different configurations)
+ * Allows balancing/clustering across multiple machines
+ * Easy to create new modules for different kinds of DNS handling and share
+   port with them too
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Need to set up another module (not a problem if we have inter-module
+   dependencies in b10-init)
+ * Possible performance impact. However, experiments show this is not an
+   issue: with some tuning the receptionist can actually increase the
+   throughput, and the increase in RTT is not big.
+
+Implementation ideas
+~~~~~~~~~~~~~~~~~~~~
+
+ * Let's have a new TCP transport where we send not only the DNS messages,
+   but also the source and destination ports and addresses (for two reasons:
+   ACLs in the target module and not keeping state in the receptionist). It
+   would allow transferring a batch of messages at once, to save some calls
+   into the kernel (a length-prefixed block of messages is read at once, the
+   messages are parsed one by one, and the whole block of answers is sent
+   back); see the sketch after this list.
+ * A module creates a listening socket (UNIX by default) on startup and
+   contacts all the receptionists. It tells them which kinds of packets to
+   send to the module and the address of its UNIX socket. All the
+   receptionists then connect to the module. This allows for auto-configuring
+   the receptionist.
+ * The queries are sent from the receptionist in batches, the answers are sent
+   back to the receptionist in batches too.
+ * It is possible to fine-tune and use OS-specific tricks (like epoll, or
+   sending multiple UDP messages by a single call to sendmmsg()).
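+
+One possible shape for the framing described in the first point (purely
+illustrative; none of these structures exist, and the field layout is an
+assumption):
+
+  #include <cstdint>
+  #include <netinet/in.h>
+  #include <sys/socket.h>
+
+  // One batch = BatchHeader followed by `count` framed messages, so a
+  // whole block can be read from the stream with a single read of
+  // `length` bytes and parsed afterwards.
+  struct BatchHeader {
+      uint32_t length;     // total size of the batch in bytes
+      uint16_t count;      // number of DNS messages in the batch
+  };
+
+  // Per-message envelope: carries the original addresses so the target
+  // module can evaluate ACLs and build its reply without the
+  // receptionist keeping any per-query state.
+  struct MessageHeader {
+      sockaddr_storage source;        // client address and port
+      sockaddr_storage destination;   // local address the query arrived on
+      uint8_t protocol;               // 0 = UDP, 1 = TCP (assumed encoding)
+      uint16_t dns_length;            // length of the raw DNS message that follows
+  };
+
+Answers would travel back in the same framed batches; on the UDP side a whole
+batch of replies could then be flushed with one sendmmsg() call.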
+
+Proposal
+--------
+
+Implement the receptionist in a way that lets us still work without it (not
+throwing away the current UDPServer and TCPServer in asiodns).
+
+The way we handle xfrout and DDNS needs some changes, since we can't forward
+sockets for the query. We would implement the receptionist protocol on them,
+which would allow the receptionist to forward messages to them. We would then
+modify auth to be able to forward the queries over the receptionist protocol,
+so ordinary users don't need to start the receptionist.

+ 22 - 0
doc/design/resolver/03-cache-algorithm

@@ -0,0 +1,22 @@
+03-cache-algorithm
+
+Introduction
+------------
+Cache performance may be important for the resolver. It might not be
+critical. We need to research this.
+
+One key question is: given a specific cache hit rate, how much of an
+impact does cache performance have?
+
+For example, if we have 90% cache hit rate, will we still be spending
+most of our time in system calls or in looking things up in our cache?
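+
+A back-of-envelope model might already tell us a lot. A tiny sketch (all the
+per-operation costs below are placeholder guesses, not measurements):
+
+  #include <cstdio>
+
+  int main() {
+      const double hit_rate     = 0.9;    // assumed cache hit rate
+      const double syscalls     = 5e-6;   // CPU seconds in recv/send per query (guess)
+      const double cache_lookup = 2e-6;   // CPU seconds per cache lookup (guess)
+      const double miss_work    = 40e-6;  // extra CPU seconds to drive a recursion (guess)
+
+      const double per_query = syscalls + cache_lookup + (1.0 - hit_rate) * miss_work;
+      std::printf("CPU per query:  %.1f us\n", per_query * 1e6);
+      std::printf("  system calls: %.0f%%\n", 100.0 * syscalls / per_query);
+      std::printf("  cache lookup: %.0f%%\n", 100.0 * cache_lookup / per_query);
+      std::printf("  recursion:    %.0f%%\n",
+                  100.0 * (1.0 - hit_rate) * miss_work / per_query);
+      return 0;
+  }
+
+With these made-up numbers the cache lookup is a minority of the total time,
+which is exactly the kind of conclusion that would need to be confirmed by
+measuring real resolvers.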
+
+There are several ways we can consider figuring this out, including
+measuring it in existing resolvers (BIND 9, Unbound) or modeling it
+with specific values.
+
+Once we know how critical the cache performance is, we can consider
+which algorithm is best for that. If it is very critical, then a
+custom algorithm designed for DNS caching makes sense. If it is not,
+then we can consider using an STL-based data structure.
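+
+For illustration, the kind of STL-based structure meant here might be a plain
+LRU map built from an unordered_map and a list; a sketch, not a proposed
+design:
+
+  #include <cstddef>
+  #include <list>
+  #include <string>
+  #include <unordered_map>
+  #include <utility>
+
+  class LruCache {
+  public:
+      explicit LruCache(std::size_t capacity) : capacity_(capacity) {}
+
+      // On a hit, fills `value`, moves the entry to the front of the
+      // recency list and returns true.
+      bool get(const std::string& key, std::string& value) {
+          auto it = index_.find(key);
+          if (it == index_.end()) {
+              return false;
+          }
+          entries_.splice(entries_.begin(), entries_, it->second);
+          value = it->second->second;
+          return true;
+      }
+
+      // Inserts or refreshes an entry, evicting the least recently used
+      // one when the cache is full.
+      void put(const std::string& key, const std::string& value) {
+          auto it = index_.find(key);
+          if (it != index_.end()) {
+              it->second->second = value;
+              entries_.splice(entries_.begin(), entries_, it->second);
+              return;
+          }
+          if (!entries_.empty() && entries_.size() >= capacity_) {
+              index_.erase(entries_.back().first);
+              entries_.pop_back();
+          }
+          entries_.push_front(std::make_pair(key, value));
+          index_[key] = entries_.begin();
+      }
+
+  private:
+      typedef std::list<std::pair<std::string, std::string> > Entries;
+      std::size_t capacity_;
+      Entries entries_;                                           // most recent first
+      std::unordered_map<std::string, Entries::iterator> index_;  // key -> list position
+  };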
+

+ 5 - 0
doc/design/resolver/README

@@ -0,0 +1,5 @@
+This directory contains research and design documents for the BIND 10
+resolver reimplementation.
+
+Each file contains a specific issue and discussion surrounding that
+issue.