
Merge #2776

Resolver research document about mixed resolver/authoritative mode
Michal 'vorner' Vaner 12 years ago
parent
commit
0fdc68bb68

+ 21 - 0
doc/design/resolver/01-scaling-across-cores

@@ -0,0 +1,21 @@
+01-scaling-across-cores
+
+Introduction
+------------
+The general issue is how to ensure that the resolver scales.
+
+Currently resolvers are CPU bound, and it seems likely that neither
+instructions-per-cycle nor CPU frequency will increase radically, so
+scaling will need to be across multiple cores.
+
+How can we best scale a recursive resolver across multiple cores?
+
+Some possible solutions:
+
+a. Multiple processes with independent caches
+b. Multiple processes with shared cache
+c. A mix of independent/shared cache
+d. Thread variations of the above
+
+All of these may be complicated by NUMA architectures (with
+faster/slower access to specific RAM).
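+
+For illustration, one way to realize option (a) -- multiple processes with
+independent caches -- is to let every worker process bind the same UDP port
+with SO_REUSEPORT (available on Linux 3.9+), so the kernel spreads queries
+between them. This is only a sketch under that assumption, not a proposed
+implementation:
+
+  // Sketch only: several independent worker processes share one UDP port
+  // via SO_REUSEPORT; each keeps its own private cache.
+  #include <arpa/inet.h>
+  #include <netinet/in.h>
+  #include <sys/socket.h>
+  #include <unistd.h>
+  #include <cstdint>
+  #include <cstring>
+
+  int bind_worker_socket(uint16_t port) {
+      int fd = socket(AF_INET, SOCK_DGRAM, 0);
+      if (fd < 0) return -1;
+      int on = 1;
+      // Lets every worker bind the same address/port; the kernel then
+      // load-balances incoming queries between the workers.
+      setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
+      sockaddr_in addr;
+      std::memset(&addr, 0, sizeof(addr));
+      addr.sin_family = AF_INET;
+      addr.sin_addr.s_addr = htonl(INADDR_ANY);
+      addr.sin_port = htons(port);
+      if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
+          close(fd);
+          return -1;
+      }
+      return fd;
+  }
+
+  int main() {
+      const int workers = 4;                 // e.g. one per core
+      for (int i = 0; i < workers; ++i) {
+          if (fork() == 0) {                 // child: own process, own cache
+              int fd = bind_worker_socket(53);
+              if (fd < 0) return 1;
+              char buf[512];
+              for (;;) {                     // trivial receive loop; real
+                  recvfrom(fd, buf, sizeof(buf), 0, nullptr, nullptr);
+              }                              // query processing omitted
+          }
+      }
+      for (;;) pause();                      // parent just waits
+  }
+
+The shared-cache variants (b, c) would additionally need some locking or
+lock-free scheme for the cache, which is part of what needs to be researched.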

+ 150 - 0
doc/design/resolver/02-mixed-recursive-authority-setup

@@ -0,0 +1,150 @@
+Mixed recursive & authoritative setup
+=====================================
+
+Ideally we will run the authoritative server independently of the
+recursive resolver.
+
+We need a way to run both an authoritative and a recursive resolver on
+the same machine, listening on the same IP/port. But we need a way to
+run only one of them as well.
+
+This is mostly the same problem as we have with DDNS packets and xfr-out
+requests, but those aren't as performance sensitive as auth & resolver.
+
+There are a number of possible approaches to this:
+
+One fat module
+--------------
+
+With some build system or dynamic linker tricks, we create three modules:
+
+ * Stand-alone auth
+ * Stand-alone resolver
+ * Compound module containing both
+
+The user then chooses either one stand-alone module, or the compound one,
+depending on the requirements.
+
+Advantages
+~~~~~~~~~~
+
+ * It is easier to switch between the two kinds of processing and to ask
+   authoritative questions from within the resolver processing.
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * The code is not separated (one bug takes down both, and the admin can't
+   see which one takes how much CPU).
+ * BIND 9 does this and its code is a jungle. Maybe it's not just a
+   coincidence.
+ * Limits flexibility -- for example, we can't then decide to make the resolver
+   threaded (or we would have to make sure the auth processing doesn't break
+   with threads, which will be hard).
+
+There's also the idea of putting the auth into a loadable library which the
+resolver could load and use somehow. But the advantages and disadvantages
+are probably the same.
+
+Auth first
+----------
+
+We do the same as with xfrout and ddns. When a query comes, it is examined and
+if the `RD` bit is set, it is forwarded to the resolver.
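+
+The check itself is cheap; the `RD` flag sits in the third byte of the DNS
+header, so the dispatch decision needs no full message parsing. A minimal
+sketch (the helper name is made up, this is not existing code):
+
+  #include <cstdint>
+  #include <cstddef>
+
+  // Decide whether a raw DNS query should be handed to the resolver
+  // (RD set) or processed by auth (RD clear). RD is bit 0 of the third
+  // header byte (RFC 1035, section 4.1.1).
+  bool wants_recursion(const uint8_t* msg, size_t len) {
+      if (len < 12) {       // shorter than a DNS header: let auth reject it
+          return false;
+      }
+      return (msg[2] & 0x01) != 0;
+  }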
+
+Advantages
+~~~~~~~~~~
+
+ * Separate auth and resolver modules
+ * Minimal changes to auth
+ * No slowdown on the auth side
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Counter-intuitive asymmetric design
+ * Possible slowdown on the resolver side
+ * Resolver needs to know both modes (for running stand-alone too)
+
+There's also the possibility of the reverse -- resolver first. It may make
+more sense for performance (the more usual scenario would probably be a
+high-load resolver with just a few low-volume authoritative zones). On the
+other hand, auth already has some forwarding tricks.
+
+Auth with cache
+---------------
+
+This is mostly the same as ``Auth first'', except that the cache is in the
+auth server. If a query can be answered from the cache, it is answered right
+away. If not, it is forwarded to the resolver. The resolver then updates the
+cache too.
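+
+A rough sketch of the auth-side decision for a recursive query (the types and
+names below are made-up stand-ins, not existing interfaces):
+
+  #include <string>
+  #include <unordered_map>
+
+  struct Query { std::string qname; };
+
+  struct Cache {        // the cache living inside the auth server
+      std::unordered_map<std::string, std::string> entries;
+      bool lookup(const Query& q) const { return entries.count(q.qname) != 0; }
+  };
+
+  // Hand-off over IPC; the resolver answers the client and also writes
+  // the result back into the cache.
+  void forward_to_resolver(const Query&) {}
+
+  enum class Answered { FromCache, Forwarded };
+
+  Answered handle_recursive_query(const Query& q, const Cache& cache) {
+      if (cache.lookup(q)) {      // hit: auth answers right away
+          return Answered::FromCache;
+      }
+      forward_to_resolver(q);     // miss: resolver answers and updates the cache
+      return Answered::Forwarded;
+  }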
+
+Advantages
+~~~~~~~~~~
+
+ * Probably good performance
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Cache duplication (with several auth modules; it doesn't feel like it
+   would work with shared memory without locking).
+ * The cache is probably very different from the authoritative zones, so it
+   would complicate auth processing.
+ * The resolver needs its own copy of the cache (to be able to get partial
+   results), probably a different one than the auth server's.
+
+Receptionist
+------------
+
+One module does only the listening. It doesn't process the queries itself;
+it only looks into them and forwards them to the processing modules.
+
+Advantages
+~~~~~~~~~~
+
+ * Clean design with separated modules
+ * Easy to run modules stand-alone
+ * Allows for solving the xfrout & ddns forwarding without auth running
+ * Allows for views (different auths with different configurations)
+ * Allows balancing/clustering across multiple machines
+ * Easy to create new modules for different kinds of DNS handling and share
+   port with them too
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Need to set up another module (not a problem if we have inter-module
+   dependencies in b10-init)
+ * Possible performance impact. However, experiments show this is not an
+   issue: with some tuning the receptionist can actually increase the
+   throughput, and the increase in RTT is not big.
+
+Implementation ideas
+~~~~~~~~~~~~~~~~~~~~
+
+ * Let's have a new TCP transport where we send not only the DNS messages,
+   but also the source and destination ports and addresses (for two reasons:
+   ACLs in the target module and not keeping state in the receptionist). It
+   would allow transferring a batch of messages at once, to save some calls
+   into the kernel (a length-prefixed block of messages is read at once, the
+   messages are parsed one by one, and the whole block of answers is sent
+   back); see the sketch after this list.
+ * A module creates a listening socket (UNIX by default) on startup and
+   contacts all the receptionists. It tells them which kinds of packets to
+   send to the module and the address of its UNIX socket. All the
+   receptionists then connect to the module. This allows for auto-configuring
+   the receptionist.
+ * The queries are sent from the receptionist in batches, the answers are sent
+   back to the receptionist in batches too.
+ * It is possible to fine-tune and use OS-specific tricks (like epoll, or
+   sending multiple UDP messages by a single call to sendmmsg()).
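+
+One possible shape for the framing described in the first point (purely
+illustrative; none of these structures exist, and the field layout is an
+assumption):
+
+  #include <cstdint>
+  #include <netinet/in.h>
+  #include <sys/socket.h>
+
+  // One batch = BatchHeader followed by `count` framed messages, so a
+  // whole block can be read from the stream with a single read of
+  // `length` bytes and parsed afterwards.
+  struct BatchHeader {
+      uint32_t length;     // total size of the batch in bytes
+      uint16_t count;      // number of DNS messages in the batch
+  };
+
+  // Per-message envelope: carries the original addresses so the target
+  // module can evaluate ACLs and build its reply without the
+  // receptionist keeping any per-query state.
+  struct MessageHeader {
+      sockaddr_storage source;        // client address and port
+      sockaddr_storage destination;   // local address the query arrived on
+      uint8_t protocol;               // 0 = UDP, 1 = TCP (assumed encoding)
+      uint16_t dns_length;            // length of the raw DNS message that follows
+  };
+
+Answers would travel back in the same framed batches; on the UDP side a whole
+batch of replies could then be flushed with one sendmmsg() call.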
+
+Proposal
+--------
+
+Implement the receptionist in a way that lets us still work without it (not
+throwing away the current UDPServer and TCPServer in asiodns).
+
+The way we handle xfrout and DDNS needs some changes, since we can't forward
+sockets for the query. We would implement the receptionist protocol on them,
+which would allow the receptionist to forward messages to them. We would then
+modify auth to be able to forward the queries over the receptionist protocol,
+so ordinary users don't need to start the receptionist.

+ 22 - 0
doc/design/resolver/03-cache-algorithm

@@ -0,0 +1,22 @@
+03-cache-algorithm
+
+Introduction
+------------
+Cache performance may be important for the resolver. It might not be
+critical. We need to research this.
+
+One key question is: given a specific cache hit rate, how much of an
+impact does cache performance have?
+
+For example, if we have 90% cache hit rate, will we still be spending
+most of our time in system calls or in looking things up in our cache?
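+
+A back-of-envelope model might already tell us a lot. A tiny sketch (all the
+per-operation costs below are placeholder guesses, not measurements):
+
+  #include <cstdio>
+
+  int main() {
+      const double hit_rate     = 0.9;    // assumed cache hit rate
+      const double syscalls     = 5e-6;   // CPU seconds in recv/send per query (guess)
+      const double cache_lookup = 2e-6;   // CPU seconds per cache lookup (guess)
+      const double miss_work    = 40e-6;  // extra CPU seconds to drive a recursion (guess)
+
+      const double per_query = syscalls + cache_lookup + (1.0 - hit_rate) * miss_work;
+      std::printf("CPU per query:  %.1f us\n", per_query * 1e6);
+      std::printf("  system calls: %.0f%%\n", 100.0 * syscalls / per_query);
+      std::printf("  cache lookup: %.0f%%\n", 100.0 * cache_lookup / per_query);
+      std::printf("  recursion:    %.0f%%\n",
+                  100.0 * (1.0 - hit_rate) * miss_work / per_query);
+      return 0;
+  }
+
+With these made-up numbers the cache lookup is a minority of the total time,
+which is exactly the kind of conclusion that would need to be confirmed by
+measuring real resolvers.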
+
+There are several ways we can consider figuring this out, including
+measuring it in existing resolvers (BIND 9, Unbound) or modeling it
+with specific values.
+
+Once we know how critical the cache performance is, we can consider
+which algorithm is best for that. If it is very critical, then a
+custom algorithm designed for DNS caching makes sense. If it is not,
+then we can consider using an STL-based data structure.
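+
+For illustration, the kind of STL-based structure meant here might be a plain
+LRU map built from an unordered_map and a list; a sketch, not a proposed
+design:
+
+  #include <cstddef>
+  #include <list>
+  #include <string>
+  #include <unordered_map>
+  #include <utility>
+
+  class LruCache {
+  public:
+      explicit LruCache(std::size_t capacity) : capacity_(capacity) {}
+
+      // On a hit, fills `value`, moves the entry to the front of the
+      // recency list and returns true.
+      bool get(const std::string& key, std::string& value) {
+          auto it = index_.find(key);
+          if (it == index_.end()) {
+              return false;
+          }
+          entries_.splice(entries_.begin(), entries_, it->second);
+          value = it->second->second;
+          return true;
+      }
+
+      // Inserts or refreshes an entry, evicting the least recently used
+      // one when the cache is full.
+      void put(const std::string& key, const std::string& value) {
+          auto it = index_.find(key);
+          if (it != index_.end()) {
+              it->second->second = value;
+              entries_.splice(entries_.begin(), entries_, it->second);
+              return;
+          }
+          if (!entries_.empty() && entries_.size() >= capacity_) {
+              index_.erase(entries_.back().first);
+              entries_.pop_back();
+          }
+          entries_.push_front(std::make_pair(key, value));
+          index_[key] = entries_.begin();
+      }
+
+  private:
+      typedef std::list<std::pair<std::string, std::string> > Entries;
+      std::size_t capacity_;
+      Entries entries_;                                           // most recent first
+      std::unordered_map<std::string, Entries::iterator> index_;  // key -> list position
+  };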
+

+ 5 - 0
doc/design/resolver/README

@@ -0,0 +1,5 @@
+This directory contains research and design documents for the BIND 10
+resolver reimplementation.
+
+Each file contains a specific issue and discussion surrounding that
+issue.