|
@@ -71,7 +71,7 @@ the cache.
|
|
Of course, these experimental results are too simplified. First, in
|
|
Of course, these experimental results are too simplified. First, in
|
|
these experiments we assumed only one external query is needed on
|
|
these experiments we assumed only one external query is needed on
|
|
cache miss. In general it can be more; however, it may not actually
|
|
cache miss. In general it can be more; however, it may not actually
|
|
-too optimistic either: in my another research result:
|
|
|
|
|
|
+be too optimistic either: in my another research result:
|
|
http://bind10.isc.org/wiki/ResolverPerformanceResearch
|
|
http://bind10.isc.org/wiki/ResolverPerformanceResearch
|
|
In the more detailed analysis using real query sample and tracing what
|
|
In the more detailed analysis using real query sample and tracing what
|
|
an actual resolver would do, it looked we'd need about 1.44 to 1.63
|
|
an actual resolver would do, it looked we'd need about 1.44 to 1.63
|
|
@@ -107,3 +107,150 @@ and cache management itself is always inevitable bottleneck, so the
|
|
production version wouldn't be that fast, but it still suggests it may
|
|
production version wouldn't be that fast, but it still suggests it may
|
|
not be very difficult to reach over 100Kqps in production environment
|
|
not be very difficult to reach over 100Kqps in production environment
|
|
including recursive resolution overhead.
|
|
including recursive resolution overhead.
|
|
|
|
+
|
|
|
|
+Cache Types
|
|
|
|
+-----------
|
|
|
|
+
|
|
|
|
+1. Record cache
|
|
|
|
+
|
|
|
|
+Conceptually, any recursive resolver (with cache) implementation would
|
|
|
|
+have cache for RRs (or RRsets in the modern version of protocol) given
|
|
|
|
+in responses to its external queries. In BIND 9, it's called the
|
|
|
|
+"cached DB", using an in-memory rbt-like tree. unbound calls it
|
|
|
|
+"rrset cache", which is implemented as a hash table.
|
|
|
|
+
|
|
|
|
+2. Delegation cache
|
|
|
|
+
|
|
|
|
+Recursive server implementations would also have cache to determine
|
|
|
|
+the deepest zone cut for a given query name in the recursion process.
|
|
|
|
+Neither BIND 9 nor unbound has a separate cache for this purpose;
|
|
|
|
+basically they try to find an NR RRset from the "record cache" whose
|
|
|
|
+owner name best matches the given query name.
|
|
|
|
+
|
|
|
|
+3. Remote server cache
|
|
|
|
+
|
|
|
|
+In addition, a recursive server implementation may maintain a cache
|
|
|
|
+for information of remote authoritative servers. Both BIND 9 and
|
|
|
|
+unbound conceptually have this type of cache, although there are some
|
|
|
|
+non-negligible differences in details. BIND 9's implementation of
|
|
|
|
+this cache is called ADB. Its a hash table whose key is domain name,
|
|
|
|
+and each entry stores corresponding IPv6/v4 addresses; another data
|
|
|
|
+structure for each address stores averaged RTT for the address,
|
|
|
|
+lameness information, EDNS availability, etc. unbound's
|
|
|
|
+implementation is called "infrastructure cache". It's a hash table
|
|
|
|
+keyed with IP addresses whose entries store similar information as
|
|
|
|
+that in BIND 9's per address ADB entry. In unbound a remote server's
|
|
|
|
+address must be determined by looking up the record cache (rrset cache
|
|
|
|
+in unbound terminology); unlike BIND 9's ADB, there's no direct
|
|
|
|
+shortcut from a server's domain name to IP addresses.
|
|
|
|
+
|
|
|
|
+4. Full response cache
|
|
|
|
+
|
|
|
|
+unbound has an additional cache layer, called the "message cache".
|
|
|
|
+It's a hash table whose hash key is query parameter (essentially qname
|
|
|
|
+and type) and entry is a sequence to record (rrset) cache entries.
|
|
|
|
+This sequence constructs a complete response to the corresponding
|
|
|
|
+query, so it would help optimize building a response message skipping
|
|
|
|
+the record cache for each section (answer/authority/additional) of the
|
|
|
|
+response message. PowerDNS recursor has (seemingly) the same concept
|
|
|
|
+called "packet cache" (but I don't know its implementation details
|
|
|
|
+very much).
|
|
|
|
+
|
|
|
|
+BIND 9 doesn't have this type of cache; it always looks into the
|
|
|
|
+record cache to build a complete response to a given query.
|
|
|
|
+
|
|
|
|
+Miscellaneous General Requirements
|
|
|
|
+----------------------------------
|
|
|
|
+
|
|
|
|
+- Minimize contention between threads (if threaded)
|
|
|
|
+- Cache purge policy: normally only a very small part of cached DNS
|
|
|
|
+ information will be reused, and those reused are very heavily
|
|
|
|
+ reused. So LRU-like algorithm should generally work well, but we'll
|
|
|
|
+ also need to honor DNS TTL.
|
|
|
|
+
|
|
|
|
+Random Ideas for BIND 10
|
|
|
|
+------------------------
|
|
|
|
+
|
|
|
|
+Below are specific random ideas for BIND 10. Some are based on
|
|
|
|
+experimental results with reasonably realistic data; some others are
|
|
|
|
+mostly a guess.
|
|
|
|
+
|
|
|
|
+1. Fully rendered response cache
|
|
|
|
+
|
|
|
|
+Some real world query samples show that a very small portion of entire
|
|
|
|
+queries are very popular and queried very often and many times; the
|
|
|
|
+rest is rarely reused, if any. Two different data sets show top
|
|
|
|
+10,000 queries would cover around 80% of total queries, regardless
|
|
|
|
+of the size of the total queries. This suggests an idea of having a
|
|
|
|
+small, highly optimized full response cache.
|
|
|
|
+
|
|
|
|
+I tried this idea in the jinmei-l1cache branch. It's a hash table
|
|
|
|
+keyed with a tuple of query name and type whose entry stores fully
|
|
|
|
+rendered, wire-format response image (answer section only, assuming
|
|
|
|
+the "minimal-responses" option). It also maintains offsets to each
|
|
|
|
+RR, so it can easily update TTLs when necessary or rotate RRs if
|
|
|
|
+optionally requested. If neither TTL adjustment nor RR rotation is
|
|
|
|
+required, query handling is just to lookup the hash table and copy the
|
|
|
|
+pre-rendered data. Experimental benchmark showed it ran vary fast;
|
|
|
|
+more than 4 times faster than BIND 9, and even much faster than other
|
|
|
|
+implementations that have full response cache (although, as usual, the
|
|
|
|
+comparison is not entirely fair).
|
|
|
|
+
|
|
|
|
+Also, the cache size is quite small; the run time memory footprint of
|
|
|
|
+this server process was just about 5MB. So, I think it's reasonable
|
|
|
|
+to have each process/thread have their own copy of this cache to
|
|
|
|
+completely eliminate contention. Also, if we can keep the cache size
|
|
|
|
+this small, it would be easier to dump it to a file on shutdown and
|
|
|
|
+reuse it on restart. This will be quite effective (if the downtime is
|
|
|
|
+reasonably short) because the cached data are expected to be highly
|
|
|
|
+popular.
|
|
|
|
+
|
|
|
|
+2. Record cache
|
|
|
|
+
|
|
|
|
+For the normal record cache, I don't have a particular idea beyond
|
|
|
|
+something obvious, like a hash table to map from query parameters to
|
|
|
|
+corresponding RRset (or negative information). But I guess this cache
|
|
|
|
+should be shared by multiple threads. That will help reconstruct the
|
|
|
|
+full response cache data on TTL expiration more efficiently. And, if
|
|
|
|
+shared, the data structure should be chosen so that contention
|
|
|
|
+overhead can be minimized. In general, I guess something like hash
|
|
|
|
+tables is more suitable than tree-like structure in that sense.
|
|
|
|
+
|
|
|
|
+There's other points to discuss for this cache related to other types
|
|
|
|
+of cache (see below).
|
|
|
|
+
|
|
|
|
+3. Separate delegation cache
|
|
|
|
+
|
|
|
|
+One thing I'm guessing is that it may make sense if we have a separate
|
|
|
|
+cache structure for delegation data. It's conceptually a set of NS
|
|
|
|
+RRs so we can identify the best (longest) matching one for a given
|
|
|
|
+query name.
|
|
|
|
+
|
|
|
|
+Analysis of some sets of query data showed the vast majority of
|
|
|
|
+end client's queries are for A and AAAA (not surprisingly). So, even
|
|
|
|
+if we separate this cache from the record cache, the additional
|
|
|
|
+overhead (both for memory and fetch) will probably (hopefully) be
|
|
|
|
+marginal. Separating caches will also help reduce contention between
|
|
|
|
+threads. It *might* also help improve lookup performance because this
|
|
|
|
+can be optimized for longest match search.
|
|
|
|
+
|
|
|
|
+4. Remote server cache without involving the record cache
|
|
|
|
+
|
|
|
|
+Likewise, it may make sense to maintain the remote server cache
|
|
|
|
+separately from the record cache. I guess these AAAA and A records
|
|
|
|
+are rarely the queried by end clients, so, like the case of delegation
|
|
|
|
+cache it's possible that the data sets are mostly disjoint. Also, for
|
|
|
|
+this purpose the RRsets don't have to have higher trust rank (per
|
|
|
|
+RFC2181 5.4.1): glue or additional are okay, and, by separating these
|
|
|
|
+from the record cache, we can avoid accidental promotion of these data
|
|
|
|
+to trustworthy answers and returning them to clients (BIND 9 had this
|
|
|
|
+type of bugs before).
|
|
|
|
+
|
|
|
|
+Custom vs Existing Library (STL etc)
|
|
|
|
+------------------------------------
|
|
|
|
+
|
|
|
|
+It may have to be discussed, but I guess in many cases we end up
|
|
|
|
+introducing custom implementation because these caches should be
|
|
|
|
+highly performance sensitive, directly related our core business, and
|
|
|
|
+also have to be memory efficient. But in some sub components we may
|
|
|
|
+be able to benefit from existing generic libraries.
|