
[master] Replace redundant resolver cache design document with one that supersedes it

Mukund Sivaraman 11 years ago
parent
commit
3677658ea0
2 changed files with 234 additions and 256 deletions
  1. 234 0
      doc/design/resolver/03-cache-algorithm
  2. 0 256
      doc/design/resolver/03-cache-algorithm.txt

+ 234 - 0
doc/design/resolver/03-cache-algorithm

@@ -20,3 +20,237 @@ which algorithm is best for that. If it is very critical, then a
 custom algorithm designed for DNS caching makes sense. If it is not,
 then we can consider using an STL-based data structure.
 
+Effectiveness of Cache
+----------------------
+
+First, I'll try to answer the introductory questions.
+
+In a simplified model, we can express the fraction of the total
+running time spent answering queries directly from the cache (the
+rest being spent on recursive resolution due to cache misses) as
+follows:
+
+A = r*Q2 / (r*Q2 + (1-r)*Q1)
+where
+A: fraction of run time spent answering queries directly from the
+   cache (0<=A<=1)
+r: cache hit rate (0<=r<=1)
+Q1: max qps of the server with 100% cache hit
+Q2: max qps of the server with 0% cache hit
+
+Q1 can be measured easily for a given data set; measuring Q2 is
+tricky in general (it requires many external queries with unreliable
+results), but we can still obtain some not-so-unrealistic numbers
+through controlled simulation.
+
+As a data point for these values, see previous experimental results
+of mine:
+https://lists.isc.org/pipermail/bind10-dev/2012-July/003628.html
+
+Looking at the "ideal" server implementation (no protocol overhead)
+with setups of 90% and 85% cache hit rates, 1 recursion per cache
+miss, and the maximum possible total throughput, we can deduce Q1 and
+Q2: about 170591 qps and 60138 qps respectively.
+
+This means that with a 90% cache hit rate (r = 0.9), the server would
+spend 76% of its run time receiving queries and answering them
+directly from the cache: 0.9*60138 / (0.9*60138 + 0.1*170591) = 0.76.
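+
+As a sanity check of this arithmetic, the following small C++ snippet
+(purely illustrative; the function name is mine) computes A from r,
+Q1 and Q2 using the formula above:
+
+    #include <iostream>
+
+    // Fraction of total run time spent answering queries directly
+    // from the cache, given hit rate r and throughput bounds Q1, Q2.
+    double
+    cacheTimeFraction(double r, double q1, double q2) {
+        return (r * q2) / (r * q2 + (1 - r) * q1);
+    }
+
+    int
+    main() {
+        // Numbers deduced from the experiment referenced above.
+        std::cout << cacheTimeFraction(0.9, 170591, 60138) << std::endl;
+        return 0;   // prints approximately 0.76
+    }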
+
+I also ran more realistic experiments: using BIND 9.9.2 and unbound
+1.4.19 in the "forward only" mode with crafted query data and a
+forwarded server set up to emulate the situations of 100% and 0%
+cache hit rates.  I then measured the max response throughput using a
+queryperf-like tool.  In both cases Q2 is about 28% of Q1 (I'm not
+showing specific numbers to avoid unnecessary discussion about the
+specific performance of existing servers; it's out of scope of this
+memo).  Using Q2 = 0.28*Q1, the above equation with a 90% cache hit
+rate gives: A = 0.9 * 0.28 / (0.9*0.28 + 0.1) = 0.716.  So the server
+will spend about 72% of its running time answering queries directly
+from the cache.
+
+Of course, these experimental results are too simplified.  First, in
+these experiments we assumed only one external query is needed on a
+cache miss.  In general it can be more; however, it may not actually
+be too optimistic either: in another research result of mine:
+http://bind10.isc.org/wiki/ResolverPerformanceResearch
+a more detailed analysis using real query samples and tracing what an
+actual resolver would do suggested we'd need about 1.44 to 1.63
+external queries per cache miss on average.
+
+Still, of course, the real world cases are not that simple: in reality
+we'd need to deal with timeouts, slower remote servers, unexpected
+intermediate results, etc.  DNSSEC validating resolvers will clearly
+need to do more work.
+
+So, in real world deployments Q2 should be much smaller than Q1.
+Here are some specific cases of the relationship between Q1 and Q2
+for a given A (assuming r = 0.9); these follow from inverting the
+formula above, Q2/Q1 = A*(1-r) / (r*(1-A)):
+
+70%: Q2 = 0.26 * Q1
+60%: Q2 = 0.17 * Q1
+50%: Q2 = 0.11 * Q1
+
+So, even if "recursive resolution is 10 times heavier" than the
+cache-only case, we can assume the server spends half of its run time
+answering queries directly from the cache at a cache hit rate of 90%.
+I think this is a reasonably safe assumption.
+
+Now, assuming a number of 50% or more, does this suggest we should
+highly optimize the cache?  Opinions may vary on this point, but I
+personally think the answer is yes.  I've written an experimental
+cache-only implementation that employs the idea of fully rendered
+cached data.  On one test machine (2.20GHz AMD64, using a single
+core), a queryperf-like benchmark shows it can handle over 180Kqps,
+while BIND 9.9.2 can only handle about 41Kqps.  The experimental
+implementation skips some features necessary for a production server,
+and cache management itself is always an inevitable bottleneck, so
+the production version wouldn't be that fast, but it still suggests
+it may not be very difficult to reach over 100Kqps in a production
+environment including recursive resolution overhead.
+
+Cache Types
+-----------
+
+1. Record cache
+
+Conceptually, any recursive resolver (with a cache) implementation
+would have a cache for RRs (or RRsets, in the modern version of the
+protocol) given in responses to its external queries.  In BIND 9,
+it's called the "cached DB", using an in-memory rbt-like tree.
+unbound calls it the "rrset cache", which is implemented as a hash
+table.
+
+2. Delegation cache
+
+Recursive server implementations would also have a cache to determine
+the deepest zone cut for a given query name in the recursion process.
+Neither BIND 9 nor unbound has a separate cache for this purpose;
+basically they try to find an NS RRset in the "record cache" whose
+owner name best matches the given query name.
+
+3. Remote server cache
+
+In addition, a recursive server implementation may maintain a cache
+of information about remote authoritative servers.  Both BIND 9 and
+unbound conceptually have this type of cache, although there are some
+non-negligible differences in details.  BIND 9's implementation of
+this cache is called ADB.  It's a hash table whose key is a domain
+name, and each entry stores the corresponding IPv6/v4 addresses;
+another data structure for each address stores the averaged RTT for
+the address, lameness information, EDNS availability, etc.  unbound's
+implementation is called the "infrastructure cache".  It's a hash
+table keyed with IP addresses whose entries store similar information
+to that in BIND 9's per-address ADB entries.  In unbound a remote
+server's address must be determined by looking up the record cache
+(rrset cache in unbound terminology); unlike BIND 9's ADB, there's no
+direct shortcut from a server's domain name to its IP addresses.
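+
+Purely as a rough sketch of this kind of cache (the type and member
+names are mine, and are not taken from BIND 9 or unbound), the data
+might be organized along these lines in C++:
+
+    #include <chrono>
+    #include <map>
+    #include <string>
+    #include <vector>
+
+    // Per-address information, roughly what the text above attributes
+    // to BIND 9's per-address ADB entries.
+    struct AddressInfo {
+        std::string address;                     // textual IPv4/IPv6 address
+        std::chrono::milliseconds smoothed_rtt;  // averaged RTT to this address
+        bool edns_supported;                     // EDNS availability
+        bool lame;                               // lameness information
+    };
+
+    // Name-keyed table: server (domain) name -> its known addresses.
+    using RemoteServerCache =
+        std::map<std::string, std::vector<AddressInfo>>;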
+
+4. Full response cache
+
+unbound has an additional cache layer, called the "message cache".
+It's a hash table whose hash key is the query parameters (essentially
+qname and type) and whose entry is a sequence of references to record
+(rrset) cache entries.  This sequence constructs a complete response
+to the corresponding query, so it helps optimize building a response
+message by skipping a record cache lookup for each section
+(answer/authority/additional) of the response message.  PowerDNS
+recursor has (seemingly) the same concept, called the "packet cache"
+(but I don't know its implementation details in depth).
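+
+As a rough sketch of this layering (not unbound's actual data
+structures; the names and the simplified rrset type are mine):
+
+    #include <memory>
+    #include <vector>
+
+    // Stand-in for a record (rrset) cache entry; heavily simplified.
+    struct CachedRRset {};
+
+    // Hypothetical message cache entry: a sequence of references into
+    // the record cache, one list per response section.
+    struct MessageCacheEntry {
+        std::vector<std::shared_ptr<CachedRRset>> answer;
+        std::vector<std::shared_ptr<CachedRRset>> authority;
+        std::vector<std::shared_ptr<CachedRRset>> additional;
+    };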
+
+BIND 9 doesn't have this type of cache; it always looks into the
+record cache to build a complete response to a given query.
+
+Miscellaneous General Requirements
+----------------------------------
+
+- Minimize contention between threads (if threaded)
+- Cache purge policy: normally only a very small part of cached DNS
+  information will be reused, and what is reused is reused very
+  heavily.  So an LRU-like algorithm should generally work well, but
+  we'll also need to honor DNS TTLs (a minimal sketch of such a
+  policy follows below).
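+
+The following is a rough C++ sketch of such an LRU-with-TTL policy
+(illustrative only; the class and the string-based key/value types
+are mine and heavily simplified):
+
+    #include <chrono>
+    #include <cstddef>
+    #include <list>
+    #include <optional>
+    #include <string>
+    #include <unordered_map>
+
+    using Clock = std::chrono::steady_clock;
+
+    // A toy LRU cache where each entry also carries an absolute
+    // expiration time derived from the DNS TTL.
+    class LruTtlCache {
+    public:
+        explicit LruTtlCache(std::size_t max_entries) :
+            max_entries_(max_entries) {}
+
+        void insert(const std::string& key, const std::string& value,
+                    std::chrono::seconds ttl) {
+            auto it = map_.find(key);
+            if (it != map_.end()) {            // replace existing entry
+                lru_.erase(it->second);
+                map_.erase(it);
+            }
+            lru_.push_front({key, value, Clock::now() + ttl});
+            map_[key] = lru_.begin();
+            if (map_.size() > max_entries_) {  // LRU eviction
+                map_.erase(lru_.back().key);
+                lru_.pop_back();
+            }
+        }
+
+        std::optional<std::string> lookup(const std::string& key) {
+            auto it = map_.find(key);
+            if (it == map_.end()) {
+                return std::nullopt;           // cache miss
+            }
+            if (Clock::now() >= it->second->expire) {
+                lru_.erase(it->second);        // honor the TTL
+                map_.erase(it);
+                return std::nullopt;
+            }
+            // mark as most recently used
+            lru_.splice(lru_.begin(), lru_, it->second);
+            return it->second->value;
+        }
+
+    private:
+        struct Entry {
+            std::string key;
+            std::string value;
+            Clock::time_point expire;
+        };
+        std::size_t max_entries_;
+        std::list<Entry> lru_;
+        std::unordered_map<std::string,
+                           std::list<Entry>::iterator> map_;
+    };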
+
+Random Ideas for BIND 10
+------------------------
+
+Below are specific random ideas for BIND 10.  Some are based on
+experimental results with reasonably realistic data; others are
+mostly guesses.
+
+1. Fully rendered response cache
+
+Some real world query samples show that a very small portion of all
+queries are very popular and repeated many times; the rest are rarely
+reused, if ever.  Two different data sets show the top 10,000 queries
+would cover around 80% of total queries, regardless of the total
+query volume.  This suggests the idea of having a small, highly
+optimized full response cache.
+
+I tried this idea in the jinmei-l1cache branch.  It's a hash table
+keyed with a tuple of query name and type whose entry stores a fully
+rendered, wire-format response image (answer section only, assuming
+the "minimal-responses" option).  It also maintains offsets to each
+RR, so it can easily update TTLs when necessary or rotate RRs if
+optionally requested.  If neither TTL adjustment nor RR rotation is
+required, query handling is just a lookup in the hash table and a
+copy of the pre-rendered data.  An experimental benchmark showed it
+ran very fast: more than 4 times faster than BIND 9, and even much
+faster than other implementations that have a full response cache
+(although, as usual, the comparison is not entirely fair).
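+
+Purely as an illustration of this data layout (the names are mine,
+and this is not the actual jinmei-l1cache code), an entry of such a
+cache might look like:
+
+    #include <cstdint>
+    #include <string>
+    #include <unordered_map>
+    #include <vector>
+
+    // Hypothetical entry of a fully rendered response cache.
+    struct PrerenderedResponse {
+        std::vector<std::uint8_t> wire_data;    // pre-rendered answer
+                                                // section (wire format)
+        std::vector<std::uint16_t> rr_offsets;  // offset of each RR, for
+                                                // TTL update or rotation
+        std::uint32_t original_ttl;             // TTL at rendering time
+        // expiration time etc. would also be stored here
+    };
+
+    // Keyed by (qname, qtype) rendered into a single string, for
+    // simplicity of the sketch.
+    using ResponseCache =
+        std::unordered_map<std::string, PrerenderedResponse>;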
+
+Also, the cache size is quite small; the run time memory footprint of
+this server process was just about 5MB.  So, I think it's reasonable
+to have each process/thread keep its own copy of this cache to
+completely eliminate contention.  Also, if we can keep the cache size
+this small, it would be easier to dump it to a file on shutdown and
+reuse it on restart.  This would be quite effective (if the downtime
+is reasonably short) because the cached data are expected to be
+highly popular.
+
+2. Record cache
+
+For the normal record cache, I don't have a particular idea beyond
+something obvious, like a hash table mapping from query parameters to
+the corresponding RRset (or negative information).  But I guess this
+cache should be shared by multiple threads.  That will help
+reconstruct the full response cache data on TTL expiration more
+efficiently.  And, if shared, the data structure should be chosen so
+that contention overhead can be minimized.  In general, I guess a
+hash table is more suitable than a tree-like structure in that sense.
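+
+As a trivial sketch of the "obvious" structure mentioned above (the
+key and value types are mine and heavily simplified):
+
+    #include <cstddef>
+    #include <cstdint>
+    #include <functional>
+    #include <string>
+    #include <unordered_map>
+    #include <vector>
+
+    // Hypothetical key: query name, type and class.
+    struct RecordKey {
+        std::string qname;
+        std::uint16_t qtype;
+        std::uint16_t qclass;
+        bool operator==(const RecordKey& o) const {
+            return qname == o.qname && qtype == o.qtype &&
+                qclass == o.qclass;
+        }
+    };
+
+    struct RecordKeyHash {
+        std::size_t operator()(const RecordKey& k) const {
+            return std::hash<std::string>()(k.qname) ^
+                (static_cast<std::size_t>(k.qtype) << 1) ^
+                (static_cast<std::size_t>(k.qclass) << 17);
+        }
+    };
+
+    // Value: a cached RRset (or negative information), simplified
+    // here to raw RDATA strings plus a TTL.
+    struct CachedRecord {
+        std::vector<std::string> rdata;
+        std::uint32_t ttl;
+    };
+
+    using RecordCache =
+        std::unordered_map<RecordKey, CachedRecord, RecordKeyHash>;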
+
+There are other points to discuss for this cache related to other
+types of cache (see below).
+
+3. Separate delegation cache
+
+One thing I'm guessing is that it may make sense to have a separate
+cache structure for delegation data.  It's conceptually a set of NS
+RRs from which we can identify the best (longest) matching one for a
+given query name.
+
+Analysis of some sets of query data showed that the vast majority of
+end clients' queries are for A and AAAA (not surprisingly).  So, even
+if we separate this cache from the record cache, the additional
+overhead (both for memory and fetch) will probably (hopefully) be
+marginal.  Separating caches will also help reduce contention between
+threads.  It *might* also help improve lookup performance because
+this cache can be optimized for longest match search.
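+
+A rough sketch of such a longest match search (illustrative only, not
+a proposed interface; a real implementation would compare names
+label-wise and case-insensitively rather than via plain strings):
+
+    #include <map>
+    #include <optional>
+    #include <string>
+    #include <vector>
+
+    // Hypothetical delegation cache: zone name (without trailing dot,
+    // "" meaning the root) -> names of its NS RRs.
+    using DelegationCache =
+        std::map<std::string, std::vector<std::string>>;
+
+    // Find the deepest known zone cut for qname by stripping leading
+    // labels until a cached delegation (at worst the root) is found.
+    std::optional<std::vector<std::string>>
+    findDeepestDelegation(const DelegationCache& cache,
+                          std::string qname) {
+        for (;;) {
+            const auto it = cache.find(qname);
+            if (it != cache.end()) {
+                return it->second;
+            }
+            if (qname.empty()) {
+                return std::nullopt;  // not even the root is cached
+            }
+            const auto dot = qname.find('.');
+            // strip the leftmost label; "" represents the root
+            qname = (dot == std::string::npos) ?
+                "" : qname.substr(dot + 1);
+        }
+    }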
+
+4. Remote server cache without involving the record cache
+
+Likewise, it may make sense to maintain the remote server cache
+separately from the record cache.  I guess these A and AAAA records
+are rarely queried directly by end clients, so, as in the case of the
+delegation cache, it's possible that the data sets are mostly
+disjoint.  Also, for this purpose the RRsets don't have to have a
+higher trust rank (per RFC 2181 Section 5.4.1): glue or additional
+data are okay, and by separating these from the record cache we can
+avoid accidentally promoting such data to trustworthy answers and
+returning them to clients (BIND 9 had this type of bug before).
+
+Custom vs Existing Library (STL etc)
+------------------------------------
+
+It may have to be discussed, but I guess in many cases we'll end up
+introducing custom implementations because these caches are highly
+performance sensitive, directly related to our core business, and
+also have to be memory efficient.  But in some sub components we may
+be able to benefit from existing generic libraries.

+ 0 - 256
doc/design/resolver/03-cache-algorithm.txt

@@ -1,256 +0,0 @@
-03-cache-algorithm
-
-Introduction
-------------
-Cache performance may be important for the resolver. It might not be
-critical. We need to research this.
-
-One key question is: given a specific cache hit rate, how much of an
-impact does cache performance have?
-
-For example, if we have 90% cache hit rate, will we still be spending
-most of our time in system calls or in looking things up in our cache?
-
-There are several ways we can consider figuring this out, including
-measuring this in existing resolvers (BIND 9, Unbound) or modeling
-with specific values.
-
-Once we know how critical the cache performance is, we can consider
-which algorithm is best for that. If it is very critical, then a
-custom algorithm designed for DNS caching makes sense. If it is not,
-then we can consider using an STL-based data structure.
-