
added some analysis on cache effectiveness.

JINMEI Tatuya 12 years ago
parent
commit
fb3a565eed
1 changed file with 87 additions and 0 deletions:
  doc/design/resolver/03-cache-algorithm.txt

@@ -20,3 +20,90 @@ which algorithm is best for that. If it is very critical, then a
 custom algorithm designed for DNS caching makes sense. If it is not,
 then we can consider using an STL-based data structure.
 
+Effectiveness of Cache
+----------------------
+
+First, I'll try to answer the introductory questions.
+
+In a simplified model, we can express the fraction of the server's
+total running time spent answering queries directly from the cache
+(the remainder being spent on recursive resolution due to cache
+misses) as follows:
+
+A = r*Q2 / (r*Q2 + (1-r)*Q1)
+where
+A: fraction of run time spent answering queries directly from the
+   cache (0<=A<=1)
+r: cache hit rate (0<=r<=1)
+Q1: max qps of the server with 100% cache hit
+Q2: max qps of the server with 0% cache hit
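+
+The model can also be sketched directly in code; this is just the
+formula restated as a small helper, sanity-checked against the Q1/Q2
+figures deduced later in this memo:

```python
def cache_answer_fraction(r, q1, q2):
    """Fraction of run time spent answering queries directly from
    the cache, per the simplified model above."""
    return r * q2 / (r * q2 + (1 - r) * q1)

# Sanity check with the Q1/Q2 values deduced later in this memo:
# r = 0.9, Q1 = 170591 qps, Q2 = 60138 qps gives ~0.76.
print(round(cache_answer_fraction(0.9, 170591, 60138), 2))  # 0.76
```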
+
+Q1 can be measured easily for a given data set; measuring Q2 is tricky
+in general (it requires many external queries with unreliable
+results), but we can still have some not-so-unrealistic numbers
+through controlled simulation.
+
+As a data point for these values, see previous experimental results
+of mine:
+https://lists.isc.org/pipermail/bind10-dev/2012-July/003628.html
+
+Looking at the "ideal" server implementation (no protocol overhead)
+with setups of 90% and 85% cache hit rates, 1 recursion on each cache
+miss, and the possible maximum total throughput, we can deduce Q1 and
+Q2: 170591 qps and 60138 qps respectively.
+
+This means that with a 90% cache hit rate (r = 0.9), the server would
+spend 76% of its run time receiving queries and answering them
+directly from the cache: 0.9*60138/(0.9*60138 + 0.1*170591) = 0.76.
+
+I also ran more realistic experiments: using BIND 9.9.2 and unbound
+1.4.19 in "forward only" mode with crafted query data and a forwarded
+server to emulate the situations of 100% and 0% cache hit rates.  I
+then measured the max response throughput using a queryperf-like
+tool.  In both cases Q2 was about 28% of Q1 (I'm not showing specific
+numbers to avoid unnecessary discussion about the specific
+performance of existing servers; that's out of scope for this memo).
+Using Q2 = 0.28*Q1, the above equation with a 90% cache hit rate
+gives: A = 0.9 * 0.28 / (0.9*0.28 + 0.1) = 0.716.  So the server
+would spend about 72% of its running time answering queries directly
+from the cache.
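+
+To see how sensitive that result is to the hit rate, here is a short
+sketch that evaluates the same model across a few hit rates, using
+the Q2 = 0.28*Q1 ratio from the experiment above:

```python
RATIO = 0.28  # Q2/Q1, from the BIND 9 / unbound forward-only experiments

for r in (0.80, 0.85, 0.90, 0.95):
    a = r * RATIO / (r * RATIO + (1 - r))
    print(f"hit rate {r:.0%}: A = {a:.2f}")

# hit rate 80%: A = 0.53
# hit rate 85%: A = 0.61
# hit rate 90%: A = 0.72
# hit rate 95%: A = 0.84
```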
+
+Of course, these experimental results are oversimplified.  First, the
+experiments assumed only one external query is needed on each cache
+miss.  In general there can be more; however, the assumption may not
+actually be too optimistic either: in another research result of
+mine:
+http://bind10.isc.org/wiki/ResolverPerformanceResearch
+a more detailed analysis using real query samples, tracing what an
+actual resolver would do, suggested we'd need about 1.44 to 1.63
+external queries per cache miss on average.
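+
+As a rough extension of the model (my own simplification, not
+something measured here), if we assume the cost of a cache miss
+scales linearly with the number of external queries per miss, we can
+scale Q2 down by that factor:

```python
def cache_answer_fraction(r, q1, q2, queries_per_miss=1.0):
    """Model from above, with Q2 scaled down on the assumption that
    miss cost grows linearly with the number of external queries per
    miss (an assumption; it ignores timeouts and per-query variance)."""
    q2_eff = q2 / queries_per_miss
    return r * q2_eff / (r * q2_eff + (1 - r) * q1)

# With the 28% ratio and ~1.5 external queries per miss:
a = cache_answer_fraction(0.9, 1.0, 0.28, queries_per_miss=1.5)
print(round(a, 2))  # 0.63
```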
+
+Still, of course, the real world cases are not that simple: in reality
+we'd need to deal with timeouts, slower remote servers, unexpected
+intermediate results, etc.  DNSSEC validating resolvers will clearly
+need to do more work.
+
+So, in real world deployments Q2 should be much smaller than Q1.
+Here are some specific relationships between Q1 and Q2 for given
+values of A (assuming r = 0.9):
+
+A = 70%: Q2 = 0.26 * Q1
+A = 60%: Q2 = 0.17 * Q1
+A = 50%: Q2 = 0.11 * Q1
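+
+These ratios fall out of solving the model for Q2/Q1 given a target
+A; a quick sketch reproducing the table:

```python
def ratio_for_fraction(a, r=0.9):
    """Q2/Q1 ratio at which the cache-answer fraction equals `a`,
    from solving A = r*x / (r*x + (1-r)) for x = Q2/Q1."""
    return a * (1 - r) / (r * (1 - a))

for a in (0.7, 0.6, 0.5):
    print(f"A = {a:.0%}: Q2 = {ratio_for_fraction(a):.2f} * Q1")

# A = 70%: Q2 = 0.26 * Q1
# A = 60%: Q2 = 0.17 * Q1
# A = 50%: Q2 = 0.11 * Q1
```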
+
+So, even if recursive resolution is "10 times heavier" than the
+cache-only case, we can assume the server spends about half of its
+run time answering queries directly from the cache at a cache hit
+rate of 90%.  I think this is a reasonably safe assumption.
+
+Now, assuming a figure of 50% or more, does this suggest we should
+highly optimize the cache?  Opinions may vary on this point, but I
+personally think the answer is yes.  I've written an experimental
+cache-only implementation that employs the idea of fully-rendered
+cached data.  On one test machine (2.20GHz AMD64, using a single
+core), a queryperf-like benchmark shows it can handle over 180K qps,
+while BIND 9.9.2 can handle just 41K qps.  The experimental
+implementation skips some features necessary for a production server,
+and cache management itself is an inevitable bottleneck, so the
+production version wouldn't be that fast, but this still suggests it
+may not be very difficult to exceed 100K qps in a production
+environment, including recursive resolution overhead.
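+
+This memo doesn't spell out the experimental implementation, but the
+"fully-rendered cached data" idea can be sketched roughly as follows:
+cache a complete wire-format response per (qname, qtype, qclass) key,
+so a hit only requires copying the stored message and patching the
+2-byte message ID (a real server would also adjust TTLs and handle
+EDNS, truncation, etc.).  All names below are hypothetical:

```python
import struct

class RenderedCache:
    """Hypothetical sketch: map (qname, qtype, qclass) to a complete
    pre-rendered wire-format DNS response."""

    def __init__(self):
        self._store = {}

    def put(self, key, wire_response):
        # wire_response: a full DNS message, starting with a 2-byte ID.
        self._store[key] = wire_response

    def answer(self, key, query_id):
        wire = self._store.get(key)
        if wire is None:
            return None  # cache miss: fall back to recursive resolution
        # Reuse the rendered message; only the ID must match the query.
        return struct.pack(">H", query_id) + wire[2:]

cache = RenderedCache()
cache.put(("example.com.", 1, 1), b"\x00\x00" + b"<rest of message>")
resp = cache.answer(("example.com.", 1, 1), 0x1234)
print(resp[:2].hex())  # 1234
```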