12 years ago · df956ae5a6
--- a/doc/design/datasrc/data-source-classes.txt
+++ b/doc/design/datasrc/data-source-classes.txt
@@ -0,0 +1,364 @@
 
				+Data Source Library Classes
			
 
				+===========================
			
 
				+
			
 
				+About this document
			
 
				+-------------------
			
 
				+
			
 
				+This memo describes major classes used in the data source library,
			
 
				+mainly focusing on handling in-memory cache with consideration of the
			
 
				+shared memory support.  It will give an overview of the entire design
			
 
				+architecture and some specific details of how these classes are expected
			
 
				+to be used.
			
 
				+
			
 
				+Before reading, the higher level inter-module protocol should be understood:
			
 
				+http://bind10.isc.org/wiki/SharedMemoryIPC
			
 
				+
			
 
				+Overall relationships between classes
			
 
				+-------------------------------------
			
 
				+
			
 
				+The following diagram shows major classes in the data source library
			
 
				+related to in-memory caches and their relationship.
			
 
				+
			
 
				+image::overview.png[]
			
 
				+
			
 
				+Major design decisions of this architecture are:
			
 
				+
			
 
				+* Keep each class as concise as possible, each focusing on one or
			
 
				+  a small set of responsibilities.  Smaller classes are generally easier
			
 
				+  to understand (at the cost of understanding how they work in the
			
 
				+  "big picture" of course) and easier to test.
			
 
				+
			
 
				+* On a related point, minimize dependency on any single class.  A
			
 
				+  monolithic class on which many others are dependent is generally
			
 
				+  difficult to maintain because you'll need to ensure a change to the
			
 
				+  monolithic class doesn't break anything in any other class.
			
 
				+
			
 
				+* Use polymorphism for any "fluid" behavior, and hide specific details
			
 
				+  under abstract interfaces so implementation details won't be
			
 
				+  directly referenced from any other part of the library.
			
 
				+  Specifically, the underlying memory segment type (local, mapped, and
			
 
				+  possibly others) and the source of in-memory data (master file or
			
 
				+  other data source) are hidden via a kind of polymorphism.
			
 
				+
			
 
				+* Separate classes directly used by applications from classes that
			
 
				+  implement details.  Make the former classes as generic as possible,
			
 
				+  agnostic about implementation specific details such as the memory
			
 
				+  segment type (or, ideally and where possible, whether it's for
			
 
				+  in-memory cache or the underlying data source).
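The polymorphism decision above can be illustrated with a minimal Python sketch. The classes here are simplified stand-ins that only borrow the names used in this memo; the method names (`is_writable`, `create_segment`, `should_load_immediately`) are assumptions made for illustration and are not the library's actual API.

```python
from abc import ABC, abstractmethod


class ZoneTableSegment(ABC):
    """Simplified stand-in for the segment-type independent interface."""

    @abstractmethod
    def is_writable(self) -> bool:
        """Return True if zone data can currently be loaded into the segment."""


class ZoneTableSegmentLocal(ZoneTableSegment):
    def is_writable(self) -> bool:
        # A local segment lives in the process's own memory: always writable.
        return True


class ZoneTableSegmentMapped(ZoneTableSegment):
    def __init__(self) -> None:
        # Not yet associated with any memory segment, hence not writable.
        self._writable = False

    def is_writable(self) -> bool:
        return self._writable


def create_segment(cache_type: str) -> ZoneTableSegment:
    """Hypothetical factory: the only place that knows the concrete types."""
    factories = {"local": ZoneTableSegmentLocal, "mapped": ZoneTableSegmentMapped}
    if cache_type not in factories:
        raise ValueError(f"unsupported cache type: {cache_type}")
    return factories[cache_type]()


def should_load_immediately(segment: ZoneTableSegment) -> bool:
    """Caller-side logic depends on the abstract interface only."""
    return segment.is_writable()
```

Code that calls `should_load_immediately()` never names a concrete segment class, so adding another segment type only touches the factory.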
			
 
				+
			
 
				+The following list gives a summarized description of these classes.
			
 
				+
			
 
				+* `ConfigurableClientList`: The front end to application classes.  An
			
 
				+  application that uses the data source library generally maintains
			
 
				+  one or more `ConfigurableClientList` objects (usually one per RR
			
 
				+  class, or when we support views, probably one per view).  This class
			
 
				+  is a container of sets of data source related classes, providing
			
 
				+  accessors to these classes and also acting as a factory of other
			
 
				+  related class objects.  Note: Due to internal implementation
			
 
				+  reasons, there is a base class for `ConfigurableClientList` named
			
 
				+  `ClientList` in the C++ version, and applications are expected to
			
 
				+  use the latter.  But conceptually `ConfigurableClientList` is an
			
 
				+  independent value class; the inheritance is not for polymorphism.
			
 
				+  Note also that the Python version doesn't have the base class.
			
 
				+
			
 
				+* `DataSourceInfo`: this is a straightforward tuple of a set of class
			
 
				+  objects corresponding to a single data source, including
			
 
				+  `DataSourceClient`, `CacheConfig`, and `ZoneTableSegment`.
			
 
				+  `ConfigurableClientList` maintains a list of `DataSourceInfo`, one
			
 
				+  for each data source specified in its configuration.
			
 
				+
			
 
				+* `DataSourceClient`: The front end class to applications for a single
			
 
				+  data source.  Applications will get a specific `DataSourceClient`
			
 
				+  object by `ConfigurableClientList::find()`.
			
 
				+  `DataSourceClient` itself is a set of factories for various
			
 
				+  operations on the data source such as lookup or update.
			
 
				+
			
 
				+* `CacheConfig`: library internal representation of in-memory cache
			
 
				+  configuration for a data source.  It knows which zones are to be
			
 
				+  cached and where the zone data (RRs) should come from, either from a
			
 
				+  master file or other data source.  With this knowledge it will
			
 
				+  create an appropriate `LoadAction` object.  Note that `CacheConfig`
			
 
				+  isn't aware of the underlying memory segment type for the in-memory
			
 
				+  data.  It's intentionally separated from this class (see the
			
 
				+  conciseness and minimal-dependency design decisions above).
			
 
				+
			
 
				+* `ZoneTableSegment`: when in-memory cache is enabled, it provides
			
 
				+  a memory-segment-type independent interface to the in-memory data.
			
 
				+  This is an abstract base class (see polymorphism in the design
			
 
				+  decisions) and inherited by segment-type specific subclasses:
			
 
				+  `ZoneTableSegmentLocal` and `ZoneTableSegmentMapped` (and possibly
			
 
				+  others).  Any subclass of `ZoneTableSegment` is expected to maintain
			
 
				+  the specific type of `MemorySegment` object.
			
 
				+
			
 
				+* `ZoneWriter`: a frontend utility class for applications to update
			
 
				+  in-memory zone data (currently it can only load a whole zone and
			
 
				+  replace any existing zone content with a new one, but this should be
			
 
				+  extended so it can handle partial updates).
			
 
				+  Applications will get a specific `ZoneWriter`
			
 
				+  object by `ConfigurableClientList::getCachedZoneWriter()`.
			
 
				+  `ZoneWriter` is constructed with `ZoneTableSegment` and `LoadAction`.
			
 
				+  Since these are abstract classes, `ZoneWriter` doesn't have to be
			
 
				+  aware of "fluid" details.  It's only responsible for "somehow" preparing
			
 
				+  `ZoneData` for a new version of a specified zone using `LoadAction`,
			
 
				+  and installing it in the `ZoneTable` (which can be accessed via
			
 
				+  `ZoneTableSegment`).
			
 
				+
			
 
				+* `DataSourceStatus`: created by `ConfigurableClientList::getStatus()`,
			
 
				+  a straightforward tuple that represents some status information of a
			
 
				+  specific data source managed in the `ConfigurableClientList`.
			
 
				+  `getStatus()` generates `DataSourceStatus` for all data sources
			
 
				+  managed in it, and returns them as a vector.
			
 
				+
			
 
				+* `ZoneTableAccessor`, `ZoneTableIterator`: frontend classes to get
			
 
				+  access to the conceptual "zone table" (a set of zones) stored in a
			
 
				+  specific data source.  In particular, `ZoneTableIterator` allows
			
 
				+  applications to iterate over all zones (by name) stored in the
			
 
				+  specific data source.
			
 
				+  Applications will get a specific `ZoneTableAccessor`
			
 
				+  object by `ConfigurableClientList::getZoneTableAccessor()`,
			
 
				+  and get an iterator object by calling `getIterator` on the accessor.
			
 
				+  These are abstract classes and provide unified interfaces
			
 
				+  independent of whether it's for in-memory cached zones or "real"
			
 
				+  underlying data source.  But the initial implementation only
			
 
				+  provides the in-memory cache version of the subclasses (see the next
			
 
				+  item).
			
 
				+
			
 
				+* `ZoneTableAccessorCache`, `ZoneTableIteratorCache`: implementation
			
 
				+  classes of `ZoneTableAccessor` and `ZoneTableIterator` for in-memory
			
 
				+  cache.  They refer to `CacheConfig` to get a list of zones to be
			
 
				+  cached.
			
 
				+
			
 
				+* `ZoneTableHeader`, `ZoneTable`: top-level interface to actual
			
 
				+  in-memory data.  These were separated based on a prior version of
			
 
				+  the design (http://bind10.isc.org/wiki/ScalableZoneLoadDesign) where
			
 
				+  `ZoneTableHeader` may contain multiple `ZoneTable`s.  It's
			
 
				+  a one-to-one relationship in the latest version (of the implementation),
			
 
				+  so we could probably unify them as a cleanup.
			
 
				+
			
 
				+* `ZoneData`: representing the in-memory content of a single zone.
			
 
				+  `ZoneTable` contains (zero, one or) multiple `ZoneData` objects.
			
 
				+
			
 
				+* `RdataSet`: representing the in-memory content of (data of) a single
			
 
				+  RRset.
			
 
				+  `ZoneData` contains `RdataSet`s corresponding to the RRsets stored
			
 
				+  in the zone.
			
 
				+
			
 
				+* `LoadAction`: a "polymorphic" functor that implements loading zone
			
 
				+  data into memory.  It hides from its user (i.e., `ZoneWriter`)
			
 
				+  details about the source of the data: master file or other data
			
 
				+  source (and perhaps some others).  The "polymorphism" is actually
			
 
				+  realized as different implementations of the functor interface, not
			
 
				+  class inheritance (but conceptually the effect and goal are the
			
 
				+  same).  Note: there's a proposal to replace `LoadAction` with
			
 
				+  a revised `ZoneDataLoader`, although the overall concept doesn't
			
 
				+  change.  See Trac ticket #2912.
			
 
				+
			
 
				+* `ZoneDataLoader` and `ZoneDataUpdater`: helper classes for the
			
 
				+  `LoadAction` functor(s).  These work independently from the source
			
 
				+  of data, taking a sequence of RRset objects, converting them
			
 
				+  into the in-memory data structures (`RdataSet`), and installing them
			
 
				+  into a newly created `ZoneData` object.
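To make the relationship between `LoadAction`, the loader helpers, and `ZoneWriter` more concrete, here is a small Python sketch. It is a toy model under stated assumptions: zone data is a plain dictionary, the zone table is a dictionary keyed by zone name, and the function and method names (`build_zone_data`, `load_action_from_text`, `load_action_from_source`, `load`, `install`) are hypothetical, chosen only to mirror the roles described in the list above.

```python
from typing import Callable, Dict, List, Tuple

# Toy models: an RRset is (owner name, RR type, list of RDATA strings).
RRset = Tuple[str, str, List[str]]
ZoneData = Dict[Tuple[str, str], List[str]]
LoadAction = Callable[[], ZoneData]


def build_zone_data(rrsets: List[RRset]) -> ZoneData:
    """Source-independent helper, in the role of ZoneDataLoader/ZoneDataUpdater."""
    zone_data: ZoneData = {}
    for name, rrtype, rdata in rrsets:
        zone_data.setdefault((name.lower(), rrtype), []).extend(rdata)
    return zone_data


def load_action_from_text(text: str) -> LoadAction:
    """Load action backed by master-file-like text ('name type rdata' lines)."""
    def action() -> ZoneData:
        rrsets = []
        for line in text.splitlines():
            fields = line.split(None, 2)
            if len(fields) == 3:
                rrsets.append((fields[0], fields[1], [fields[2]]))
        return build_zone_data(rrsets)
    return action


def load_action_from_source(rrsets: List[RRset]) -> LoadAction:
    """Load action backed by another data source (here just a list)."""
    def action() -> ZoneData:
        return build_zone_data(rrsets)
    return action


class ZoneWriter:
    """Prepares new zone data via a load action and installs it in the table."""

    def __init__(self, zone_table: Dict[str, ZoneData], zone_name: str,
                 load_action: LoadAction) -> None:
        self._zone_table = zone_table
        self._zone_name = zone_name
        self._load_action = load_action
        self._new_data = None

    def load(self) -> None:
        # The writer does not know where the data comes from.
        self._new_data = self._load_action()

    def install(self) -> None:
        if self._new_data is None:
            raise RuntimeError("load() must be called before install()")
        # Replaces any existing content of the zone with the new version.
        self._zone_table[self._zone_name] = self._new_data
        self._new_data = None
```

Both load actions share the same call signature, so the writer treats them identically; this is the "polymorphism without inheritance" the `LoadAction` item refers to.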
			
 
				+
			
 
				+Sequence for auth module using local memory segment
			
 
				+---------------------------------------------------
			
 
				+
			
 
				+In the remaining sections, we explain how the classes shown in the
			
 
				+previous section work together through their methods for commonly
			
 
				+intended operations.
			
 
				+
			
 
				+The following sequence diagram shows the case for the authoritative
			
 
				+DNS server module to maintain "local" in-memory data.  Note that
			
 
				+"auth" is a conceptual "class" (not actually implemented as a C++
			
 
				+class) to represent the server application behavior.  For the purpose
			
 
				+of this document that should be sufficient.  The same note applies to
			
 
				+all examples below.
			
 
				+
			
 
				+image::auth-local.png[]
			
 
				+
			
 
				+1. On startup, the auth module creates a `ConfigurableClientList`
			
 
				+   for each RR class specified in the configuration for the "data_sources"
			
 
				+   module.  It then calls `ConfigurableClientList::configure()`
			
 
				+   for the given configuration of that RR class.
			
 
				+
			
 
				+2. For each data source, `ConfigurableClientList` creates a
			
 
				+   `CacheConfig` object with the corresponding cache related
			
 
				+   configuration.
			
 
				+
			
 
				+3. If in-memory cache is enabled for the data source,
			
 
				+   `ZoneTableSegment` is also created.  In this scenario the cache
			
 
				+   type is specified as "local" in the configuration, so a functor
			
 
				+   creates `ZoneTableSegmentLocal` as the actual instance.
			
 
				+   In this case its `ZoneTable` is immediately created, too.
			
 
				+
			
 
				+4. `ConfigurableClientList` checks if the created `ZoneTableSegment` is
			
 
				+   writable.  It is always so for "local" type of segments.  So
			
 
				+   `ConfigurableClientList` immediately loads zones to be cached into
			
 
				+   memory.  For each such zone, it first gets the appropriate
			
 
				+   `LoadAction` through `CacheConfig`, then creates `ZoneWriter` with
			
 
				+   the `LoadAction`, and loads the data using the writer.
			
 
				+
			
 
				+5. If the auth module receives a "reload" command for a cached zone
			
 
				+   from another module (xfrin, an end user, etc.), it calls
			
 
				+   `ConfigurableClientList::getCachedZoneWriter` to load and install
			
 
				+   the new version of the zone.  The same loading sequence takes place
			
 
				+   except that the user of the writer is the auth module.
			
 
				+   Also, the old version of the zone data is destroyed at the end of
			
 
				+   the process.
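The steps above can be sketched in Python as follows. This is a deliberately reduced model, not the library's interface: `ClientList`, `LocalSegment`, `load_and_install`, `zone_names`, and `lookup` are hypothetical names, and a zone's content is represented by whatever object its load action returns.

```python
class CacheConfig:
    """Knows which zones are cached and how to obtain their content."""

    def __init__(self, zones):
        # zones: mapping of zone name -> load action (a zero-argument callable)
        self._zones = dict(zones)

    def zone_names(self):
        return sorted(self._zones)

    def get_load_action(self, zone_name):
        return self._zones[zone_name]


class LocalSegment:
    """Stand-in for a "local" zone table segment."""

    def __init__(self):
        self.zone_table = {}  # created immediately for the local type

    def is_writable(self):
        return True  # always writable for local segments


class ZoneWriter:
    def __init__(self, segment, zone_name, load_action):
        self._segment = segment
        self._zone_name = zone_name
        self._load_action = load_action

    def load_and_install(self):
        new_data = self._load_action()
        # The previous version, if any, is dropped once it is replaced.
        self._segment.zone_table[self._zone_name] = new_data


class ClientList:
    """Very small model of the configure/reload flow for one data source."""

    def configure(self, cache_config):
        self._config = cache_config              # step 2
        self._segment = LocalSegment()           # step 3
        if self._segment.is_writable():          # step 4
            for name in cache_config.zone_names():
                self.get_cached_zone_writer(name).load_and_install()

    def get_cached_zone_writer(self, zone_name):
        if not self._segment.is_writable():
            raise RuntimeError("segment is not writable")
        action = self._config.get_load_action(zone_name)
        return ZoneWriter(self._segment, zone_name, action)

    def lookup(self, zone_name):
        return self._segment.zone_table[zone_name]
```

A reload (step 5) is then just another call to `get_cached_zone_writer(name).load_and_install()`, this time issued by the application rather than by `configure()`.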
			
 
				+
			
 
				+Sequence for auth module using mapped memory segment
			
 
				+----------------------------------------------------
			
 
				+
			
 
				+This is an example for the authoritative server module that uses
			
 
				+a mapped type memory segment for in-memory data.
			
 
				+
			
 
				+image::auth-mapped.png[]
			
 
				+
			
 
				+1. The sequence is the same up to the point of creating `CacheConfig`.
			
 
				+
			
 
				+2. But in this case a `ZoneTableSegmentMapped` object is created based
			
 
				+   on the configuration for the cache type.  This type of
			
 
				+   `ZoneTableSegment` is initially empty, and isn't even associated
			
 
				+   with a `MemorySegment` (and therefore considered "non writable").
			
 
				+
			
 
				+3. `ConfigurableClientList` checks if the zone table segment is
			
 
				+   writable to know whether to load zones into memory by itself,
			
 
				+   but since `ZoneTableSegment::isWritable()` returns false, it skips
			
 
				+   the loading.
			
 
				+
			
 
				+4. The auth module gets the status of each data source, and notices
			
 
				+   there's a "WAITING" state of segment.  So it subscribes to the
			
 
				+   "Memmgr" group on a command session and waits for an update
			
 
				+   from the memory manager (memmgr) module. (See also the note at the
			
 
				+   end of the section.)
			
 
				+
			
 
				+5. When the auth module receives an update command from memmgr, it
			
 
				+   calls `ConfigurableClientList::resetMemorySegment()` with the command
			
 
				+   argument and the segment mode of "READ_ONLY".
			
 
				+   Note that the auth module handles the command argument as mostly
			
 
				+   opaque data; it's not expected to deal with details of segment
			
 
				+   type-specific behavior.
			
 
				+
			
 
				+6. `ConfigurableClientList::resetMemorySegment()` subsequently calls
			
 
				+   `reset()` method on the corresponding `ZoneTableSegment` with the
			
 
				+   given parameters.
			
 
				+   In the case of `ZoneTableSegmentMapped`, it creates a new
			
 
				+   `MemorySegment` object for the mapped type, which internally maps
			
 
				+   the specific file into memory.
			
 
				+   memmgr is expected to have prepared all necessary data in the file,
			
 
				+   so all the data are immediately ready for use (i.e., there
			
 
				+   shouldn't be any explicit load operation).
			
 
				+
			
 
				+7. When a change is made in the mapped data, memmgr will send another
			
 
				+   update command with parameters for new mapping.  The auth module
			
 
				+   calls `ConfigurableClientList::resetMemorySegment()`, and the
			
 
				+   underlying memory segment is swapped with a new one.  The old
			
 
				+   memory segment object is destroyed at this point.  Note that
			
 
				+   this "destroy" just means unmapping the memory region; the data
			
 
				+   stored in the file are intact.
			
 
				+
			
 
				+8. If the auth module happens to receive a reload command from another
			
 
				+   module, it could call
			
 
				+   `ConfigurableClientList::getCachedZoneWriter()`
			
 
				+   to reload the data by itself, just like in the previous section.
			
 
				+   In this case, however, the writability check of
			
 
				+   `getCachedZoneWriter()` fails (the segment was created as
			
 
				+   READ_ONLY, so is considered non writable), so loading won't happen.
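The reader-side behavior of a mapped segment described in these steps can be modeled with the short Python sketch below. It makes several assumptions for illustration: a dictionary named `files` plays the role of the file system, the parameter key `"mapped-file"` is hypothetical, and `get_cached_zone_writer` returns `None` when the segment is not writable instead of using whatever result type the real interface has.

```python
from enum import Enum


class Mode(Enum):
    READ_ONLY = "READ_ONLY"
    READ_WRITE = "READ_WRITE"


class MappedSegment:
    """Stand-in for a mapped zone table segment."""

    def __init__(self, files):
        self._files = files    # file name -> zone table (models files on disk)
        self._mapping = None   # no memory segment associated initially
        self._mode = None

    def is_writable(self):
        return self._mapping is not None and self._mode is Mode.READ_WRITE

    def reset(self, mode, params):
        # 'params' is opaque to the application; only the segment reads it.
        file_name = params["mapped-file"]
        if mode is Mode.READ_ONLY and file_name not in self._files:
            raise FileNotFoundError(file_name)
        # Swapping the mapping drops the old one; the old "file" stays intact.
        self._mapping = self._files.setdefault(file_name, {})
        self._mode = mode

    def zone_table(self):
        if self._mapping is None:
            raise RuntimeError("segment is not mapped yet")
        return self._mapping


def get_cached_zone_writer(segment, zone_name):
    """Return a writer callable, or None if the segment is not writable."""
    if not segment.is_writable():
        return None

    def write(zone_data):
        segment.zone_table()[zone_name] = zone_data
    return write
```

With a segment reset in `READ_ONLY` mode the data prepared by the writer process is visible right away, while any attempt to obtain a writer yields `None`, matching step 8.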
			
 
				+
			
 
				+Note: while less likely in practice, it's possible that the same auth
			
 
				+module uses both "local" and "mapped" (and even other) types of
			
 
				+segments for different data sources.  In such cases the sequence is
			
 
				+either the one in this or the previous section depending on the specified
			
 
				+segment type in the configuration.  The auth module itself isn't aware
			
 
				+of per segment-type details, but changes its behavior depending on the
			
 
				+segment state of each data source at step 4 above: if it's "WAITING",
			
 
				+it means the auth module needs help from memmgr (that's all the auth
			
 
				+module should know; it shouldn't be bothered with further details such
			
 
				+as mapped file names); if it's something else, the auth module doesn't
			
 
				+have to do anything further.
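The decision described in this note depends only on the segment state, which the following Python sketch tries to show. The tuple fields and the function name are assumptions for illustration; the only state name taken from this memo is "WAITING", and "READY" below is a hypothetical placeholder for any other state.

```python
from typing import List, NamedTuple


class DataSourceStatus(NamedTuple):
    """Reduced model of the per-data-source status tuple."""
    name: str
    segment_state: str  # "WAITING" means help from memmgr is needed


def sources_waiting_for_memmgr(statuses: List[DataSourceStatus]) -> List[str]:
    """Return the names of data sources that need an update from memmgr.

    The caller never inspects the segment type or mapped file names.
    """
    return [status.name for status in statuses
            if status.segment_state == "WAITING"]
```

If the returned list is non-empty, the application would subscribe to the memory manager's group and wait; otherwise it has nothing further to do.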
			
 
				+
			
 
				+Sequence for memmgr module initialization using mapped memory segment
			
 
				+---------------------------------------------------------------------
			
 
				+
			
 
				+This sequence shows the common initialization sequence for the
			
 
				+memory manager (memmgr) module using a mapped type memory segment.
			
 
				+This is a mixture of the sequences shown in the previous two sections.
			
 
				+
			
 
				+image::memmgr-mapped-init.png[]
			
 
				+
			
 
				+1. The initial sequence is the same until the application module (memmgr)
			
 
				+   calls `ConfigurableClientList::getStatus()`, as in the
			
 
				+   previous section.
			
 
				+
			
 
				+2. The memmgr module identifies the data sources whose in-memory cache
			
 
				+   type is "mapped".  (Unlike other application modules, the memmgr
			
 
				+   should know what such types mean, given its specific responsibility).
			
 
				+   For each such data source, it calls
			
 
				+   `ConfigurableClientList::resetMemorySegment` with the READ_WRITE
			
 
				+   mode and other mapped-type specific parameters.  memmgr should be
			
 
				+   able to generate the parameters from its own configuration and
			
 
				+   other data source specific information (such as the RR class and
			
 
				+   data source name).
			
 
				+
			
 
				+3. The `ConfigurableClientList` class calls
			
 
				+   `ZoneTableSegment::reset()` on the corresponding zone table
			
 
				+   segment with the given parameters.  In this case, since the mode is
			
 
				+   READ_WRITE, a new `ZoneTable` will be created (assuming this is a
			
 
				+   very first time initialization; if there's already a zone table
			
 
				+   in the segment, it will be used).
			
 
				+
			
 
				+4. The memmgr module then calls
			
 
				+   `ConfigurableClientList::getZoneTableAccessor()`, and calls the
			
 
				+   `getIterator()` method on it to get a list of zones for which
			
 
				+   zone data are to be loaded into the memory segment.
			
 
				+
			
 
				+5. The memmgr module loads the zone data for each such zone.  This
			
 
				+   sequence is the same as shown in the section on the local memory segment.
			
 
				+
			
 
				+6. On loading all zone data, the memmgr module sends an update command
			
 
				+   to all interested modules (such as auth) in the segment, and waits
			
 
				+   for acknowledgment from all of them.
			
 
				+
			
 
				+7. Then it calls `ConfigurableClientList::resetMemorySegment()` for
			
 
				+   this data source with almost the same parameters as in step 2 above,
			
 
				+   but with a different mapped file name.  This will make a swap of
			
 
				+   the underlying memory segment with a new mapping.  The old
			
 
				+   `MemorySegment` object will be destroyed, but as explained in the
			
 
				+   previous section, it simply means unmapping the file.
			
 
				+
			
 
				+8. The memmgr loads the zone data into the newly mapped memory region
			
 
				+   by repeating the sequence shown in step 5.
			
 
				+
			
 
				+9. The memmgr repeats this whole sequence for all data sources that use
			
 
				+   a "mapped" segment for in-memory cache.  Note: it could handle
			
 
				+   multiple data sources in parallel, e.g., while waiting for
			
 
				+   acknowledgment from other modules.
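The initialization sequence for one data source can be sketched in Python as below. This is a toy model under explicit assumptions: the `files` dictionary stands in for mapped files on disk, `Reader` stands in for a module such as auth, acknowledgment is modeled as a boolean return value, and all function and method names are hypothetical.

```python
def load_all_zones(files, file_name, zone_sources):
    """Load every configured zone into the given (modeled) mapped file."""
    zone_table = files.setdefault(file_name, {})
    for zone_name in sorted(zone_sources):
        zone_table[zone_name] = zone_sources[zone_name]()


class Reader:
    """Stand-in for a reader module that maps files read-only."""

    def __init__(self, files):
        self._files = files
        self.mapped_file = None

    def on_update(self, file_name):
        self.mapped_file = file_name  # remap to the announced file
        return True                   # acknowledge the update

    def lookup(self, zone_name):
        return self._files[self.mapped_file][zone_name]


def initialize(files, file_names, zone_sources, readers):
    """Run steps 2 to 8 for one data source; return the file left writable."""
    first, second = file_names
    load_all_zones(files, first, zone_sources)             # steps 2 to 5
    acks = [reader.on_update(first) for reader in readers] # step 6
    if not all(acks):
        raise RuntimeError("a reader did not acknowledge the update")
    load_all_zones(files, second, zone_sources)            # steps 7 and 8
    return second
```

After `initialize()` returns, readers use the first file while the writer holds the second, and both contain the same version of the data.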
			
 
				+
			
 
				+Sequence for memmgr module to reload a zone using mapped memory segment
			
 
				+-----------------------------------------------------------------------
			
 
				+
			
 
				+This example is a continuation of the previous section, describing how
			
 
				+the memory manager reloads a zone in a mapped memory segment.
			
 
				+
			
 
				+image::memmgr-mapped-reload.png[]
			
 
				+
			
 
				+1. When the memmgr module receives a reload command from another module,
			
 
				+   it calls `ConfigurableClientList::getCachedZoneWriter()` for the
			
 
				+   specified zone name.  This method checks the writability of
			
 
				+   the segment, and since it's writable (as memmgr created it in the
			
 
				+   READ_WRITE mode), `getCachedZoneWriter()` succeeds and returns
			
 
				+   a `ZoneWriter`.
			
 
				+
			
 
				+2. The memmgr module uses the writer to load the new version of zone
			
 
				+   data.  There is nothing specific to mapped-type segment here.
			
 
				+
			
 
				+3. The memmgr module then sends an update command to other modules
			
 
				+   that would share this version, and waits for acknowledgment from
			
 
				+   all of them.
			
 
				+
			
 
				+4. On getting acknowledgments, the memmgr module calls
			
 
				+   `ConfigurableClientList::resetMemorySegment()` with the parameter
			
 
				+   specifying the other mapped file.  This will swap the underlying
			
 
				+   `MemorySegment` with a newly created one, mapping the other file.
			
 
				+
			
 
				+5. The memmgr updates this segment, too, so the two files will contain
			
 
				+   the same version of data.
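The reload steps above amount to alternating between two mapped files. The Python sketch below models that alternation; it assumes a `files` dictionary in place of real mapped files, synchronous acknowledgments, and hypothetical names (`MemMgr`, `Reader`, `reload`, `on_update`) that are not part of the library.

```python
class Reader:
    """Stand-in for a reader module such as auth."""

    def __init__(self, files, mapped_file):
        self._files = files
        self.mapped_file = mapped_file

    def on_update(self, file_name):
        self.mapped_file = file_name  # switch to the newly announced file
        return True                   # acknowledge

    def lookup(self, zone_name):
        return self._files[self.mapped_file][zone_name]


class MemMgr:
    """Keeps one file writable while readers use the other one."""

    def __init__(self, files, writing, published, readers):
        self._files = files
        self.writing = writing      # file currently mapped READ_WRITE
        self.published = published  # file the readers currently map
        self._readers = readers

    def reload(self, zone_name, load_action):
        # Steps 1 and 2: load the new version into the writable file.
        self._files[self.writing][zone_name] = load_action()
        # Step 3: announce it and collect acknowledgments from all readers.
        acks = [reader.on_update(self.writing) for reader in self._readers]
        if not all(acks):
            raise RuntimeError("a reader did not acknowledge the update")
        # Step 4: swap roles; the file readers just left becomes writable.
        self.writing, self.published = self.published, self.writing
        # Step 5: bring the other file up to the same version.
        self._files[self.writing][zone_name] = load_action()
```

Readers never observe a partially written file in this model, because the writer only touches the file that no reader currently maps.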