Browse Source

[3006] Convert Jinmei's data source classes documentation to asciidoc

Mukund Sivaraman 12 years ago
parent
commit
df956ae5a6
1 changed files with 364 additions and 0 deletions
  1. 364 0
      doc/design/datasrc/data-source-classes.txt

+ 364 - 0
doc/design/datasrc/data-source-classes.txt

@@ -0,0 +1,364 @@
+Data Source Library Classes
+===========================
+
+About this document
+-------------------
+
+This memo describes major classes used in the data source library,
+mainly focusing on handling in-memory cache with consideration of the
+shared memory support.  It will give an overview of the entire design
+architecture and some specific details of how these classes are expected
+to be used.
+
+Before reading, the higher level inter-module protocol should be understood:
+http://bind10.isc.org/wiki/SharedMemoryIPC
+
+Overall relationships between classes
+-------------------------------------
+
+The following diagram shows major classes in the data source library
+related to in-memory caches and their relationship.
+
+image::overview.png[]
+
+Major design decisions of this architecture are:
+
+* Keep each class as concise as possible, each focusing on one or
+  small set of responsibilities.  Smaller classes are generally easier
+  to understand (at the cost of understanding how they work in the
+  "big picture" of course) and easier to test.
+
+* On a related point, minimize dependency to any single class.  A
+  monolithic class on which many others are dependent is generally
+  difficult to maintain because you'll need to ensure a change to the
+  monolithic class doesn't break anything on any other classes.
+
+* Use polymorphism for any "fluid" behavior, and hide specific details
+  under abstract interfaces so implementation details won't be
+  directly referenced from any other part of the library.
+  Specifically, the underlying memory segment type (local, mapped, and
+  possibly others) and the source of in-memory data (master file or
+  other data source) are hidden via a kind of polymorphism.
+
+* Separate classes directly used by applications from classes that
+  implement details.  Make the former classes as generic as possible,
+  agnostic about implementation specific details such as the memory
+  segment type (or, ideally and where possible, whether it's for
+  in-memory cache or the underlying data source).
+
+The following give a summarized description of these classes.
+
+* `ConfigurableClientList`: The front end to application classes.  An
+  application that uses the data source library generally maintains
+  one or more `ConfigurableClientList` object (usually one per RR
+  class, or when we support views, probably one per view).  This class
+  is a container of sets of data source related classes, providing
+  accessor to these classes and also acting as a factory of other
+  related class objects.  Note: Due to internal implementation
+  reasons, there is a base class for `ConfigurableClientList` named
+  `ClientList` in the C++ version, and applications are expected to
+  use the latter.  But conceptually `ConfigurableClientList` is an
+  independent value class; the inheritance is not for polymorphism.
+  Note also that the Python version doesn't have the base class.
+
+* `DataSourceInfo`: this is a straightforward tuple of set of class
+  objects corresponding to a single data source, including
+  `DataSourceClient`, `CacheConfig`, and `ZoneTableSegment`.
+  `ConfigurableClientList` maintains a list of `DataSourceInfo`, one
+  for each data source specified in its configuration.
+
+* `DataSourceClient`: The front end class to applications for a single
+  data source.  Applications will get a specific `DataSourceClient`
+  object by `ConfigurableClientList::find()`.
+  `DataSourceClient` itself is a set of factories for various
+  operations on the data source such as lookup or update.
+
+* `CacheConfig`: library internal representation of in-memory cache
+  configuration for a data source.  It knows which zones are to be
+  cached and where the zone data (RRs) should come from, either from a
+  master file or other data source.  With this knowledge it will
+  create an appropriate `LoadAction` object.  Note that `CacheConfig`
+  isn't aware of the underlying memory segment type for the in-memory
+  data.  It's intentionally separated from this class (see the
+  conciseness and minimal-dependency design decisions above).
+
+* `ZoneTableSegment`: when in-memory cache is enabled, it provides
+  memory-segment-type independent interface to the in-memory data.
+  This is an abstract base class (see polymorphism in the design
+  decisions) and inherited by segment-type specific subclasses:
+  `ZoneTableSegmentLocal` and `ZoneTableSegmentMapped` (and possibly
+  others).  Any subclass of `ZoneTableSegment` is expected to maintain
+  the specific type of `MemorySegment` object.
+
+* `ZoneWriter`: a frontend utility class for applications to update
+  in-memory zone data (currently it can only load a whole zone and
+  replace any existing zone content with a new one, but this should be
+  extended so it can handle partial updates).
+  Applications will get a specific `ZoneWriter`
+  object by `ConfigurableClientList::getCachedZoneWriter()`.
+  `ZoneWriter` is constructed with `ZoneableSegment` and `LoadAction`.
+  Since these are abstract classes, `ZoneWriter` doesn't have to be
+  aware of "fluid" details.  It's only responsible for "somehow" preparing
+  `ZoneData` for a new version of a specified zone using `LoadAction`,
+  and installing it in the `ZoneTable` (which can be accessed via
+  `ZoneTableSegment`).
+
+* `DataSourceStatus`: created by `ConfigurableClientList::getStatus()`,
+  a straightforward tuple that represents some status information of a
+  specific data source managed in the `ConfigurableClientList`.
+  `getStatus()` generates `DataSourceStatus` for all data sources
+  managed in it, and returns them as a vector.
+
+* `ZoneTableAccessor`, `ZoneTableIterator`: frontend classes to get
+  access to the conceptual "zone table" (a set of zones) stored in a
+  specific data source.  In particular, `ZoneTableIterator` allows
+  applications to iterate over all zones (by name) stored in the
+  specific data source.
+  Applications will get a specific `ZoneTableAccessor`
+  object by `ConfigurableClientList::getZoneTableAccessor()`,
+  and get an iterator object by calling `getIterator` on the accessor.
+  These are abstract classes and provide unified interfaces
+  independent from whether it's for in-memory cached zones or "real"
+  underlying data source.  But the initial implementation only
+  provides the in-memory cache version of subclass (see the next
+  item).
+
+* `ZoneTableAccessorCache`, `ZoneTableIteratorCache`: implementation
+  classes of `ZoneTableAccessor` and `ZoneTableIterator` for in-memory
+  cache.  They refer to `CacheConfig` to get a list of zones to be
+  cached.
+
+* `ZoneTableHeader`, `ZoneTable`: top-level interface to actual
+  in-memory data.  These were separated based on a prior version of
+  the design (http://bind10.isc.org/wiki/ScalableZoneLoadDesign) where
+  `ZoneTableHeader` may contain multiple `ZoneTable`s.  It's
+  one-to-one relationship in the latest version (of implementation),
+  so we could probably unify them as a cleanup.
+
+* `ZoneData`: representing the in-memory content of a single zone.
+  `ZoneTable` contains (zero, one or) multiple `ZoneData` objects.
+
+* `RdataSet`: representing the in-memory content of (data of) a single
+  RRset.
+  `ZoneData` contains `RdataSet`s corresponding to the RRsets stored
+  in the zone.
+
+* `LoadAction`: a "polymorphic" functor that implements loading zone
+  data into memory.  It hides from its user (i.e., `ZoneWriter`)
+  details about the source of the data: master file or other data
+  source (and perhaps some others).  The "polymorphism" is actually
+  realized as different implementations of the functor interface, not
+  class inheritance (but conceptually the effect and goal is the
+  same).  Note: there's a proposal to replace `LoadAction` with
+  a revised `ZoneDataLoader`, although the overall concept doesn't
+  change.  See Trac ticket #2912.
+
+* `ZoneDataLoader` and `ZoneDataUpdater`: helper classes for the
+  `LoadAction` functor(s).  These work independently from the source
+  of data, taking a sequence of RRsets objects, converting them
+  into the in-memory data structures (`RdataSet`), and installing them
+  into a newly created `ZoneData` object.
+
+Sequence for auth module using local memory segment
+---------------------------------------------------
+
+In the remaining sections, we explain how the classes shown in the
+previous section work together through their methods for commonly
+intended operations.
+
+The following sequence diagram shows the case for the authoritative
+DNS server module to maintain "local" in-memory data.  Note that
+"auth" is a conceptual "class" (not actually implemented as a C++
+class) to represent the server application behavior.  For the purpose
+of this document that should be sufficient.  The same note applies to
+all examples below.
+
+image::auth-local.png[]
+
+1. On startup, the auth module creates a `ConfigurableClientList`
+   for each RR class specified in the configuration for "data_sources"
+   module.  It then calls `ConfigurableClientList::configure()`
+   for the given configuration of that RR class.
+
+2. For each data source, `ConfigurableClientList` creates a
+   `CacheConfig` object with the corresponding cache related
+   configuration.
+
+3. If in-memory cache is enabled for the data source,
+   `ZoneTableSegment` is also created.  In this scenario the cache
+   type is specified as "local" in the configuration, so a functor
+   creates `ZoneTableSegmentLocal` as the actual instance.
+   In this case its `ZoneTable` is immediately created, too.
+
+4. `ConfigurableClientList` checks if the created `ZoneTableSegment` is
+   writable.  It is always so for "local" type of segments.  So
+   `ConfigurableClientList` immediately loads zones to be cached into
+   memory.  For each such zone, it first gets the appropriate
+   `LoadAction` through `CacheConfig`, then creates `ZoneWriter` with
+   the `LoadAction`, and loads the data using the writer.
+
+5. If the auth module receives a "reload" command for a cached zone
+   from other module (xfrin, an end user, etc), it calls
+   `ConfigurableClientList::getCachedZoneWriter` to load and install
+   the new version of the zone.  The same loading sequence takes place
+   except that the user of the writer is the auth module.
+   Also, the old version of the zone data is destroyed at the end of
+   the process.
+
+Sequence for auth module using mapped memory segment
+----------------------------------------------------
+
+This is an example for the authoritative server module that uses
+mapped type memory segment for in-memory data.
+
+image::auth-mapped.png[]
+
+1. The sequence is the same to the point of creating `CacheConfig`.
+
+2. But in this case a `ZoneTableSegmentMapped` object is created based
+   on the configuration for the cache type.  This type of
+   `ZoneTableSegment` is initially empty, and isn't even associated
+   with a `MemorySgement` (and therefore considered "non writable").
+
+3. `ConfigurableClientList` checks if the zone table segment is
+   writable to know whether to load zones into memory by itself,
+   but since `ZoneTableSegment::isWritable()` returns false, it skips
+   the loading.
+
+4. the auth module gets the status of each data source, and notices
+   there's a "WAITING" state of segment.  So it subscribes to the
+   "Memmgr" group on a command session and waits for an update
+   from the memory manager (memmgr) module. (See also the note at the
+   end of the section)
+
+5. when the auth module receives an update command from memmgr, it
+   calls `ConfigurableClientList::resetMemorySegment()` with the command
+   argument and the segment mode of "READ_ONLY".
+   Note that the auth module handles the command argument as mostly
+   opaque data; it's not expected to be details of segment
+   type-specific behavior.
+
+6. `ConfigurableClientList::resetMemorySegment()` subsequently calls
+   `reset()` method on the corresponding `ZoneTableSegment` with the
+   given parameters.
+   In the case of `ZoneTableSegmentMapped`, it creates a new
+   `MemorySegment` object for the mapped type, which internally maps
+   the specific file into memory.
+   memmgr is expected to have prepared all necessary data in the file,
+   so all the data are immediately ready for use (i.e., there
+   shouldn't be no explicit load operation).
+
+7. When a change is made in the mapped data, memmgr will send another
+   update command with parameters for new mapping.  The auth module
+   calls `ConfigurableClientList::resetMemorySegment()`, and the
+   underlying memory segment is swapped with a new one.  The old
+   memory segment object is destroyed at this point.  Note that
+   this "destroy" just means unmapping the memory region; the data
+   stored in the file are intact.
+
+8. If the auth module happens to receive a reload command from other
+   module, it could call
+   `ConfigurableClientList::getCachedZoneWriter()`
+   to reload the data by itself, just like in the previous section.
+   In this case, however, the writability check of
+   `getCachedZoneWriter()` fails (the segment was created as
+   READ_ONLY, so is considered non writable), so loading won't happen.
+
+Note: while less likely in practice, it's possible that the same auth
+module uses both "local" and "mapped" (and even others) type of
+segments for different data sources.  In such cases the sequence is
+either the one in this or previous section depending on the specified
+segment type in the configuration.  The auth module itself isn't aware
+of per segment-type details, but change the behavior depending on the
+segment state of each data source at step 4 above: if it's "WAITING",
+it means the auth module needs help from memmgr (that's all the auth
+module should know; it shouldn't be bothered with further details such
+as mapped file names); if it's something else, the auth module doesn't
+have to do anything further.
+
+Sequence for memmgr module initialization using mapped memory segment
+---------------------------------------------------------------------
+
+This sequence shows the common initialization sequence for the
+memory manager (memmgr) module using a mapped type memory segment.
+This is a mixture of the sequences shown in Sections 2 and 3.
+
+image::memmgr-mapped-init.png[]
+
+1. Initial sequence is the same until the application module (memmgr)
+   calls `ConfigurableClientList::getStatus()` as that for the
+   previous section.
+
+2. The memmgr module identifies the data sources whose in-memory cache
+   type is "mapped".  (Unlike other application modules, the memmgr
+   should know what such types means due to its exact responsibility).
+   For each such data source, it calls
+   `ConfigurableClientList::resetMemorySegment` with the READ_WRITE
+   mode and other mapped-type specific parameters.  memmgr should be
+   able to generate the parameters from its own configuration and
+   other data source specific information (such as the RR class and
+   data source name).
+
+3. The `ConfigurableClientList` class calls
+   `ZoneTableSegment::reset()` on the corresponding zone table
+   segment with the given parameters.  In this case, since the mode is
+   READ_WRITE, a new `ZoneTable` will be created (assuming this is a
+   very first time initialization; if there's already a zone table
+   in the segment, it will be used).
+
+4. The memmgr module then calls
+   `ConfigurableClientList::getZoneTableAccessor()`, and calls the
+   `getItertor()` method on it to get a list of zones for which
+   zone data are to be loaded into the memory segment.
+
+5. The memmgr module loads the zone data for each such zone.  This
+   sequence is the same as shown in Section 2.
+
+6. On loading all zone data, the memmgr module sends an update command
+   to all interested modules (such as auth) in the segment, and waits
+   for acknowledgment from all of them.
+
+7. Then it calls `ConfigurableClientList::resetMemorySegment()` for
+   this data source with almost the same parameter as step 2 above,
+   but with a different mapped file name.  This will make a swap of
+   the underlying memory segment with a new mapping.  The old
+   `MemorySegment` object will be destroyed, but as explained in the
+   previous section, it simply means unmapping the file.
+
+8. The memmgr loads the zone data into the newly mapped memory region
+   by repeating the sequence shown in step 5.
+
+9. The memmgr repeats all this sequence for data sources that use
+   "mapped" segment for in-memory cache.  Note: it could handle
+   multiple data sources in parallel, e.g., while waiting for
+   acknowledgment from other modules.
+
+Sequence for memmgr module to reload a zone using mapped memory segment
+-----------------------------------------------------------------------
+
+This example is a continuation of the previous section, describing how
+the memory manager reloads a zone in mapped memory segment.
+
+image::memmgr-mapped-reload.png[]
+
+1. When the memmgr module receives a reload command from other module,
+   it calls `ConfigurableClientList::getCachedZoneWriter()` for the
+   specified zone name.  This method checks the writability of
+   the segment, and since it's writable (as memmgr created it in the
+   READ_WRITE mode), `getCachedZoneWriter()` succeeds and returns
+   a `ZoneWriter`.
+
+2. The memmgr module uses the writer to load the new version of zone
+   data.  There is nothing specific to mapped-type segment here.
+
+3. The memmgr module then sends an update command to other modules
+   that would share this version, and waits for acknowledgment from
+   all of them.
+
+4. On getting acknowledgments, the memmgr module calls
+  `ConfigurableClientList::resetMemorySegment()` with the parameter
+   specifying the other mapped file.  This will swap the underlying
+   `MemorySegment` with a newly created one, mapping the other file.
+
+5. The memmgr updates this segment, too, so the two files will contain
+   the same version of data.