
  1. Data Source Library Classes
  2. ===========================
  3. About this document
  4. -------------------
  5. This memo describes major classes used in the data source library,
  6. mainly focusing on handling in-memory cache with consideration of the
  7. shared memory support. It will give an overview of the entire design
  8. architecture and some specific details of how these classes are expected
  9. to be used.
  10. Before reading, the higher level inter-module protocol should be understood:
  11. http://bind10.isc.org/wiki/SharedMemoryIPC
  12. Overall relationships between classes
  13. -------------------------------------
  14. The following diagram shows major classes in the data source library
  15. related to in-memory caches and their relationship.
  16. image::overview.png[Class diagram showing overview of relationships]
  17. Major design decisions of this architecture are:
  18. * Keep each class as concise as possible, each focusing on one or a
  19. small set of responsibilities. Smaller classes are generally easier
  20. to understand (at the cost of understanding how they work in the
  21. "big picture" of course) and easier to test.
  22. * On a related point, minimize dependency on any single class. A
  23. monolithic class on which many others depend is generally
  24. difficult to maintain because you'll need to ensure a change to the
  25. monolithic class doesn't break anything in any other class.
  26. * Use polymorphism for any "fluid" behavior, and hide specific details
  27. under abstract interfaces so implementation details won't be
  28. directly referenced from any other part of the library.
  29. Specifically, the underlying memory segment type (local, mapped, and
  30. possibly others) and the source of in-memory data (master file or
  31. other data source) are hidden via a kind of polymorphism.
  32. * Separate classes directly used by applications from classes that
  33. implement details. Make the former classes as generic as possible,
  34. agnostic about implementation specific details such as the memory
  35. segment type (or, ideally and where possible, whether it's for
  36. in-memory cache or the underlying data source).
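
The polymorphism decision above can be illustrated with a minimal Python sketch. It is not the library's code: the method names `is_writable()` and `reset()` mirror names used later in this memo, but the `create_segment()` factory, the `"mapped-file"` parameter key, and the mode strings are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod


class ZoneTableSegment(ABC):
    """Segment-type independent view of in-memory zone data (simplified)."""

    @abstractmethod
    def is_writable(self) -> bool:
        """Return True if zone data can be loaded into this segment."""

    @abstractmethod
    def reset(self, mode: str, params: dict) -> None:
        """(Re)attach the segment to its underlying memory."""


class ZoneTableSegmentLocal(ZoneTableSegment):
    """Process-local memory: usable and writable from the start."""

    def is_writable(self) -> bool:
        return True

    def reset(self, mode: str, params: dict) -> None:
        # Assumption: a local segment has nothing to remap.
        raise NotImplementedError("local segments cannot be reset")


class ZoneTableSegmentMapped(ZoneTableSegment):
    """File-backed memory: not writable until reset() maps a file read-write."""

    def __init__(self) -> None:
        self._mode = None            # no MemorySegment attached yet
        self._mapped_file = None

    def is_writable(self) -> bool:
        return self._mode == "READ_WRITE"

    def reset(self, mode: str, params: dict) -> None:
        self._mapped_file = params["mapped-file"]    # hypothetical key
        self._mode = mode


def create_segment(cache_type: str) -> ZoneTableSegment:
    """Hypothetical factory keyed by the configured cache type."""
    factories = {"local": ZoneTableSegmentLocal,
                 "mapped": ZoneTableSegmentMapped}
    return factories[cache_type]()
```

Code that holds only a `ZoneTableSegment` reference can ask `is_writable()` without knowing which subclass it has, which is the point of hiding the segment type behind the abstract interface.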
  37. The following gives a summarized description of these classes.
  38. * `ConfigurableClientList`: The front end to application classes. An
  39. application that uses the data source library generally maintains
  40. one or more `ConfigurableClientList` objects (usually one per RR
  41. class, or when we support views, probably one per view). This class
  42. is a container of sets of data source related classes, providing
  43. accessors to these classes and also acting as a factory of other
  44. related class objects. Note: Due to internal implementation
  45. reasons, there is a base class for `ConfigurableClientList` named
  46. `ClientList` in the C++ version, and applications are expected to
  47. use the latter. But conceptually `ConfigurableClientList` is an
  48. independent value class; the inheritance is not for polymorphism.
  49. Note also that the Python version doesn't have the base class.
  50. * `DataSourceInfo`: this is a straightforward tuple of a set of class
  51. objects corresponding to a single data source, including
  52. `DataSourceClient`, `CacheConfig`, and `ZoneTableSegment`.
  53. `ConfigurableClientList` maintains a list of `DataSourceInfo`, one
  54. for each data source specified in its configuration.
  55. * `DataSourceClient`: The front end class to applications for a single
  56. data source. Applications will get a specific `DataSourceClient`
  57. object by `ConfigurableClientList::find()`.
  58. `DataSourceClient` itself is a set of factories for various
  59. operations on the data source such as lookup or update.
  60. * `CacheConfig`: library internal representation of in-memory cache
  61. configuration for a data source. It knows which zones are to be
  62. cached and where the zone data (RRs) should come from, either from a
  63. master file or other data source. With this knowledge it will
  64. create an appropriate `LoadAction` object. Note that `CacheConfig`
  65. isn't aware of the underlying memory segment type for the in-memory
  66. data. It's intentionally separated from this class (see the
  67. conciseness and minimal-dependency design decisions above).
  68. * `ZoneTableSegment`: when in-memory cache is enabled, it provides a
  69. memory-segment-type independent interface to the in-memory data.
  70. This is an abstract base class (see polymorphism in the design
  71. decisions) and is inherited by segment-type specific subclasses:
  72. `ZoneTableSegmentLocal` and `ZoneTableSegmentMapped` (and possibly
  73. others). Any subclass of `ZoneTableSegment` is expected to maintain
  74. the specific type of `MemorySegment` object.
  75. * `ZoneWriter`: a frontend utility class for applications to update
  76. in-memory zone data (currently it can only load a whole zone and
  77. replace any existing zone content with a new one, but this should be
  78. extended so it can handle partial updates).
  79. Applications will get a specific `ZoneWriter`
  80. object by `ConfigurableClientList::getCachedZoneWriter()`.
  81. `ZoneWriter` is constructed with `ZoneTableSegment` and `LoadAction`.
  82. Since these are abstract classes, `ZoneWriter` doesn't have to be
  83. aware of "fluid" details. It's only responsible for "somehow" preparing
  84. `ZoneData` for a new version of a specified zone using `LoadAction`,
  85. and installing it in the `ZoneTable` (which can be accessed via
  86. `ZoneTableSegment`).
  87. * `DataSourceStatus`: created by `ConfigurableClientList::getStatus()`,
  88. a straightforward tuple that represents some status information of a
  89. specific data source managed in the `ConfigurableClientList`.
  90. `getStatus()` generates `DataSourceStatus` for all data sources
  91. managed in it, and returns them as a vector.
  92. * `ZoneTableAccessor`, `ZoneTableIterator`: frontend classes to get
  93. access to the conceptual "zone table" (a set of zones) stored in a
  94. specific data source. In particular, `ZoneTableIterator` allows
  95. applications to iterate over all zones (by name) stored in the
  96. specific data source.
  97. Applications will get a specific `ZoneTableAccessor`
  98. object by `ConfigurableClientList::getZoneTableAccessor()`,
  99. and get an iterator object by calling `getIterator` on the accessor.
  100. These are abstract classes and provide unified interfaces that are
  101. independent of whether they're for in-memory cached zones or a "real"
  102. underlying data source. But the initial implementation only
  103. provides the in-memory cache versions of the subclasses (see the next
  104. item).
  105. * `ZoneTableAccessorCache`, `ZoneTableIteratorCache`: implementation
  106. classes of `ZoneTableAccessor` and `ZoneTableIterator` for in-memory
  107. cache. They refer to `CacheConfig` to get a list of zones to be
  108. cached.
  109. * `ZoneTableHeader`, `ZoneTable`: top-level interface to actual
  110. in-memory data. These were separated based on a prior version of
  111. the design (http://bind10.isc.org/wiki/ScalableZoneLoadDesign) where
  112. `ZoneTableHeader` may contain multiple `ZoneTable`s. It's
  113. a one-to-one relationship in the latest version (of the implementation),
  114. so we could probably unify them as a cleanup.
  115. * `ZoneData`: representing the in-memory content of a single zone.
  116. `ZoneTable` contains (zero, one or) multiple `ZoneData` objects.
  117. * `RdataSet`: representing the in-memory content of (data of) a single
  118. RRset.
  119. `ZoneData` contains `RdataSet`s corresponding to the RRsets stored
  120. in the zone.
  121. * `LoadAction`: a "polymorphic" functor that implements loading zone
  122. data into memory. It hides from its user (i.e., `ZoneWriter`)
  123. details about the source of the data: master file or other data
  124. source (and perhaps some others). The "polymorphism" is actually
  125. realized as different implementations of the functor interface, not
  126. class inheritance (but conceptually the effect and goal is the
  127. same). Note: there's a proposal to replace `LoadAction` with
  128. a revised `ZoneDataLoader`, although the overall concept doesn't
  129. change. See Trac ticket #2912.
  130. * `ZoneDataLoader` and `ZoneDataUpdater`: helper classes for the
  131. `LoadAction` functor(s). These work independently from the source
  132. of data, taking a sequence of RRset objects, converting them
  133. into the in-memory data structures (`RdataSet`), and installing them
  134. into a newly created `ZoneData` object.
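
The relationship between `ZoneWriter` and `LoadAction` described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's interface: the zone table is a plain dict, zone data is a list of strings, and the `load()`/`install()` method names and the two `*_action` helper functions are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A LoadAction is any callable that returns the data for one zone.
ZoneData = List[str]
LoadAction = Callable[[], ZoneData]


def master_file_action(lines: List[str]) -> LoadAction:
    """LoadAction whose source is (the already-read lines of) a master file."""
    def load() -> ZoneData:
        stripped = (line.strip() for line in lines)
        # Skip blank lines and ';' comments.
        return [s for s in stripped if s and not s.startswith(";")]
    return load


def data_source_action(records: Dict[str, ZoneData], zone: str) -> LoadAction:
    """LoadAction whose source is another (here: dict-backed) data source."""
    def load() -> ZoneData:
        return list(records[zone])
    return load


class ZoneWriter:
    """Builds a new version of one zone, then installs it in the zone table."""

    def __init__(self, zone_table: Dict[str, ZoneData], zone: str,
                 load_action: LoadAction) -> None:
        self._table = zone_table
        self._zone = zone
        self._load_action = load_action
        self._new_data: Optional[ZoneData] = None

    def load(self) -> None:
        # The writer does not know or care where the data comes from.
        self._new_data = self._load_action()

    def install(self) -> None:
        if self._new_data is None:
            raise RuntimeError("install() called before load()")
        # Replace any existing content for the zone with the new version.
        self._table[self._zone] = self._new_data
        self._new_data = None
```

Both actions satisfy the same callable interface without sharing a base class, which matches the memo's remark that the "polymorphism" here is realized through functor implementations rather than inheritance.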
  135. Sequence for auth module using local memory segment
  136. ---------------------------------------------------
  137. In the remaining sections, we explain how the classes shown in the
  138. previous section work together through their methods for commonly
  139. intended operations.
  140. The following sequence diagram shows the case for the authoritative
  141. DNS server module to maintain "local" in-memory data. Note that
  142. "auth" is a conceptual "class" (not actually implemented as a C++
  143. class) to represent the server application behavior. For the purpose
  144. of this document that should be sufficient. The same note applies to
  145. all examples below.
  146. image::auth-local.png[Sequence diagram for auth server using local memory segment]
  147. 1. On startup, the auth module creates a `ConfigurableClientList`
  148. for each RR class specified in the configuration for the "data_sources"
  149. module. It then calls `ConfigurableClientList::configure()`
  150. for the given configuration of that RR class.
  151. 2. For each data source, `ConfigurableClientList` creates a
  152. `CacheConfig` object with the corresponding cache related
  153. configuration.
  154. 3. If in-memory cache is enabled for the data source,
  155. `ZoneTableSegment` is also created. In this scenario the cache
  156. type is specified as "local" in the configuration, so a functor
  157. creates `ZoneTableSegmentLocal` as the actual instance.
  158. In this case its `ZoneTable` is immediately created, too.
  159. 4. `ConfigurableClientList` checks if the created `ZoneTableSegment` is
  160. writable. It is always so for "local" type of segments. So
  161. `ConfigurableClientList` immediately loads zones to be cached into
  162. memory. For each such zone, it first gets the appropriate
  163. `LoadAction` through `CacheConfig`, then creates `ZoneWriter` with
  164. the `LoadAction`, and loads the data using the writer.
  165. 5. If the auth module receives a "reload" command for a cached zone
  166. from another module (xfrin, an end user, etc.), it calls
  167. `ConfigurableClientList::getCachedZoneWriter` to load and install
  168. the new version of the zone. The same loading sequence takes place
  169. except that the user of the writer is the auth module.
  170. Also, the old version of the zone data is destroyed at the end of
  171. the process.
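
The local-segment sequence above can be sketched in Python as follows. The classes are deliberately reduced stand-ins: `ClientList` plays the role of `ConfigurableClientList`, and the method names, the configuration shape (data source name mapped to a dict of zone name and loader callable), and the extra data source argument of `get_cached_zone_writer()` are assumptions for illustration.

```python
class LocalSegment:
    """Stand-in for ZoneTableSegmentLocal: writable, zone table created at once."""

    def __init__(self):
        self.zone_table = {}

    def is_writable(self):
        return True


class CacheConfig:
    """Knows which zones to cache and how to obtain their data."""

    def __init__(self, zones):
        self._zones = zones              # zone name -> callable returning RRs

    def zone_names(self):
        return list(self._zones)

    def get_load_action(self, zone):
        return self._zones[zone]


class ZoneWriter:
    def __init__(self, segment, zone, load_action):
        self._segment = segment
        self._zone = zone
        self._load_action = load_action

    def load_and_install(self):
        # Build the new version first, then replace the old one in one step;
        # the old version is dropped once nothing references it.
        new_data = self._load_action()
        self._segment.zone_table[self._zone] = new_data


class ClientList:
    """Stand-in for ConfigurableClientList, for a single RR class."""

    def __init__(self):
        self._sources = {}               # data source name -> (config, segment)

    def configure(self, config):
        for name, zones in config.items():
            cache_conf = CacheConfig(zones)              # step 2
            segment = LocalSegment()                     # step 3
            self._sources[name] = (cache_conf, segment)
            if segment.is_writable():                    # step 4
                for zone in cache_conf.zone_names():
                    self.get_cached_zone_writer(name, zone).load_and_install()

    def get_cached_zone_writer(self, source, zone):
        cache_conf, segment = self._sources[source]
        return ZoneWriter(segment, zone, cache_conf.get_load_action(zone))

    def lookup(self, source, zone):
        return self._sources[source][1].zone_table[zone]
```

A reload (step 5) is then the same writer sequence driven by the application: it calls `get_cached_zone_writer()` again and runs the writer, which replaces the cached version of that zone.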
  172. Sequence for auth module using mapped memory segment
  173. ----------------------------------------------------
  174. This is an example for the authoritative server module that uses
  175. a mapped type memory segment for in-memory data.
  176. image::auth-mapped.png[Sequence diagram for auth server using mapped memory segment]
  177. 1. The sequence is the same up to the point of creating `CacheConfig`.
  178. 2. But in this case a `ZoneTableSegmentMapped` object is created based
  179. on the configuration of the cache type. This type of
  180. `ZoneTableSegment` is initially empty and isn't even associated
  181. with a `MemorySegment` (and therefore considered non-writable).
  182. 3. `ConfigurableClientList` checks if the zone table segment is
  183. writable to know whether to load zones into memory by itself,
  184. but as `ZoneTableSegment::isWritable()` returns false, it skips
  185. the loading.
  186. 4. The auth module gets the status of each data source, and notices
  187. there's a `WAITING` state of segment. So it subscribes to the
  188. "Memmgr" group on a command session and waits for an update
  189. from the memory manager (memmgr) module. (See also the note at the
  190. end of the section.)
  191. 5. When the auth module receives an update command from memmgr, it
  192. calls `ConfigurableClientList::resetMemorySegment()` with the command
  193. argument and the segment mode of `READ_ONLY`.
  194. Note that the auth module handles the command argument as mostly
  195. opaque data; it's not expected to deal with details of segment
  196. type-specific behavior.
  197. 6. `ConfigurableClientList::resetMemorySegment()` subsequently calls
  198. `reset()` method on the corresponding `ZoneTableSegment` with the
  199. given parameters.
  200. In the case of `ZoneTableSegmentMapped`, it creates a new
  201. `MemorySegment` object for the mapped type, which internally maps
  202. the specific file into memory.
  203. memmgr is expected to have prepared all necessary data in the file,
  204. so all the data are immediately ready for use (i.e., there
  205. shouldn't be any explicit load operation).
  206. 7. When a change is made in the mapped data, memmgr will send another
  207. update command with parameters for new mapping. The auth module
  208. calls `ConfigurableClientList::resetMemorySegment()`, and the
  209. underlying memory segment is swapped with a new one. The old
  210. memory segment object is destroyed. Note that
  211. this "destroy" just means unmapping the memory region; the data
  212. stored in the file are intact.
  213. 8. If the auth module happens to receive a reload command from another
  214. module, it could call
  215. `ConfigurableClientList::getCachedZoneWriter()`
  216. to reload the data by itself, just like in the previous section.
  217. In this case, however, the writability check of
  218. `getCachedZoneWriter()` fails (the segment was created as
  219. `READ_ONLY` and is non-writable), so loading won't happen.
  220. NOTE: While less likely in practice, it's possible that the same auth
  221. module uses both "local" and "mapped" (and even others) types of
  222. segments for different data sources. In such cases the sequence is
  223. either the one in this or previous section depending on the specified
  224. segment type in the configuration. The auth module itself isn't aware
  225. of per segment-type details, but changes the behavior depending on the
  226. segment state of each data source at step 4 above: if it's `WAITING`,
  227. it means the auth module needs help from memmgr (that's all the auth
  228. module should know; it shouldn't be bothered with further details such
  229. as mapped file names); if it's something else, the auth module doesn't
  230. have to do anything further.
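
The auth-side behavior for a mapped segment can be sketched as below. Only the `WAITING` and `READ_ONLY` names come from this memo; the `INUSE` state, the status strings returned by `get_cached_zone_writer()`, the command argument keys, and the dict that stands in for the file system are assumptions for illustration.

```python
READ_ONLY, READ_WRITE = "READ_ONLY", "READ_WRITE"
WAITING, INUSE = "WAITING", "INUSE"      # INUSE is an assumed state name


class MappedSegment:
    """Stand-in for ZoneTableSegmentMapped; 'files' fakes the file system."""

    def __init__(self, files):
        self._files = files              # file name -> {zone name: zone data}
        self._mode = None                # no MemorySegment attached yet
        self.zone_table = None

    def is_writable(self):
        return self._mode == READ_WRITE

    def state(self):
        return WAITING if self._mode is None else INUSE

    def reset(self, mode, params):
        # Dropping the old mapping only "unmaps"; the file content is intact.
        self.zone_table = self._files[params["mapped-file"]]
        self._mode = mode


class ClientList:
    """Stand-in for ConfigurableClientList."""

    def __init__(self, files, source_names):
        self._segments = {n: MappedSegment(files) for n in source_names}

    def get_status(self):
        return [(name, seg.state()) for name, seg in self._segments.items()]

    def reset_memory_segment(self, name, mode, params):
        self._segments[name].reset(mode, params)

    def get_cached_zone_writer(self, name, zone):
        # Assumption: failure is reported as a status string, not an exception.
        if not self._segments[name].is_writable():
            return "CACHE_NOT_WRITABLE", None
        return "SUCCESS", object()       # a real writer is out of scope here

    def find(self, name, zone):
        return self._segments[name].zone_table.get(zone)


class Auth:
    """Conceptual auth module: treats memmgr's arguments as opaque data."""

    def __init__(self, client_list):
        self.client_list = client_list
        # Step 4: subscribe only if some segment is waiting for memmgr.
        self.subscribed = any(state == WAITING
                              for _, state in client_list.get_status())

    def handle_update(self, command_args):
        # Steps 5 and 7: pass the parameters through without inspecting them.
        self.client_list.reset_memory_segment(
            command_args["data-source-name"], READ_ONLY,
            command_args["segment-params"])
```

Note that `Auth` never looks inside `segment-params`; only the segment implementation interprets them, which keeps segment-type details out of the application.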
  231. Sequence for memmgr module initialization using mapped memory segment
  232. ---------------------------------------------------------------------
  233. This sequence shows the common initialization sequence for the
  234. memory manager (memmgr) module using a mapped type memory segment.
  235. This is a mixture of the sequences shown in the previous two sections.
  236. image::memmgr-mapped-init.png[]
  237. 1. The initial sequence is the same as that of the previous section up to
  238. the point where the application module (memmgr) calls
  239. `ConfigurableClientList::getStatus()`.
  240. 2. The memmgr module identifies the data sources whose in-memory cache
  241. type is "mapped". (Unlike other application modules, memmgr
  242. should know what such types mean, since that is exactly its responsibility.)
  243. For each such data source, it calls
  244. `ConfigurableClientList::resetMemorySegment()` with the `READ_WRITE`
  245. mode and other mapped-type specific parameters. memmgr should be
  246. able to generate the parameters from its own configuration and
  247. other data source specific information (such as the RR class and
  248. data source name).
  249. 3. The `ConfigurableClientList` class calls
  250. `ZoneTableSegment::reset()` on the corresponding zone table
  251. segment with the given parameters. In this case, since the mode is
  252. `READ_WRITE`, a new `ZoneTable` will be created (assuming this is the
  253. very first initialization; if there's already a zone table
  254. in the segment, it will be used).
  255. 4. The memmgr module then calls
  256. `ConfigurableClientList::getZoneTableAccessor()`, and calls the
  257. `getIterator()` method on it to get a list of zones for which
  258. zone data are to be loaded into the memory segment.
  259. 5. The memmgr module loads the zone data for each such zone. This
  260. sequence is the same as shown in the local memory segment section.
  261. 6. On loading all zone data, the memmgr module sends an update command
  262. to all interested modules (such as auth) in the segment, and waits
  263. for acknowledgment from all of them.
  264. 7. Then it calls `ConfigurableClientList::resetMemorySegment()` for
  265. this data source with almost the same parameter as step 2 above,
  266. but with a different mapped file name. This will make a swap of
  267. the underlying memory segment with a new mapping. The old
  268. `MemorySegment` object will be destroyed, but as explained in the
  269. previous section, it simply means unmapping the file.
  270. 8. The memmgr loads the zone data into the newly mapped memory region
  271. by repeating the sequence shown in step 5.
  272. 9. The memmgr repeats this whole sequence for each data source that uses a
  273. "mapped" segment for in-memory cache. Note: it could handle
  274. multiple data sources in parallel, e.g., while waiting for
  275. acknowledgment from other modules.
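
The initialization steps above amount to populating two mapped files in turn, publishing the first before touching the second. A minimal sketch for a single data source follows; the `initialize()` function, its parameters, the `"mapped-file"` key, and the dict standing in for the file system are all hypothetical, and `notify_readers` is assumed to return only after every interested module has acknowledged.

```python
READ_WRITE = "READ_WRITE"


class MappedSegment:
    """Stand-in for ZoneTableSegmentMapped; 'files' fakes the file system."""

    def __init__(self, files):
        self._files = files              # file name -> {zone name: zone data}
        self.zone_table = None

    def reset(self, mode, params):
        # In READ_WRITE mode the zone table is created on first use and
        # reused if the file already has one (only that mode is modelled).
        self.zone_table = self._files.setdefault(params["mapped-file"], {})


def initialize(segment, zones_to_cache, load_zone, notify_readers, file_names):
    """Populate both mapped files, publishing the first before the second."""
    first, second = file_names
    segment.reset(READ_WRITE, {"mapped-file": first})        # steps 2-3
    for zone in zones_to_cache:                              # steps 4-5
        segment.zone_table[zone] = load_zone(zone)
    notify_readers(first)                                    # step 6
    segment.reset(READ_WRITE, {"mapped-file": second})       # step 7
    for zone in zones_to_cache:                              # step 8
        segment.zone_table[zone] = load_zone(zone)
```

After `initialize()` returns, readers map the first file while the caller holds the second one in writable form, so later changes never touch a file that a reader is currently using.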
  276. Sequence for memmgr module to reload a zone using mapped memory segment
  277. -----------------------------------------------------------------------
  278. This example is a continuation of the previous section, describing how
  279. the memory manager reloads a zone in a mapped memory segment.
  280. image::memmgr-mapped-reload.png[]
  281. 1. When the memmgr module receives a reload command from another module,
  282. it calls `ConfigurableClientList::getCachedZoneWriter()` for the
  283. specified zone name. This method checks the writability of
  284. the segment, and since it's writable (as memmgr created it in the
  285. `READ_WRITE` mode), `getCachedZoneWriter()` succeeds and returns
  286. a `ZoneWriter`.
  287. 2. The memmgr module uses the writer to load the new version of zone
  288. data. There is nothing specific to mapped-type segment here.
  289. 3. The memmgr module then sends an update command to other modules
  290. that would share this version, and waits for acknowledgment from
  291. all of them.
  292. 4. On getting acknowledgments, the memmgr module calls
  293. `ConfigurableClientList::resetMemorySegment()` with the parameter
  294. specifying the other mapped file. This will swap the underlying
  295. `MemorySegment` with a newly created one, mapping the other file.
  296. 5. The memmgr updates this segment, too, so the two files will contain
  297. the same version of data.
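
The reload steps above keep one invariant: memmgr only writes to the file that no reader currently maps. The following sketch models that with two fake files; the `Memmgr` and `Reader` classes, their attribute names, and the file names `zones.0` and `zones.1` are hypothetical, and acknowledgment is modelled as a synchronous `remap()` call.

```python
class Reader:
    """Conceptual reader module (such as auth) mapping one file read-only."""

    def __init__(self, files, mapped):
        self._files = files
        self.mapped = mapped

    def remap(self, file_name):
        # Handling the update command; returning counts as the acknowledgment.
        self.mapped = file_name

    def lookup(self, zone):
        return self._files[self.mapped][zone]


class Memmgr:
    """Conceptual memmgr holding two mapped files for one data source."""

    def __init__(self, files, readers):
        self.files = files               # file name -> {zone name: zone data}
        self.readers = readers
        # State after initialization: readers map zones.0, memmgr holds zones.1.
        self.writing, self.published = "zones.1", "zones.0"

    def reload(self, zone, new_data):
        if any(r.mapped == self.writing for r in self.readers):
            raise RuntimeError("a reader still maps the writable file")
        # Steps 1-2: load the new version into the file no reader maps.
        self.files[self.writing][zone] = new_data
        # Step 3: publish it and wait for every reader to acknowledge.
        for reader in self.readers:
            reader.remap(self.writing)
        # Step 4: switch to the other file, which no reader uses any more.
        self.writing, self.published = self.published, self.writing
        # Step 5: bring that file up to date as well.
        self.files[self.writing][zone] = new_data
```

The two files swap roles on every reload, so the same code path handles each subsequent change.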