Merge #2738

Clarify high-level design of IPC
Michal 'vorner' Vaner 12 years ago
parent
commit
27b1c2767a
1 changed files with 382 additions and 0 deletions

+ 382 - 0
doc/design/ipc-high.txt

@@ -0,0 +1,382 @@
+The IPC protocol
+================
+
+While cc-protocol.txt describes the low-level primitives, this document
+describes how the whole IPC should work and how to use it.
+
+Definitions
+-----------
+
+system::
+  The system that moves data between the users and does bookkeeping.
+  In our current implementation, it is the MsgQ daemon, to which the
+  users connect and which routes the data.
+user::
+  Usually a process; generally an entity that wants to communicate
+  with the other users.
+session::
+  A session is the interface through which a user communicates with
+  the system. A single user may have multiple sessions, but each
+  session belongs to exactly one user.
+message::
+  A data blob sent by one user. The recipient might be the system
+  itself, another session, or a set of sessions (called a group, see
+  below; it may be empty). A message is either a response or an
+  original message (TODO: Better name?).
+group::
+  A named set of sessions. Conceptually, all possible groups exist;
+  there's no explicit creation or deletion of groups.
+session id::
+  Unique identifier of a session. It is not reused for the whole
+  lifetime of the system. Historically called `lname` in the code.
+undelivery signal::
+  While sending an original message, a client may request an
+  undelivery signal. If the recipient specification yields no
+  sessions to deliver the message to, the system informs the sender
+  about the situation.
+sequence number::
+  Each message sent through the system carries a sequence number. The
+  number should be unique per sender. It can be used to pair a
+  response with the original message, since the response specifies
+  the sequence number of the message it responds to. Responses and
+  messages not expecting an answer have sequence numbers too, but
+  those are generally unused.
+non-blocking operation::
+  An operation that will complete without waiting for anything.
+fast operation::
+  An operation that may wait for another process, but only for a very
+  short time. Generally, this includes communication between the user
+  and the system, but not between two clients. It can be expected to
+  be fast enough to use inside an interactive session, but may be too
+  heavy in the middle of query processing, for example. Every
+  non-blocking operation is considered fast.
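+
+As a minimal sketch of the pairing described under `sequence number`,
+the following Python fragment keeps a table of outstanding messages.
+The `Sender` class and the `seq` and `reply_to` field names are
+hypothetical, chosen for illustration only; they are not the actual
+wire format or library API.
+

```python
import itertools


class Sender:
    """Hypothetical sender-side bookkeeping; not the BIND 10 API."""

    def __init__(self):
        self._next_seq = itertools.count(1)  # unique per sender
        self._pending = {}                   # sequence number -> payload

    def send(self, payload):
        # Attach a fresh sequence number and remember the message.
        seq = next(self._next_seq)
        self._pending[seq] = payload
        return {"seq": seq, "payload": payload}

    def on_response(self, response):
        # "reply_to" is an assumed name for the field that carries the
        # sequence number of the message being answered.
        return self._pending.pop(response["reply_to"], None)


sender = Sender()
first = sender.send({"command": ["ping"]})
second = sender.send({"command": ["status"]})
# The response carries its own sequence number (7 here), which is unused.
original = sender.on_response(
    {"seq": 7, "reply_to": second["seq"], "payload": {"reply": [0]}}
)
```

+
+Here `original` is the `status` command, because the response named
+its sequence number; a response to an unknown number yields `None`.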
+
+The session
+-----------
+
+The session interface allows for several operations interacting with
+the system. In the code, it is represented by a class.
+
+Possible operations include:
+
+Opening a session::
+  The session is created and connects to the system. This operation is
+  fast. The session receives its session ID from the system.
+
+Group management::
+  A user may subscribe to (become a member of) a group, or unsubscribe
+  from a group. These are fast operations.
+
+Send::
+  A user may send a message, addressed to the system or to other
+  session(s). This operation is expected to be non-blocking
+  (the current implementation relies on an assumption about how the OS
+  handles sends, which may need to be revisited if it turns out to be
+  false).
+
+Receive synchronously::
+  A user may wait for an incoming message in blocking mode. It is
+  possible to specify the kind of message to wait for, either an
+  original message or a response to a message. This interface has a
+  timeout.
+
+Receive asynchronously::
+  Similar to the previous operation, but non-blocking: the call
+  returns immediately. The user provides a callback that is invoked
+  when the requested message arrives.
+
+Terminate::
+  A session may be terminated. No more messages are sent or received
+  over it, and the session is automatically unsubscribed from all its
+  groups. This operation is non-blocking. A session is terminated
+  automatically if the user exits.
+
+Assumptions
+-----------
+
+We assume reliability and order of delivery. Messages sent from user A
+to B are all delivered unchanged in original order as long as B
+exists.
+
+All of the above operations are expected to always succeed. If an
+error is reported, it should be considered fatal and the user should
+exit. If a user still wants to continue, the session must be
+considered terminated and a new one must be created. Care must be
+taken not to use any information obtained from the previous session,
+since the state in other users and the system may have changed during
+the reconnect.
+
+Addressing
+----------
+
+Addressing happens in three ways:
+
+By group name::
+  The message is routed to all the sessions subscribed to this group.
+  It is legal to address an empty group; such a message is then
+  delivered to no sessions.
+By session ID::
+  The message is sent to the single session, if it is still alive.
+By an alias::
+  A session may have any number of aliases, that is, well-known names.
+  Only a single session may hold a given alias (but this is not yet
+  enforced by the system). The message is delivered to the one session
+  owning the alias, if any. Internally, aliases are implemented as
+  groups with a single subscribed session, so this is the same as the
+  first option on the protocol level, but semantically it is different.
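+
+The three ways of addressing can be sketched with a toy in-memory
+model. The `Router` class and its method names are hypothetical and
+only illustrate the rules above; they are not how the MsgQ daemon is
+written.
+

```python
class Router:
    """Toy model of addressing; all names here are hypothetical."""

    def __init__(self):
        self.sessions = set()  # IDs of currently open sessions
        self.groups = {}       # group name -> set of session IDs

    def open_session(self, session_id):
        self.sessions.add(session_id)

    def subscribe(self, session_id, group):
        # Groups exist conceptually; the entry appears on first use.
        self.groups.setdefault(group, set()).add(session_id)

    def by_group(self, group):
        # An empty or never-used group yields no recipients.
        return set(self.groups.get(group, ()))

    def by_session(self, session_id):
        # Delivered only if the session is still alive.
        return {session_id} if session_id in self.sessions else set()

    # An alias is a group with (at most) one subscriber, so lookup is
    # the same as for a group; only the intended meaning differs.
    by_alias = by_group


router = Router()
for sid in ("s1", "s2", "s3"):
    router.open_session(sid)
router.subscribe("s1", "Notifications/ZoneUpdates")
router.subscribe("s2", "Notifications/ZoneUpdates")
router.subscribe("s3", "DeepThought")  # used as an alias
```

+
+With this setup, the group resolves to `s1` and `s2`, the alias to
+`s3` alone, and an unknown session ID or unused group to the empty
+set.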
+
+The system
+----------
+
+The system performs these tasks:
+
+ * Maintains the open sessions and allows creating new ones.
+ * Keeps information about groups and which sessions are subscribed to
+   which group.
+ * Routes the messages between users.
+
+Also, the system itself is a user of the system. It can be reached by
+the alias `Msgq` and provides the following high-level services (see
+below):
+
+Notifications about sessions::
+  When a session is opened to the system or when a session is
+  terminated, a notification is sent to interested users. The
+  notification contains the session ID of the session in question.
+  The termination notification is probably more useful (if a user
+  communicated with a given session before, it might be interested to
+  know that it is no longer available); the opening notification is
+  provided mostly for completeness.
+Notifications about group subscriptions::
+  When a session subscribes to a group or unsubscribes from a group, a
+  notification is sent to interested users. The notification contains
+  both the session ID of the session subscribing/unsubscribing and
+  name of the group. This includes notifications about aliases (since
+  aliases are groups internally).
+Commands to list sessions::
+  There's a command to list session IDs of all currently opened sessions
+  and a command to list session IDs of all sessions subscribed to a
+  given group. Note that using these lists might need some care, as
+  the information might be outdated at the time it is delivered to the
+  user.
+
+A user shows interest in notifications about sessions and group
+subscriptions by subscribing to a group with a well-known name (as
+with any notification).
+
+Note that due to implementation details, the `Msgq` alias is not yet
+available during the early stage of the bootstrap of the bind10
+system. This means some very core services can't rely on the above
+services of the system. The alias is guaranteed to be working before
+the first non-core module is started.
+
+Higher-level services
+---------------------
+
+While the system is able to send any kind of data, the payload sent by
+users in bind10 is structured data encoded as JSON. The messages sent
+are of three general types:
+
+Command::
+  A message sent to a single destination, with the undelivery
+  signal turned on and expecting an answer. This is a request
+  to perform some operation on the recipient (it can have side effects
+  or not). The command is identified by a name and it can have
+  parameters. A command with the same name may behave differently (or
+  have different parameters) on different receiving users.
+Reply::
+  An answer to the `Command`. It is sent directly to the session the
+  command originated from, does not expect a further answer and does
+  not request the undelivery signal. It either confirms the
+  command was run successfully and contains an optional result, or
+  notifies the sender of failure to run the command. Success and
+  failure differ only in the payload sent through the system, not in
+  the way it is sent. The undelivery signal is a failure
+  reply sent by the system on behalf of the missing recipient.
+Notification::
+  A message sent to any number of destinations (e.g. sent to a group),
+  not expecting an answer. It notifies other users about an event or
+  change of state.
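+
+The three message types can be sketched as small helpers producing
+the JSON payloads. The payload shapes follow the examples elsewhere
+in this document; the helper names are hypothetical and not part of
+any bind10 library.
+

```python
import json


def make_command(name, params=None):
    # {"command": [name]} or {"command": [name, params]}
    body = [name] if params is None else [name, params]
    return json.dumps({"command": body})


def make_reply(code, value=None):
    # Code 0 means success; the second element is optional.
    body = [code] if value is None else [code, value]
    return json.dumps({"reply": body})


def make_notification(name, params=None):
    body = [name] if params is None else [name, params]
    return json.dumps({"notification": body})


def parse_reply(text):
    # Returns (succeeded, result-or-error-text-or-None).
    code, *rest = json.loads(text)["reply"]
    return code == 0, (rest[0] if rest else None)


ping = make_command("ping")
answer = parse_reply(make_reply(0, 42))
failure = parse_reply(make_reply(1, "You need to fill in other form"))
update = make_notification(
    "zone-update", {"class": "IN", "origin": "example.org.", "serial": 123456}
)
```

+
+Success and failure go through the same `make_reply` helper, which
+mirrors the point that they differ only in the payload.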
+
+Details of the higher-level services
+------------------------------------
+
+While there are libraries implementing the communication in a
+convenient way, it is useful to know what happens inside.
+
+Notifications are probably the simplest. Users interested in
+receiving notifications of some family subscribe to the corresponding
+group. Then, a client sends a message to the group. For example, if
+clients `receiver-A` and `receiver-B` want to receive notifications
+about changes to zone data, they'd subscribe to the
+`Notifications/ZoneUpdates` group. Then, another client (let's say
+`XfrIn`, with session ID `s12345`) would send something like:
+
+  s12345 -> Notifications/ZoneUpdates
+  {"notification": ["zone-update", {
+      "class": "IN",
+      "origin": "example.org.",
+      "serial": 123456
+  }]}
+
+Both receivers would receive the message and know that the
+`example.org` zone is now at version 123456. Note that multiple users
+may produce the same kind of notification. Also, a single group may be
+used to send notifications with multiple names (but they should be
+related; in our example, `Notifications/ZoneUpdates` could be used for
+the `zone-update`, `zone-available` and `zone-unavailable`
+notifications, signalling a change in zone data, configuration of a
+new zone in the system and removal of a zone from the configuration,
+respectively).
+
+Sending a command to a single recipient is slightly more complex. The
+sending user sends a message to the receiving one, addressed either by
+session ID or by an alias (a group to which at most one session may be
+subscribed). The message contains the name of the command and its
+parameters. It is sent with the undelivery signal turned on.
+The sending user also starts a timer (with a reasonably long timeout)
+and subscribes to notifications about terminated sessions and
+unsubscriptions from the alias group.
+
+The receiving user gets the message, runs the command and sends a
+response back with the result. The response has the undelivery
+signal turned off and is marked as a response to the message
+containing the command. The sending user receives the answer and pairs
+it with the command.
+
+There are several things that may go wrong.
+
+* There might be an error on the receiving user (bad parameters, the
+  operation failed, the recipient doesn't know a command of that
+  name). The receiving side sends the response as before; the only
+  difference is the content of the payload. The sending user is
+  notified about it without delay.
+* The recipient user doesn't exist (either the session ID is wrong or
+  already terminated, or the alias is empty). The system sends a
+  failure response and the sending user knows immediately that the
+  command failed.
+* The recipient disconnects while processing the command (possibly
+  crashes). The sender gets a notification about the disconnection or
+  unsubscription from the alias group and knows the answer won't come.
+* The recipient ``blackholes'' the command. It receives it, but never
+  answers. The sender's timer expires. As this is a serious
+  programmer error in the recipient and should be rare, the sender
+  should at least log an error to report the case.
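+
+The four failure cases, plus success, can be sketched as sender-side
+bookkeeping for one outstanding command. `PendingCommand` and its
+method names are hypothetical. Treating reply code `-1` as the
+system's undelivery reply is an assumption based only on the
+``No such recipient'' example in this document.
+

```python
import time


class PendingCommand:
    """Sender-side state for one command (hypothetical sketch)."""

    def __init__(self, recipient, timeout, now=time.monotonic):
        self.recipient = recipient       # session ID we are waiting on
        self._now = now                  # injectable clock, eases testing
        self.deadline = now() + timeout  # when we stop waiting
        self.outcome = None              # set once, never overwritten

    def _finish(self, kind, detail=None):
        if self.outcome is None:
            self.outcome = (kind, detail)

    def on_reply(self, code, value=None):
        if code == 0:
            self._finish("success", value)
        elif code == -1:
            # Assumption: -1 marks the undelivery reply from the system.
            self._finish("undelivered", value)
        else:
            self._finish("error", value)

    def on_disconnect(self, session_id):
        # Only the disconnection of our recipient matters.
        if session_id == self.recipient:
            self._finish("lost")

    def check_timeout(self):
        if self._now() >= self.deadline:
            self._finish("timeout")  # a real sender should log an error


fake_time = [0.0]  # fake clock so the example does not have to sleep


def fake_clock():
    return fake_time[0]


replied = PendingCommand("s23456", 10.0, fake_clock)
replied.on_reply(1, "You need to fill in other form")
lost = PendingCommand("s23456", 10.0, fake_clock)
lost.on_disconnect("s99999")  # unrelated session, ignored
lost.on_disconnect("s23456")
silent = PendingCommand("s23456", 10.0, fake_clock)
fake_time[0] = 11.0           # the recipient never answered
silent.check_timeout()
replied.check_timeout()       # already finished, outcome is kept
```

+
+The first event to arrive decides the outcome; later events for the
+same command are ignored.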
+
+One example would be asking the question of life, the universe and
+everything (all the examples assume the sending user is already
+subscribed to the notifications):
+
+  s12345 -> DeepThought
+  {"command": ["question", {
+      "what": ["Life", "Universe", "*"]
+  }]}
+  s23456 -> s12345
+  {"reply": [0, 42]}
+
+Deep Thought was addressed by its alias, but the answer is sent from
+its session ID. The `0` in the reply means ``success''.
+
+Another example might be asking for some data at a bureau and getting
+an error:
+
+  s12345 -> Bureau
+  {"command": ["provide-information", {
+      "about": "me",
+      "topic": "taxes"
+  }]}
+  s23456 -> s12345
+  {"reply": [1, "You need to fill in other form"]}
+
+And, in this example, the sender is trying to reach a non-existent
+session. The `msgq` here is not the alias `Msgq`, but a special
+``phantom'' session ID that is not listed anywhere.
+
+  s12345 -> s0
+  {"command": ["ping"]}
+  msgq -> s12345
+  {"reply": [-1, "No such recipient"]}
+
+Last, an example when the other user disconnects while processing the
+command.
+
+  s12345 -> s23456
+  {"command": ["shutdown"]}
+  msgq -> s12345
+  {"notification": ["disconnected", {
+    "lname": "s23456"
+  }]}
+
+The system does not support sending a command to multiple users
+directly. It can be accomplished as follows:
+
+* The sending user calls a command on the system to get the list of
+  sessions in a given group. This is a command sent to an alias, so it
+  can be done as described above.
+* After receiving the list of session IDs, multiple copies of the
+  command are sent by the sending user, one to each of the session
+  IDs.
+* Successes and failures are handled the same as above, since these
+  are just single-recipient commands.
+
+So, this would be an example with an unhelpful war council.
+
+  s12345 -> Msgq
+  {"command": ["get-subscriptions", {
+      "group": "WarCouncil"
+  }]}
+  msgq -> s12345
+  {"reply": [0, ["s1", "s2", "s3"]]}
+  s12345 -> s1
+  {"command": ["advice", {
+      "topic": "Should we attack?"
+  }]}
+  s12345 -> s2
+  {"command": ["advice", {
+      "topic": "Should we attack?"
+  }]}
+  s12345 -> s3
+  {"command": ["advice", {
+      "topic": "Should we attack?"
+  }]}
+  s1 -> s12345
+  {"reply": [0, true]}
+  s2 -> s12345
+  {"reply": [0, false]}
+  s3 -> s12345
+  {"reply": [1, "Advice feature not implemented"]}
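+
+The same fan-out can be sketched in Python. The `call` parameter
+stands for a hypothetical synchronous helper that sends one command
+and returns the `[code, value]` pair from the reply; the document
+does not define such a helper. `fake_call` merely replays the
+exchange above.
+

```python
def command_group(call, group, name, params):
    """Send one command to every session subscribed to `group`."""
    # Step 1: ask the system who is subscribed (a command to an alias).
    code, members = call("Msgq", ["get-subscriptions", {"group": group}])
    if code != 0:
        raise RuntimeError("could not list group members")
    # Step 2: one single-recipient command per session ID. The list may
    # already be outdated, so each call can still fail on its own.
    return {sid: tuple(call(sid, [name, params])) for sid in members}


def fake_call(recipient, command):
    # Canned replies reproducing the war council exchange.
    answers = {
        "Msgq": [0, ["s1", "s2", "s3"]],
        "s1": [0, True],
        "s2": [0, False],
        "s3": [1, "Advice feature not implemented"],
    }
    return answers[recipient]


results = command_group(
    fake_call, "WarCouncil", "advice", {"topic": "Should we attack?"}
)
```

+
+Each entry in `results` is handled exactly like the reply to a
+single-recipient command.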
+
+Users
+-----
+
+While there's a lot of flexibility in the behaviour of a user, it
+usually comes down to something like this (during the lifetime of the
+user):
+
+* The user starts up.
+* Then it creates one or more sessions (there may be technical reasons
+  to have more than one session, such as threads, but it is not
+  required by the system).
+* It subscribes to some groups to receive notifications in the future.
+* It binds to some aliases if it wants to be reachable by others by a
+  nice name.
+* It invokes some start-up commands (to get the configuration, for
+  example).
+* During the lifetime, it listens for notifications and answers
+  commands. It also invokes remote commands and sends notifications
+  about things that are happening.
+* Eventually, the user terminates, closing all the sessions it had
+  opened.
+
+Known limitations
+-----------------
+
+The protocol is meant mostly as a signalling protocol. Sending
+millions of messages or messages of several tens of megabytes is
+probably a bad idea. While there's no architectural limit on the
+number of transferred messages and the maximum size of a message is
+4GB, the code is not optimised and it would probably be very slow.
+
+We currently expect the system not to be under heavy load, and
+therefore to keep up with users sending messages. The libraries write
+in blocking mode. This is no problem as long as the expectation
+holds, since the write buffers will generally be empty and the write
+won't block. If it turns out not to be the case, we might need to
+reconsider.