[2738] Clarifications to the text

Many smaller clarifications and corrections. No change to the described
behaviour.
Michal 'vorner' Vaner 12 years ago
parent
commit
70a919f029
1 changed files with 49 additions and 33 deletions
+ 49 - 33
doc/design/ipc-high.txt

@@ -21,6 +21,14 @@ disconnected during the attempt.
 Also, we expect the messages don't get damaged or modified on their
 way.
 
+On unrecoverable error (errors like EINTR or short read/write are
+recoverable, since there's clear way how to continue without losing
+any messages, errors like connection reset are unrecoverable), the
+client should abort completely. If it deems better to reconnect, it
+must assume anything might have happened during the time and start
+communication from scratch, discarding any knowledge gathered from the
+previous connection (configuration, addresses of other clients, etc).
+
 Addressing
 ----------
 
@@ -32,6 +40,12 @@ We can specify the recipient in two different ways:
    When a message is sent to the group, all clients subscribed to the
    group receive it. It is legal to send to an empty group.
 
+[NOTE]
+If it is possible a group may contain multiple recipients, it is
+discouraged to send messages expecting an answer addressed to the
+group. It is not known how many answers are to come. See below for
+details on one-to-many communication.
+
 Feedback from the IPC system
 ----------------------------
 
@@ -42,17 +56,21 @@ Undeliverable notification::
   If the client requests it (by a per-message flag) and the set of
   recipients specified is empty (either because the connection
   ID/lname is not connected or because the addressed group is empty),
-  an answer message is sent from the MSGQ daemon to notify it about
-  the situation.
+  an answer message is sent from the daemon to notify it about
+  the situation. However, since the recipient still can take a long
+  time to answer (if it exists), clients that need high availability
+  should not wait for the answer in blocking way.
 Notifications about connections and disconnections::
   The system generates notification about following events:
-  * Client with given lname connected
-  * Client with given lname disconnected
-  * Client with given lname subscribed to given group
-  * Client with given lname unsubscribed from given group
+  * Client connected (sent with the lname of the client)
+  * Client disconnected (sent with the lname of the client)
+  * Client subscribed (sent with the name of group and lname of
+    client)
+  * Client unsubscribed (sent with the name of group and lname of
+    client)
 List of group members:
-  The MSGQ provides a command to list members of given group and list
-  of all connections.
+  The daemon provides a command to list lnames of clients subscribed
+  to given group, and lnames of all connections.
 
 Communication paradigms
 -----------------------
@@ -72,24 +90,22 @@ When the event happens, it is sent (broadcasted) to the group, without
 requiring an answer.
 
 [[NOTE]]
-To avoid race conditions on start up, it is important to first
-subscribe to the group and then load the initial state (for which
-change the notification would be), not the other way around. The other
-way, the state could be loaded, and then, before subscribing, an event
-could happen and be unnoticed. On the other hand, we may still receive
-event for change to the state we already loaded (which should be
-generally safe), but not lose an update.
-
-Examples of these kinds could be change to a zone data (so it would
-get reloaded by every process that has the data),
-connection/disconnection notification from msgq, or change to
-configuration data.
-
-It would be the recipients responsibility to handle the notification,
-or at least, produce an error log message. In these situations, the
-originator can not reasonably handle error cases anyway (the zone data
-has been written), so if something does not work, log is everything we
-can do.
+A care should be taken to avoid race conditions. Imagine one module
+provides some kind of state (let's say it's the configuration manager
+and the configuration is the shared state). The other modules are
+using notifications to update their copy when the configuration
+changes (eg. when the configuration changes, the configuration manager
+sends a notification with description of the change).
+
+The correct order is to first subscribe to the notifications and then
+request the whole configuration. If it was done the other way around,
+there would be a short time between the request and the subscription
+when an update to the state could happen without the module noticing.
+
+With first subscribing, the notification could come before the initial
+version is known or arrive even when the initial version already
+includes the change, but these are possible to handle, while the
+missing update is not.
 
 One-to-one RPC call
 ~~~~~~~~~~~~~~~~~~~
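
The subscribe-first ordering described in the added note of this hunk
could be sketched roughly as below. This is only an in-memory
illustration, not code from the project: the ConfigManager and Module
classes, the `between` hook and in particular the version counter used
to recognise notifications already covered by the snapshot are all
assumptions made for the example. The design text does not say how a
client should detect such duplicates.

```python
class ConfigManager:
    """Hypothetical stand-in for the module that owns the shared state."""

    def __init__(self):
        self.version = 0          # assumed counter, bumped on every change
        self.config = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, key, value):
        self.version += 1
        self.config[key] = value
        for notify in self.subscribers:
            notify(self.version, key, value)

    def snapshot(self):
        return self.version, dict(self.config)


class Module:
    """Hypothetical client keeping a local copy of the configuration."""

    def __init__(self):
        self.version = None       # None until the initial snapshot is loaded
        self.config = {}
        self.pending = []         # notifications seen before the snapshot

    def on_notification(self, version, key, value):
        if self.version is None:
            # Initial version not known yet: remember the change for later.
            self.pending.append((version, key, value))
        elif version > self.version:
            self.config[key] = value
            self.version = version
        # Otherwise the snapshot already includes this change; ignore it.

    def start(self, manager, between=lambda: None):
        manager.subscribe(self.on_notification)          # 1. subscribe first
        between()                                        # simulated race window
        self.version, self.config = manager.snapshot()   # 2. then load state
        pending, self.pending = self.pending, []
        for note in pending:                             # 3. replay early ones
            self.on_notification(*note)


manager = ConfigManager()
module = Module()
# An update lands between the subscription and the snapshot request.
module.start(manager, between=lambda: manager.update("port", 53))
manager.update("port", 5353)
```

In this run the early notification is queued, found to be already
covered by the snapshot and dropped, while the later update is applied
normally. With the two steps in `start` swapped, an update in the
window would be lost without any trace.
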
@@ -108,14 +124,14 @@ A command message (containing the parameters, name of the command,
 etc) is sent, with the want-answer flag set. The other side processes
 the command and sends a result or error back.
 
-If the recipient does not exist, the msgq sends an error right away.
+If the recipient does not exist, the daemon sends an error right away.
 
 There are still two ways this may fail to provide an answer:
 
  * The receiving module reads the command, but does not provide an
    answer. Clearly, such module is broken. There should be some (long)
    timeout for this situation, and loud logging to get it fixed.
- * The receiving module terminated at the exact time when msgq tried
+ * The receiving module terminated at the exact time when daemon tried
    to send to it, or crashed handling the command. Therefore the
    sender listens for disconnect or unsubscription notifications
    (depending on if it was sent by lname or group name) and if the
@@ -130,15 +146,15 @@ One-to-many RPC call
 Sometimes it is needed to send a command to bunch of modules at once,
 usually all members of a group that can contain any number of clients.
 
-This would be done by requesting the members of the group from msgq
-and then sending a one-to-one RPC call to each of them, tracking them
-separately.
+This would be done by requesting the members of the group from the
+daemon and then sending a one-to-one RPC call to each of them,
+tracking them separately.
 
 [NOTE]
 It might happen the list of group members changes between the time it
 was requested and the time the commands are sent. If a client gets
-disconnected, the sender gets an undeliverable error back from msgq.
-If anything else happens (the client unsubscribes, connects,
+disconnected, the sender gets an undeliverable error back from the
+daemon.  If anything else happens (the client unsubscribes, connects,
 subscribes), it must explicitly synchronise to the state anyway,
 because we could have sent the commands before the change actually
 happened and it would look the same to the client.
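
As a rough illustration of the one-to-many call from the last hunk
(list the group members, call each one separately, track every call on
its own), a minimal sketch might look like this. FakeDaemon,
call_group, the `before_send` hook and the "answer" / "undeliverable"
result strings are hypothetical names invented for the example. The
sketch is also synchronous, whereas a real client would presumably
receive answers and notifications asynchronously.

```python
class FakeDaemon:
    """Hypothetical in-memory stand-in for the daemon (illustration only)."""

    def __init__(self, groups):
        self.groups = groups                          # group -> set of lnames
        self.connected = set().union(*groups.values())

    def members(self, group):
        return sorted(self.groups.get(group, ()))

    def send(self, lname, command):
        # A disconnected recipient yields an undeliverable error.
        if lname not in self.connected:
            return "undeliverable"
        return "answer"


def call_group(daemon, group, command, before_send=lambda: None):
    members = daemon.members(group)     # 1. request the group members
    before_send()                       # membership may change in this window
    results = {}
    for lname in members:               # 2. one-to-one call to each member
        results[lname] = daemon.send(lname, command)   # 3. tracked separately
    return results


daemon = FakeDaemon({"auth": {"a", "b"}})
# Client "b" disconnects after the list was requested, before the send.
results = call_group(daemon, "auth", "reload",
                     before_send=lambda: daemon.connected.discard("b"))
```

Here `results` records a normal answer for "a" and an undeliverable
error for "b", which is the case the note above describes. Clients
that only unsubscribe or subscribe in the window are not detected by
this sketch at all.
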