|
@@ -4,169 +4,321 @@ The IPC protocol
|
|
|
While the cc-protocol.txt describes the low-level primitives, here we
|
|
|
describe how the whole IPC should work and how to use it.
|
|
|
|
|
|
-Assumptions
|
|
|
+Definitions
|
|
|
+-----------
|
|
|
+
|
|
|
+system::
|
|
|
+ The system that moves data between the users and does bookkeeping.
|
|
|
+ In our current implementation, this is the MsgQ daemon, to which the
+ users connect and which routes the data.
|
|
|
+user::
|
|
|
+ Usually a process; generally an entity that wants to communicate
|
|
|
+ with the other users.
|
|
|
+session::
|
|
|
+ A session is the interface by which a user communicates with the
+ system. A single user may have multiple sessions; each session
+ belongs to a single user.
|
|
|
+message::
|
|
|
+ A data blob sent by one user. The recipient might be the system
|
|
|
+ itself, another user, or a set of users (possibly empty). A message is
+ either a response or an original message (TODO: Better name?).
|
|
|
+group::
|
|
|
+ A named set of sessions. Conceptually, all possible groups exist;
+ there is no explicit creation or deletion of groups.
|
|
|
+session id::
|
|
|
+ Unique identifier of a session. It is not reused for the whole
|
|
|
+ lifetime of the system. Historically called `lname` in the code.
|
|
|
+undelivery notification::
|
|
|
+ While sending an original message, a client may request an
|
|
|
+ undelivery notification. If the recipient specification yields no
|
|
|
+ sessions to deliver the message to, the system informs the user about
|
|
|
+ the situation.
|
|
|
+sequence number::
|
|
|
+ Each message sent through the system carries a sequence number. The
|
|
|
+ number should be unique per sender. It can be used to pair a
|
|
|
+ response to the original message, since the response specifies the
+ sequence number of the message it responds to. Even responses and
+ messages not expecting an answer have a sequence number, but it is
+ generally unused.
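
As a rough illustration of how a sequence number pairs a response with its original message, here is a minimal Python sketch. The `Sender` class and the `seq` and `reply_to` field names are invented for this example; they are assumptions, not the real wire format or session API.

```python
import itertools


class Sender:
    """Hypothetical sender that numbers its messages and pairs responses."""

    def __init__(self):
        self._next_seq = itertools.count(1)  # unique per sender
        self._pending = {}  # sequence number -> original payload

    def send(self, payload, want_answer=False):
        seq = next(self._next_seq)
        if want_answer:
            self._pending[seq] = payload
        # Every message carries a sequence number, even if nobody uses it.
        return {"seq": seq, "payload": payload}

    def on_response(self, response):
        # The response names the sequence number of the message it answers.
        # Unknown or already answered numbers yield None.
        return self._pending.pop(response["reply_to"], None)


sender = Sender()
first = sender.send({"command": ["ping"]}, want_answer=True)
second = sender.send({"notification": ["hello"]})
original = sender.on_response({"seq": 7, "reply_to": first["seq"],
                               "payload": {"reply": [0]}})
```

Here `original` is the payload of the first message, because the response referred to its sequence number.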
|
|
|
+
|
|
|
+The session
|
|
|
-----------
|
|
|
|
|
|
-We assume the low-level protocol keeps ordering of messages. That is,
|
|
|
-if A sends messages 1 and 2 to B, they get delivered in the same order
|
|
|
-as they were sent. However, if A sends message 1 to B and 2 to C, the
|
|
|
-order in which get them or the order in which they answer is not
|
|
|
-defined.
|
|
|
+The session interface allows for several operations interacting with
|
|
|
+the system. In the code, it is represented by a class.
|
|
|
+
|
|
|
+Possible operations include:
|
|
|
+
|
|
|
+Opening a session::
|
|
|
+ The session is created and connects to the system. This operation is
|
|
|
+ fast, but it can still block for a short amount of time. The session
+ receives its session id from the system.
|
|
|
+
|
|
|
+Group management::
|
|
|
+ A user may subscribe to a group (become a member of it) or
+ unsubscribe from a group.
|
|
|
+
|
|
|
+Send::
|
|
|
+ A user may send a message addressed to the system or to other
+ client(s). This operation is generally expected to be non-blocking
+ (although that relies on OS buffering and on the system not being
+ overloaded).
|
|
|
|
|
|
-We also assume that the delivery is reliable. If B gets a message from
|
|
|
-A, it can be sure that all previous messages were delivered too. If A
|
|
|
-sends a message to B, B either gets the message or either A or B is
|
|
|
-disconnected during the attempt.
|
|
|
+Receive synchronously::
|
|
|
+ A user may wait for an incoming message in blocking mode. It is
+ possible to specify the kind of message to wait for, either an
+ original message or a response to a message. This interface has a
+ timeout.
|
|
|
|
|
|
-Also, we expect the messages don't get damaged or modified on their
|
|
|
-way.
|
|
|
+Receive asynchronously::
|
|
|
+ Similar to the previous operation, but non-blocking. It returns immediately.
|
|
|
+ The user provides a callback that is invoked when the requested
|
|
|
+ message arrives.
|
|
|
|
|
|
-On unrecoverable error (errors like EINTR or short read/write are
|
|
|
-recoverable, since there's clear way how to continue without losing
|
|
|
-any messages, errors like connection reset are unrecoverable), the
|
|
|
-client should abort completely. If it deems better to reconnect, it
|
|
|
-must assume anything might have happened during the time and start
|
|
|
-communication from scratch, discarding any knowledge gathered from the
|
|
|
-previous connection (configuration, addresses of other clients, etc).
|
|
|
+Terminate::
|
|
|
+ A session may be terminated. No more messages are sent or received
+ over it, and the session is automatically unsubscribed from all the
+ groups. A session is terminated automatically if the user exits.
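
To make the difference between the synchronous and asynchronous receive operations concrete, here is a toy in-memory sketch in Python. The `ToySession` class and its method names are hypothetical; the real session class talks to the daemon over a socket and has a different interface.

```python
import queue


class ToySession:
    """In-memory stand-in for a session; illustrative only."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._callbacks = []

    def deliver(self, message):
        # Called by the (imaginary) system when a message arrives.
        if self._callbacks:
            self._callbacks.pop(0)(message)
        else:
            self._inbox.put(message)

    def receive(self, timeout):
        # Synchronous receive: blocks, but only up to `timeout` seconds.
        try:
            return self._inbox.get(timeout=timeout)
        except queue.Empty:
            return None

    def receive_async(self, callback):
        # Asynchronous receive: returns immediately; the callback is
        # invoked once a message is available.
        try:
            callback(self._inbox.get_nowait())
        except queue.Empty:
            self._callbacks.append(callback)


session = ToySession()
seen = []
session.receive_async(seen.append)   # nothing queued yet, returns at once
session.deliver("first")             # triggers the registered callback
session.deliver("second")            # no callback waiting, so it is queued
```

After this runs, `seen` holds the first message and a blocking `receive` would return the second.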
|
|
|
+
|
|
|
+Assumptions
|
|
|
+-----------
|
|
|
+
|
|
|
+We assume reliability and order of delivery. Messages sent from user A
|
|
|
+to B are all delivered unchanged in the original order as long as B
|
|
|
+exists.
|
|
|
+
|
|
|
+All the above operations are expected to always succeed. If an
+error is reported, it should be considered fatal and the user should
|
|
|
+exit. In case a user still wants to continue, the session must be
|
|
|
+considered terminated and a new one must be created. Care must be
|
|
|
+taken not to use any information obtained from the previous session,
|
|
|
+since the state in other users and the system may have changed during
|
|
|
+the reconnect.
|
|
|
|
|
|
Addressing
|
|
|
----------
|
|
|
|
|
|
-We can specify the recipient in two different ways:
|
|
|
-
|
|
|
- * Directly. Each connected client has an unique address. A message
|
|
|
- addressed to that address is sent only to the one client.
|
|
|
- * By a group. A client might subscribe to any number of groups.
|
|
|
- When a message is sent to the group, all clients subscribed to the
|
|
|
- group receive it. It is legal to send to an empty group.
|
|
|
-
|
|
|
-[NOTE]
|
|
|
-If it is possible a group may contain multiple recipients, it is
|
|
|
-discouraged to send messages expecting an answer addressed to the
|
|
|
-group. It is not known how many answers are to come. See below for
|
|
|
-details on one-to-many communication.
|
|
|
-
|
|
|
-Feedback from the IPC system
|
|
|
-----------------------------
|
|
|
-
|
|
|
-The IPC system generates some additional information to aid the
|
|
|
-communicating clients.
|
|
|
-
|
|
|
-Undeliverable notification::
|
|
|
- If the client requests it (by a per-message flag) and the set of
|
|
|
- recipients specified is empty (either because the connection
|
|
|
- ID/lname is not connected or because the addressed group is empty),
|
|
|
- an answer message is sent from the daemon to notify it about
|
|
|
- the situation. However, since the recipient still can take a long
|
|
|
- time to answer (if it exists), clients that need high availability
|
|
|
- should not wait for the answer in blocking way.
|
|
|
-Notifications about connections and disconnections::
|
|
|
- The system generates notification about following events:
|
|
|
- * Client connected (sent with the lname of the client)
|
|
|
- * Client disconnected (sent with the lname of the client)
|
|
|
- * Client subscribed (sent with the name of group and lname of
|
|
|
- client)
|
|
|
- * Client unsubscribed (sent with the name of group and lname of
|
|
|
- client)
|
|
|
-List of group members:
|
|
|
- The daemon provides a command to list lnames of clients subscribed
|
|
|
- to given group, and lnames of all connections.
|
|
|
-
|
|
|
-Communication paradigms
|
|
|
------------------------
|
|
|
-
|
|
|
-Event notifications
|
|
|
-~~~~~~~~~~~~~~~~~~~
|
|
|
-
|
|
|
-Sometimes, an event that may be interesting to other parts of the
|
|
|
-system happens. The originating module may not know what other modules
|
|
|
-are interested in that kind of event, nor it may know if any at all
|
|
|
-wants to know that. With such event, the originating module does not
|
|
|
-need any feedback.
|
|
|
-
|
|
|
-For each kind or family of notifications, there's a group. Everybody
|
|
|
-interested in that family of notifications subscribes to the group.
|
|
|
-When the event happens, it is sent (broadcasted) to the group, without
|
|
|
-requiring an answer.
|
|
|
-
|
|
|
-[[NOTE]]
|
|
|
-A care should be taken to avoid race conditions. Imagine one module
|
|
|
-provides some kind of state (let's say it's the configuration manager
|
|
|
-and the configuration is the shared state). The other modules are
|
|
|
-using notifications to update their copy when the configuration
|
|
|
-changes (eg. when the configuration changes, the configuration manager
|
|
|
-sends a notification with description of the change).
|
|
|
-
|
|
|
-The correct order is to first subscribe to the notifications and then
|
|
|
-request the whole configuration. If it was done the other way around,
|
|
|
-there would be a short time between the request and the subscription
|
|
|
-when an update to the state could happen without the module noticing.
|
|
|
-
|
|
|
-With first subscribing, the notification could come before the initial
|
|
|
-version is known or arrive even when the initial version already
|
|
|
-includes the change, but these are possible to handle, while the
|
|
|
-missing update is not.
|
|
|
-
|
|
|
-One-to-one RPC call
|
|
|
-~~~~~~~~~~~~~~~~~~~
|
|
|
-
|
|
|
-Sometimes, a process needs to call remote function (or command) in
|
|
|
-other process. An example could be asking the configuration manager
|
|
|
-for the current configuration or asking it to change it, asking single
|
|
|
-process to terminate, etc.
|
|
|
-
|
|
|
-It may be that the group is a singleton group (eg. the command
|
|
|
-manager, there must be exactly one in a running system, and is used
|
|
|
-just as a stable name for the process) or an lname received by means
|
|
|
-of other communication (like a previous subscribe notification).
|
|
|
-
|
|
|
-A command message (containing the parameters, name of the command,
|
|
|
-etc) is sent, with the want-answer flag set. The other side processes
|
|
|
-the command and sends a result or error back.
|
|
|
-
|
|
|
-If the recipient does not exist, the daemon sends an error right away.
|
|
|
-
|
|
|
-There are still two ways this may fail to provide an answer:
|
|
|
-
|
|
|
- * The receiving module reads the command, but does not provide an
|
|
|
- answer. Clearly, such module is broken. There should be some (long)
|
|
|
- timeout for this situation, and loud logging to get it fixed.
|
|
|
- * The receiving module terminated at the exact time when daemon tried
|
|
|
- to send to it, or crashed handling the command. Therefore the
|
|
|
- sender listens for disconnect or unsubscription notifications
|
|
|
- (depending on if it was sent by lname or group name) and if the
|
|
|
- recipient disconnects, the sender knows it should not expect the
|
|
|
- answer any more.
|
|
|
-
|
|
|
-An asynchronous waiting for the answer is preferred.
|
|
|
-
|
|
|
-One-to-many RPC call
|
|
|
-~~~~~~~~~~~~~~~~~~~~
|
|
|
-
|
|
|
-Sometimes it is needed to send a command to bunch of modules at once,
|
|
|
-usually all members of a group that can contain any number of clients.
|
|
|
-
|
|
|
-This would be done by requesting the members of the group from the
|
|
|
-daemon and then sending a one-to-one RPC call to each of them,
|
|
|
-tracking them separately.
|
|
|
-
|
|
|
-[NOTE]
|
|
|
-It might happen the list of group members changes between the time it
|
|
|
-was requested and the time the commands are sent. If a client gets
|
|
|
-disconnected, the sender gets an undeliverable error back from the
|
|
|
-daemon. If anything else happens (the client unsubscribes, connects,
|
|
|
-subscribes), it must explicitly synchronise to the state anyway,
|
|
|
-because we could have sent the commands before the change actually
|
|
|
-happened and it would look the same to the client.
|
|
|
-
|
|
|
-[WARNING]
|
|
|
-It would look better to first request the list of group members and
|
|
|
-then send the command to the group, and use the list to track the
|
|
|
-answers only. But that is prone to race conditions ‒ if there's any
|
|
|
-change between the request for the member list and sending the
|
|
|
-command, the actual recipients don't match the list and the server
|
|
|
-could get more answers than expected or could wait for answer of a
|
|
|
-module that no longer exists.
|
|
|
+Addressing happens in three ways:
|
|
|
+
|
|
|
+By group name::
|
|
|
+ The message is routed to all the sessions subscribed to this group.
|
|
|
+ It is legal to address an empty group; such a message is then
|
|
|
+ delivered to no sessions.
|
|
|
+By session ID::
|
|
|
+ The message is sent to the single session, if it is still alive.
|
|
|
+By an alias::
|
|
|
+ A session may have any number of aliases (well-known names). Only a
+ single session may hold a given alias (but this is not yet enforced by
+ the system). The message is delivered to the one session owning the
+ alias, if any. Internally, the aliases are implemented as groups
+ with a single subscribed session, so it is the same as the first
|
|
|
+ option on the protocol level, but semantically it is different.
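
The three addressing modes can be modelled with a small Python sketch. The `Router` class below is a toy, not the MsgQ implementation: it assumes session IDs and group names never collide, treats an alias as a group with one member, and borrows the `[-1, "No such recipient"]` reply from the examples in this document.

```python
class Router:
    """Toy model of recipient resolution; not the real MsgQ code."""

    def __init__(self):
        self.sessions = set()  # live session IDs
        self.groups = {}       # group (or alias) name -> set of session IDs

    def open(self, session_id):
        self.sessions.add(session_id)

    def subscribe(self, session_id, group):
        # Groups are never created explicitly; they exist once used.
        self.groups.setdefault(group, set()).add(session_id)

    def terminate(self, session_id):
        self.sessions.discard(session_id)
        for members in self.groups.values():
            members.discard(session_id)

    def resolve(self, address):
        if address in self.sessions:
            return {address}                      # by session ID
        return set(self.groups.get(address, ()))  # by group name or alias

    def route(self, sender, address, payload, want_undelivery=False):
        recipients = self.resolve(address)
        if not recipients and want_undelivery:
            # Nobody to deliver to: tell the sender instead.
            return [(sender, {"reply": [-1, "No such recipient"]})]
        return [(sid, payload) for sid in sorted(recipients)]


router = Router()
for sid in ("s1", "s2", "s3"):
    router.open(sid)
router.subscribe("s1", "Notifications/ZoneUpdates")
router.subscribe("s2", "Notifications/ZoneUpdates")
router.subscribe("s3", "DeepThought")  # alias: a group with one member
```

Sending to an empty group without requesting the undelivery notification simply yields no deliveries, which matches the rule that addressing an empty group is legal.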
|
|
|
+
|
|
|
+The system
|
|
|
+----------
|
|
|
+
|
|
|
+The system performs these tasks:
|
|
|
+
|
|
|
+ * Maintains the open sessions and allows creating new ones.
|
|
|
+ * Keeps information about groups and which sessions are subscribed to
|
|
|
+ which group.
|
|
|
+ * Routes the messages between users.
|
|
|
+
|
|
|
+Also, the system itself is a user of the system. It can be reached by
|
|
|
+the alias `Msgq` and provides the following high-level services (see
|
|
|
+below):
|
|
|
+
|
|
|
+Notifications about sessions::
|
|
|
+ When a session is opened to the system or when a session is
|
|
|
+ terminated, a notification is sent to interested users. The
|
|
|
+ notification contains the session ID of the session in question.
|
|
|
+ The termination notification is probably more useful (if a user
+ communicated with a given session before, it might be interested to
+ know that the session is no longer available); the opening notification
+ is provided mostly for completeness.
|
|
|
+Notifications about group subscriptions::
|
|
|
+ When a session subscribes to a group or unsubscribes from a group, a
|
|
|
+ notification is sent to interested users. The notification contains
|
|
|
+ both the session ID of the session subscribing/unsubscribing and
|
|
|
+ the name of the group.
|
|
|
+Commands to list sessions::
|
|
|
+ There's a command to list session IDs of all currently open sessions
|
|
|
+ and a command to list session IDs of all sessions subscribed to a
|
|
|
+ given group. Note that using these lists might need some care, as
|
|
|
+ the information might be outdated at the time it is delivered to the
|
|
|
+ user.
|
|
|
+
|
|
|
+Note that in early stages of startup (before the configuration
|
|
|
+manager's session is opened), the `Msgq` alias is not yet available.
|
|
|
+
|
|
|
+Higher-level services
|
|
|
+---------------------
|
|
|
+
|
|
|
+While the system is able to send any kind of data, the payload sent by
|
|
|
+users in bind10 is structured data encoded as JSON. The messages sent
|
|
|
+are of three general types:
|
|
|
+
|
|
|
+Command::
|
|
|
+ A message sent to a single destination, with the undeliverable
|
|
|
+ notifications turned on and expecting an answer. This is a request
|
|
|
+ to perform some operation on the recipient (it can have side effects
|
|
|
+ or not). The command is identified by a name and it can have
|
|
|
+ parameters. A command with the same name may behave differently (or
|
|
|
+ have different parameters) on different receiving users.
|
|
|
+Reply::
|
|
|
+ An answer to the `Command`. It is sent directly to the session from
+ which the command originated, does not expect a further answer, and
+ the undeliverable notification is not requested. It either confirms the
|
|
|
+ command was run successfully and contains an optional result, or
|
|
|
+ notifies the sender of failure to run the command. Success and
|
|
|
+ failure differ only in the payload sent through the system, not in
|
|
|
+ the way it is sent. The undeliverable notification is a failure
|
|
|
+ reply sent by the system on behalf of the missing recipient.
|
|
|
+Notification::
|
|
|
+ A message sent to any number of destinations (eg. sent to a group),
|
|
|
+ not expecting an answer. It notifies other users about an event or
|
|
|
+ change of state.
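
A small Python sketch of the three message types, assuming the JSON layout used by the examples in this document (a single-key object whose value is a list). The helper names `make_command`, `make_reply`, `make_notification` and `classify` are hypothetical and not part of any existing library.

```python
import json


def _encode(kind, head, body):
    # The payload is a one-key object; the optional body follows the head.
    items = [head] if body is None else [head, body]
    return json.dumps({kind: items})


def make_command(name, params=None):
    return _encode("command", name, params)


def make_reply(code, value=None):
    # In the examples of this document, code 0 means success.
    return _encode("reply", code, value)


def make_notification(name, params=None):
    return _encode("notification", name, params)


def classify(raw):
    """Return (kind, items) for a raw JSON payload."""
    message = json.loads(raw)
    for kind in ("command", "reply", "notification"):
        if kind in message:
            return kind, message[kind]
    raise ValueError("unknown message type")


ping = make_command("ping")
answer = make_reply(0, 42)
update = make_notification("zone-update", {"class": "IN",
                                           "origin": "example.org.",
                                           "serial": 123456})
```

Whether a reply reports success or failure is visible only in the decoded payload, in line with the description of `Reply` above.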
|
|
|
+
|
|
|
+Details of the higher-level services
+------------------------------------
|
|
|
+
|
|
|
+The notifications are probably the simplest. Users interested in
|
|
|
+receiving notifications of some family subscribe to the corresponding
|
|
|
+group. Then, a client sends a message to the group. For example, if
|
|
|
+clients `receiver-A` and `receiver-B` want to receive notifications
|
|
|
+about changes to zone data, they'd subscribe to the
|
|
|
+`Notifications/ZoneUpdates` group. Then, another client (let's say
|
|
|
+`XfrIn`, with session ID `s12345`) would send something like:
|
|
|
+
|
|
|
+ s12345 -> Notifications/ZoneUpdates
+ {"notification": ["zone-update", {
+     "class": "IN",
+     "origin": "example.org.",
+     "serial": 123456
+ }]}
|
|
|
+
|
|
|
+Both receivers would receive the message and know that the
|
|
|
+`example.org` zone is now at version 123456. Note that multiple users
|
|
|
+may produce the same kind of notification. Also, a single group may be
|
|
|
+used to send multiple notification names (but they should be related;
|
|
|
+in our example, the `Notifications/ZoneUpdates` could be used for
|
|
|
+`zone-update`, `zone-available` and `zone-unavailable` notifications
|
|
|
+for a change in zone data, configuration of a new zone in the system and
+removal of a zone from the configuration, respectively).
|
|
|
+
|
|
|
+Sending a command to a single recipient is slightly more complex. The
|
|
|
+sending user sends a message to the receiving one, addressed either by
|
|
|
+session ID or by an alias (group to which at most one session may be
|
|
|
+subscribed). The message contains the name of the command and
|
|
|
+parameters. It is sent with the undeliverable notifications turned on.
|
|
|
+The user also starts a timer (with a reasonably long timeout). The
|
|
|
+sender also subscribes to notifications about terminated sessions or
|
|
|
+unsubscription from the alias group.
|
|
|
+
|
|
|
+The receiving user gets the message, runs the command and sends a
|
|
|
+response back, with the result. The response has the undeliverable
|
|
|
+notification turned off and it is marked as a response to the message
|
|
|
+containing the command. The sending user receives the answer and pairs
|
|
|
+it with the command.
|
|
|
+
|
|
|
+There are several things that may go wrong.
|
|
|
+
|
|
|
+* There might be an error on the receiving user (bad parameters, the
|
|
|
+ operation failed, the recipient doesn't know a command of that name).
+ The receiving side sends the response as before; the only
|
|
|
+ difference is the content of the payload. The sending user is
|
|
|
+ notified about it, without delays.
|
|
|
+* The recipient user doesn't exist (either the session ID is wrong or
+ the session has already terminated, or the alias is empty). The system sends a
|
|
|
+ failure response and the sending user knows immediately the command
|
|
|
+ failed.
|
|
|
+* The recipient disconnects while processing the command (possibly
|
|
|
+ crashes). The sender gets a notification about disconnection or
|
|
|
+ unsubscription from the alias group and knows the answer won't come.
|
|
|
+* The recipient ``blackholes'' the command. It receives it, but never
|
|
|
+ answers. The timer in the sender expires. As this is a serious
|
|
|
+ programmer error in the recipient and should be rare, the sender
|
|
|
+ should at least log an error to notify about the case.
|
|
|
+
|
|
|
+One example would be asking the question of life, universe and
|
|
|
+everything (all the examples assume the sending user is already
|
|
|
+subscribed to the notifications):
|
|
|
+
|
|
|
+ s12345 -> DeepThought
+ {"command": ["question", {
+     "what": ["Life", "Universe", "*"]
+ }]}
+ s23456 -> s12345
+ {"reply": [0, 42]}
|
|
|
+
|
|
|
+`DeepThought` is an alias, but the answer is sent from its session
|
|
|
+ID. The `0` in the reply means ``success''.
|
|
|
+
|
|
|
+Another example might be asking for some data at a bureau and getting
|
|
|
+an error:
|
|
|
+
|
|
|
+ s12345 -> Bureau
+ {"command": ["provide-information", {
+     "about": "me",
+     "topic": "taxes"
+ }]}
+ s23456 -> s12345
+ {"reply": [1, "You need to fill in other form"]}
|
|
|
+
|
|
|
+And, in this example, the sender is trying to reach a non-existent
|
|
|
+session.
|
|
|
+
|
|
|
+ s12345 -> s0
|
|
|
+ {"command": ["ping"]}
|
|
|
+ msgq -> s12345
|
|
|
+ {"reply": [-1, "No such recipient"]}
|
|
|
+
|
|
|
+Last, an example when the other user disconnects while processing the
|
|
|
+command.
|
|
|
+
|
|
|
+ s12345 -> s23456
+ {"command": ["shutdown"]}
+ msgq -> s12345
+ {"notification": ["disconnected", {
+     "lname": "s23456"
+ }]}
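
The failure cases above suggest some bookkeeping on the sending side. The following Python sketch shows one way it could look, assuming commands are addressed by session ID and that time is passed in explicitly. The `PendingCommands` class is hypothetical and not taken from the existing code.

```python
class PendingCommands:
    """Tracks commands that still await an answer (illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._pending = {}  # sequence number -> (recipient, deadline)

    def sent(self, seq, recipient, now):
        self._pending[seq] = (recipient, now + self.timeout)

    def on_reply(self, seq, reply):
        # Covers success, failure on the recipient and the failure
        # reply the system sends for a missing recipient.
        if self._pending.pop(seq, None) is None:
            return None  # unknown or already resolved
        return "ok" if reply[0] == 0 else "failed"

    def on_disconnected(self, session_id):
        # The recipient went away; its answers will never come.
        lost = [seq for seq, (recipient, _) in self._pending.items()
                if recipient == session_id]
        for seq in lost:
            del self._pending[seq]
        return lost

    def expire(self, now):
        # The recipient blackholed the command; the caller should log it.
        late = [seq for seq, (_, deadline) in self._pending.items()
                if deadline <= now]
        for seq in late:
            del self._pending[seq]
        return late


tracker = PendingCommands(timeout=30)
tracker.sent(seq=1, recipient="s23456", now=0)
tracker.sent(seq=2, recipient="s23456", now=0)
tracker.sent(seq=3, recipient="s99", now=0)
first = tracker.on_reply(1, [0, 42])       # answered normally
lost = tracker.on_disconnected("s23456")   # recipient disconnected
late = tracker.expire(now=31)              # timer ran out
```

A command sent to an alias would additionally need to watch for unsubscription from the alias group, which this sketch leaves out.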
|
|
|
+
|
|
|
+The system does not support sending a command to multiple users
|
|
|
+directly. It can be accomplished as follows:
|
|
|
+
|
|
|
+* The sending user calls a command on the system to get the list of
+ sessions in the given group. This is a command sent to an alias, so it
+ can be done as described above.
|
|
|
+* After receiving the list of session IDs, multiple copies of the
|
|
|
+ command are sent by the sending user, one to each of the session
|
|
|
+ IDs.
|
|
|
+* Successes and failures are handled the same as above, since these
|
|
|
+ are just single-recipient commands.
|
|
|
+
|
|
|
+So, this would be an example with an unhelpful war council.
|
|
|
+
|
|
|
+ s12345 -> Msgq
|
|
|
+ {"command": ["get-subscriptions", {
|
|
|
+ "group": "WarCouncil"
|
|
|
+ }]}
|
|
|
+ msgq -> s12345
|
|
|
+ {"reply": [0, ["s1", "s2", "s3"]]}
|
|
|
+ s12345 -> s1
|
|
|
+ {"command": ["advice", {
|
|
|
+ "topic": "Should we attack?"
|
|
|
+ }]}
|
|
|
+ s12345 -> s2
|
|
|
+ {"command": ["advice", {
|
|
|
+ "topic": "Should we attack?"
|
|
|
+ }]}
|
|
|
+ s12345 -> s3
|
|
|
+ {"command": ["advice", {
|
|
|
+ "topic": "Should we attack?"
|
|
|
+ }]}
|
|
|
+ s1 -> s12345
|
|
|
+ {"reply": [0, true]}
|
|
|
+ s2 -> s12345
|
|
|
+ {"reply": [0, false]}
|
|
|
+ s3 -> s12345
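
The same fan-out can be sketched in Python. The `fan_out` helper and the two stand-in functions are hypothetical, and the canned answers (including the failure reply for `s3`) are made up for the demonstration. Real code would send over a session and collect the replies asynchronously.

```python
def fan_out(list_members, send_command, group, name, params):
    """Send one single-recipient command to each member of a group."""
    members = list_members(group)  # may be outdated by the time we send
    return {sid: send_command(sid, name, params) for sid in members}


# Stand-ins for the real system, for demonstration only.
def fake_list_members(group):
    return ["s1", "s2", "s3"] if group == "WarCouncil" else []


def fake_send_command(session_id, name, params):
    canned = {"s1": [0, True], "s2": [0, False]}
    # Assume s3 vanished after the list was produced, so the system
    # answers on its behalf.
    return canned.get(session_id, [-1, "No such recipient"])


replies = fan_out(fake_list_members, fake_send_command,
                  "WarCouncil", "advice", {"topic": "Should we attack?"})
```

Each entry in `replies` is handled exactly like the answer to a single-recipient command.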
|
|
|
|
|
|
Known limitations
|
|
|
-----------------
|
|
@@ -174,8 +326,8 @@ Known limitations
|
|
|
It is meant mostly as signalling protocol. Sending millions of
|
|
|
messages or messages of several tens of megabytes is probably a bad
|
|
|
idea. While there's no architectural limitation with regards of the
|
|
|
-number of transferred messages or their sizes, the code is not
|
|
|
-optimised and it would probably be very slow.
|
|
|
+number of transferred messages and the maximum size of a message is 4GB,
|
|
|
+the code is not optimised and it would probably be very slow.
|
|
|
|
|
|
We currently expect the system not to be at heavy load. Therefore, we
|
|
|
expect the daemon to keep up with clients sending messages. The
|