- The IPC protocol
- ================
- While the cc-protocol.txt describes the low-level primitives, here we
- describe how the whole IPC should work and how to use it.
- Assumptions
- -----------
- We assume the low-level protocol keeps ordering of messages. That is,
- if A sends messages 1 and 2 to B, they get delivered in the same order
- as they were sent. However, if A sends message 1 to B and 2 to C, the
- order in which B and C get them, or the order in which they answer,
- is not defined.
- We also assume that the delivery is reliable. If B gets a message from
- A, it can be sure that all previous messages were delivered too. If A
- sends a message to B, either B gets the message, or A or B is
- disconnected during the attempt.
- Also, we expect that messages are not damaged or modified on their
- way.
- On an unrecoverable error, the client should abort completely. Errors
- like EINTR or a short read/write are recoverable, since there is a
- clear way to continue without losing any messages; errors like
- connection reset are unrecoverable. If the client deems it better to
- reconnect, it must assume anything might have happened in the
- meantime and start communication from scratch, discarding any
- knowledge gathered from the previous connection (configuration,
- addresses of other clients, etc).
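The distinction between recoverable and unrecoverable errors can be illustrated with a minimal sketch of a read loop. The names `recv_exactly` and `ConnectionLost` are hypothetical and not part of any existing library of this system; the sketch assumes a plain stream socket.

```python
import socket


class ConnectionLost(Exception):
    """Unrecoverable: the caller should abort or restart from scratch."""


def recv_exactly(sock, length):
    """Read exactly `length` bytes, retrying on recoverable conditions."""
    chunks = []
    remaining = length
    while remaining > 0:
        try:
            chunk = sock.recv(remaining)
        except InterruptedError:
            # EINTR: nothing was lost, so simply retry the read.
            continue
        except ConnectionError as err:
            # Connection reset and similar: no safe way to continue.
            raise ConnectionLost(str(err)) from err
        if not chunk:
            # The peer closed the connection in the middle of a message.
            raise ConnectionLost("peer closed the connection")
        # A short read is recoverable: keep what we got, ask for the rest.
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A caller that catches `ConnectionLost` and decides to reconnect would have to drop everything it learned over the old connection before continuing.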
- Addressing
- ----------
- We can specify the recipient in two different ways:
- * Directly. Each connected client has a unique address. A message
- addressed to that address is sent only to the one client.
- * By a group. A client might subscribe to any number of groups.
- When a message is sent to the group, all clients subscribed to the
- group receive it. It is legal to send to an empty group.
- [NOTE]
- If a group may contain multiple recipients, sending a message that
- expects an answer to the group is discouraged, because it is not
- known how many answers are to come. See below for details on
- one-to-many communication.
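The two addressing modes can be modelled with a toy sketch. `ToyRouter` and its methods are hypothetical and only illustrate how the set of recipients is determined; they do not describe the real daemon.

```python
class ToyRouter:
    """Illustrative model of recipient selection (not the real daemon)."""

    def __init__(self):
        self.clients = set()   # addresses of connected clients
        self.groups = {}       # group name -> set of subscribed addresses

    def connect(self, address):
        self.clients.add(address)

    def subscribe(self, address, group):
        self.groups.setdefault(group, set()).add(address)

    def recipients_direct(self, address):
        # At most one client: the one owning the address, if connected.
        return {address} & self.clients

    def recipients_group(self, group):
        # Every subscriber; an empty or unknown group is legal and
        # simply yields no recipients.
        return set(self.groups.get(group, ()))
```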
- Feedback from the IPC system
- ----------------------------
- The IPC system generates some additional information to aid the
- communicating clients.
- Undeliverable notification::
- If the client requests it (by a per-message flag) and the set of
- recipients specified is empty (either because the connection
- ID/lname is not connected or because the addressed group is empty),
- an answer message is sent from the daemon to notify the sender about
- the situation. However, since the recipient can still take a long
- time to answer (if it exists), clients that need high availability
- should not wait for the answer in a blocking way.
- Notifications about connections and disconnections::
- The system generates notifications about the following events:
- * Client connected (sent with the lname of the client)
- * Client disconnected (sent with the lname of the client)
- * Client subscribed (sent with the name of group and lname of
- client)
- * Client unsubscribed (sent with the name of group and lname of
- client)
- List of group members::
- The daemon provides a command to list the lnames of clients
- subscribed to a given group, and the lnames of all connections.
- Communication paradigms
- -----------------------
- Event notifications
- ~~~~~~~~~~~~~~~~~~~
- Sometimes, an event that may be interesting to other parts of the
- system happens. The originating module may not know which other
- modules are interested in that kind of event, nor whether any module
- at all wants to know about it. With such an event, the originating
- module does not need any feedback.
- For each kind or family of notifications, there's a group. Everybody
- interested in that family of notifications subscribes to the group.
- When the event happens, it is sent (broadcast) to the group, without
- requiring an answer.
- [NOTE]
- Care should be taken to avoid race conditions. Imagine one module
- provides some kind of state (let's say it's the configuration manager
- and the configuration is the shared state). The other modules are
- using notifications to update their copy when the configuration
- changes (e.g. when the configuration changes, the configuration manager
- sends a notification with description of the change).
- The correct order is to first subscribe to the notifications and then
- request the whole configuration. If it was done the other way around,
- there would be a short time between the request and the subscription
- when an update to the state could happen without the module noticing.
- When subscribing first, a notification could come before the initial
- version is known, or arrive even though the initial version already
- includes the change, but these cases are possible to handle, while a
- missing update is not.
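One way to handle the two cases mentioned above is sketched below. It assumes that every configuration snapshot and every change notification carries a monotonically increasing version number; that is an assumption made for this illustration and is not stated by the protocol. The class `StateMirror` is hypothetical.

```python
class StateMirror:
    """Local copy of shared state, filled by subscribe-then-request."""

    def __init__(self):
        self.state = None     # unknown until the snapshot arrives
        self.version = None
        self._early = []      # notifications seen before the snapshot

    def on_notification(self, version, key, value):
        if self.version is None:
            # Case 1: the update arrived before the initial version.
            self._early.append((version, key, value))
            return
        self._apply(version, key, value)

    def on_snapshot(self, version, state):
        self.state = dict(state)
        self.version = version
        early, self._early = self._early, []
        for item in early:
            self._apply(*item)

    def _apply(self, version, key, value):
        if version <= self.version:
            # Case 2: the snapshot already includes this change.
            return
        self.state[key] = value
        self.version = version
```

The module would subscribe to the notification group, then request the whole configuration, feeding anything that arrives in between into `on_notification`.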
- One-to-one RPC call
- ~~~~~~~~~~~~~~~~~~~
- Sometimes, a process needs to call a remote function (or command) in
- another process. An example could be asking the configuration manager
- for the current configuration or asking it to change it, asking a
- single process to terminate, etc.
- The recipient is addressed either by a singleton group (e.g. the
- command manager; there must be exactly one in a running system, and
- the group is used just as a stable name for the process) or by an
- lname received by means of other communication (like a previous
- subscribe notification).
- A command message (containing the parameters, name of the command,
- etc) is sent, with the want-answer flag set. The other side processes
- the command and sends a result or error back.
- If the recipient does not exist, the daemon sends an error right away.
- There are still two ways this may fail to provide an answer:
- * The receiving module reads the command, but does not provide an
- answer. Clearly, such a module is broken. There should be some (long)
- timeout for this situation, and loud logging to get it fixed.
- * The receiving module terminated at the exact time when the daemon
- tried to send to it, or crashed handling the command. Therefore the
- sender listens for disconnect or unsubscription notifications
- (depending on whether it was sent by lname or group name) and if the
- recipient disconnects, the sender knows it should not expect the
- answer any more.
- Waiting for the answer asynchronously is preferred.
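The bookkeeping for an asynchronous one-to-one call, covering both failure modes above, might look like the following sketch. `PendingCalls` and its method names are hypothetical; the sequence number used as a key and the explicit `now` argument are assumptions made to keep the example self-contained.

```python
class PendingCalls:
    """Tracks commands that were sent and are still awaiting an answer."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._calls = {}  # seq -> (recipient, deadline, callback)

    def sent(self, seq, recipient, callback, now):
        self._calls[seq] = (recipient, now + self.timeout, callback)

    def on_answer(self, seq, result):
        call = self._calls.pop(seq, None)
        if call is not None:          # late or unknown answers are ignored
            call[2]("answer", result)

    def on_gone(self, recipient):
        # Disconnect or unsubscription notification for the recipient.
        self._fail(lambda rcpt, deadline: rcpt == recipient, "gone")

    def on_tick(self, now):
        # The (long) timeout for modules that never answer.
        self._fail(lambda rcpt, deadline: deadline <= now, "timeout")

    def _fail(self, predicate, reason):
        failed = [seq for seq, (rcpt, deadline, _) in self._calls.items()
                  if predicate(rcpt, deadline)]
        for seq in failed:
            callback = self._calls.pop(seq)[2]
            callback(reason, None)
```

The callback receives either `("answer", result)`, `("gone", None)` or `("timeout", None)`, so the caller never blocks and can log timeouts loudly.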
- One-to-many RPC call
- ~~~~~~~~~~~~~~~~~~~~
- Sometimes it is needed to send a command to a bunch of modules at once,
- usually all members of a group that can contain any number of clients.
- This would be done by requesting the members of the group from the
- daemon and then sending a one-to-one RPC call to each of them,
- tracking them separately.
- [NOTE]
- It might happen that the list of group members changes between the
- time it was requested and the time the commands are sent. If a client
- gets disconnected, the sender gets an undeliverable error back from
- the daemon. If anything else happens (the client unsubscribes,
- connects, subscribes), the sender must explicitly synchronise to the
- state anyway, because we could have sent the commands before the
- change actually happened and it would look the same to the client.
- [WARNING]
- It would look better to first request the list of group members and
- then send the command to the group, and use the list to track the
- answers only. But that is prone to race conditions ‒ if there's any
- change between the request for the member list and sending the
- command, the actual recipients don't match the list and the server
- could get more answers than expected or could wait for an answer from
- a module that no longer exists.
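The recommended approach (request the member list, then issue a separate one-to-one call to each member) can be sketched as follows. `GroupCall` is a hypothetical helper; `members` stands for the lnames returned by the daemon's member-list command and `send` for whatever function sends a single command with the want-answer flag set.

```python
class GroupCall:
    """Tracks one command sent individually to each member of a group."""

    def __init__(self, members, send):
        self.waiting = set()
        self.results = {}
        for lname in members:
            send(lname)               # one-to-one call, tracked separately
            self.waiting.add(lname)

    def on_answer(self, lname, result):
        if lname in self.waiting:
            self.waiting.discard(lname)
            self.results[lname] = result

    def on_undeliverable(self, lname):
        # Undeliverable error or disconnect notification: stop waiting.
        self.waiting.discard(lname)

    def done(self):
        return not self.waiting
```

Because each recipient is addressed by lname, the set of expected answers is exactly the set of commands sent, regardless of later changes in group membership.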
- Known limitations
- -----------------
- The IPC is meant mostly as a signalling protocol. Sending millions of
- messages or messages of several tens of megabytes is probably a bad
- idea. While there's no architectural limitation with regard to the
- number of transferred messages or their sizes, the code is not
- optimised and it would probably be very slow.
- We currently expect the system not to be under heavy load. Therefore, we
- expect the daemon to keep up with clients sending messages. The
- libraries write in blocking mode, which is no problem if the
- expectation is true, as the write buffers will generally be empty and
- the write wouldn't block, but if it turns out it is not the case, we
- might need to reconsider.