- The IPC protocol
- ================
- While cc-protocol.txt describes the low-level primitives, here we
- describe how the IPC should work as a whole and how to use it.
- Definitions
- -----------
- system::
- The system that moves data between the users and does bookkeeping.
- In the current implementation, this is the MsgQ daemon; users
- connect to it and it routes the data between them.
- user::
- Usually a process; generally an entity that wants to communicate
- with the other users.
- session::
- A session is the interface through which a user communicates with
- the system. A single user may have multiple sessions, but each
- session belongs to a single user.
- message::
- A data blob sent by one user. The recipient might be the system
- itself, another session, or a set of sessions (called a group, see
- below; it may be empty). A message is either a response or an
- original message (TODO: Better name?).
- group::
- A named set of sessions. Conceptually, all possible groups exist;
- there is no explicit creation or deletion of groups.
- session id::
- Unique identifier of a session. It is never reused during the
- lifetime of the system. Historically called `lname` in the code.
- undelivery signal::
- When sending an original message, a client may request an
- undelivery signal. If the recipient specification yields no
- sessions to deliver the message to, the system informs the user
- about it.
- sequence number::
- Each message sent through the system carries a sequence number. The
- number should be unique per sender. It can be used to pair a
- response with the original message, since the response specifies
- the sequence number of the message it responds to. Responses and
- messages not expecting an answer have sequence numbers too, but they
- are generally unused.
- non-blocking operation::
- An operation that completes without waiting for anything.
- fast operation::
- An operation that may wait for another process, but only for a very
- short time. Generally, this includes communication between a user
- and the system, but not between two clients. It can be expected to
- be fast enough for use in an interactive session, but may be too
- heavy in the middle of query processing, for example. Every
- non-blocking operation is considered fast.
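- The sequence-number rules defined above can be sketched in a few lines
- of Python. This is a minimal illustration, not the MsgQ wire format
- (that is the subject of cc-protocol.txt): the `Message` envelope, its
- field names and the `Sender` helper are hypothetical, chosen only to
- show that sequence numbers are unique per sender and that a response
- carries the sequence number of the message it answers.
```python
import itertools
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Message:
    # Hypothetical envelope; the field names are illustrative only.
    sender: str                     # session ID of the sending session
    recipient: str                  # group name, session ID or alias
    seq: int                        # sequence number, unique per sender
    payload: Any
    reply_to: Optional[int] = None  # seq of the message being answered
    want_undelivery: bool = False   # request an undelivery signal


class Sender:
    """Hands out sequence numbers that are unique for one sender."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._counter = itertools.count(1)

    def original(self, recipient: str, payload: Any,
                 want_undelivery: bool = False) -> Message:
        return Message(self.session_id, recipient, next(self._counter),
                       payload, want_undelivery=want_undelivery)

    def response(self, to: Message, payload: Any) -> Message:
        # A response gets its own (generally unused) sequence number and
        # points back at the original message through reply_to.
        return Message(self.session_id, to.sender, next(self._counter),
                       payload, reply_to=to.seq)


if __name__ == "__main__":
    a, b = Sender("s12345"), Sender("s23456")
    question = a.original("DeepThought", {"command": ["ping"]},
                          want_undelivery=True)
    answer = b.response(question, {"reply": [0]})
    print(answer.reply_to == question.seq)  # True
```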
- The session
- -----------
- The session interface allows for several operations interacting with
- the system. In the code, it is represented by a class.
- Possible operations include:
- Opening a session::
- The session is created and connects to the system. This operation is
- fast. The session receives a session ID from the system.
- Group management::
- A user may subscribe to (become a member of) a group, or unsubscribe
- from a group. These are fast operations.
- Send::
- A user may send a message addressed to the system or to other
- session(s). This operation is expected to be non-blocking (the
- current implementation relies on an assumption about how the OS
- handles sends, which may need to be revisited if it turns out to be
- false).
- Receive synchronously::
- A user may wait for an incoming message in blocking mode. It is
- possible to specify the kind of message to wait for: either an
- original message or a response to a message. This interface has a
- timeout.
- Receive asynchronously::
- Similar to the previous operation, but non-blocking: it returns
- immediately. The user provides a callback that is invoked when the
- requested message arrives.
- Terminate::
- A session may be terminated. No more messages are sent or received
- over it, and the session is automatically unsubscribed from all
- groups. This operation is non-blocking. A session is terminated
- automatically if the user exits.
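- To make the operations above concrete, here is a minimal in-process
- model. It is only a sketch under stated assumptions: `ToySystem` and
- `ToySession` are hypothetical names, routing is reduced to delivery to
- group members, and the asynchronous receive is left out. It shows
- session IDs that are never reused, subscribe and unsubscribe, a
- non-blocking send, a synchronous receive with a timeout, and
- termination implicitly unsubscribing the session from all groups.
```python
import itertools
import queue
from typing import Any, Dict, Set, Tuple


class ToySystem:
    """In-process stand-in for the system (MsgQ); illustrative only."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.groups: Dict[str, Set[str]] = {}
        self.inboxes: Dict[str, queue.Queue] = {}

    def open(self) -> "ToySession":
        # The counter only grows, so a session ID is never reused.
        sid = f"s{next(self._ids)}"
        self.inboxes[sid] = queue.Queue()
        return ToySession(self, sid)


class ToySession:
    def __init__(self, system: ToySystem, sid: str) -> None:
        self.system = system
        self.sid = sid

    def subscribe(self, group: str) -> None:
        self.system.groups.setdefault(group, set()).add(self.sid)

    def unsubscribe(self, group: str) -> None:
        self.system.groups.get(group, set()).discard(self.sid)

    def send(self, group: str, payload: Any) -> None:
        # Non-blocking: enqueue a copy for every current member. An empty
        # group is legal; the message then reaches no session.
        for sid in list(self.system.groups.get(group, ())):
            self.system.inboxes[sid].put((self.sid, payload))

    def recv(self, timeout: float) -> Tuple[str, Any]:
        # Synchronous receive; raises queue.Empty when the timeout expires.
        return self.system.inboxes[self.sid].get(timeout=timeout)

    def terminate(self) -> None:
        # Termination unsubscribes the session from every group.
        for members in self.system.groups.values():
            members.discard(self.sid)
        del self.system.inboxes[self.sid]
```
- For example, after `b.subscribe("Notifications/ZoneUpdates")`, a `send`
- from another session to that group can be read with
- `b.recv(timeout=1.0)`; after `b.terminate()` the group is empty again.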
- Assumptions
- -----------
- We assume reliable, in-order delivery. All messages sent from user A
- to user B are delivered unchanged and in their original order, as
- long as B exists.
- All the above operations are expected to always succeed. If an error
- is reported, it should be considered fatal and the user should
- exit. If a user still wants to continue, the session must be
- considered terminated and a new one must be created. Care must be
- taken not to use any information obtained from the previous session,
- since the state of other users and of the system may have changed
- during the reconnect.
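- The recovery rule above can be sketched as follows, assuming a
- hypothetical `connect` callable that opens a fresh session and a
- user-side dictionary of facts learned through that session (for
- instance, the session ID behind an alias). The point is that a
- reconnect starts from an empty cache instead of reusing anything from
- the old session; the error is still re-raised so the caller can choose
- to exit.
```python
from typing import Any, Callable, Dict


class ResilientUser:
    """Hypothetical user wrapper; `connect` opens a brand-new session."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        self.reconnects = 0
        self._start()

    def _start(self) -> None:
        self.session = self._connect()
        # Everything learned through a session lives here and must be
        # thrown away together with that session.
        self.learned: Dict[str, Any] = {}

    def call(self, operation: Callable[[Any], Any]) -> Any:
        try:
            return operation(self.session)
        except ConnectionError:
            # Consider the old session terminated: open a new one and
            # forget whatever the old one told us, then report the error.
            self.reconnects += 1
            self._start()
            raise
```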
- Addressing
- ----------
- Addressing happens in three ways:
- By group name::
- The message is routed to all the sessions subscribed to this group.
- It is legal to address an empty group; such a message is then
- delivered to no sessions.
- By session ID::
- The message is sent to the single session, if it is still alive.
- By an alias::
- A session may have any number of aliases (well-known names). Only a
- single session may hold a given alias (but this is not yet enforced
- by the system). The message is delivered to the one session owning
- the alias, if any. Internally, aliases are implemented as groups
- with a single subscribed session, so on the protocol level this is
- the same as the first option, but semantically it is different.
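- A sketch of how the three addressing modes could be resolved, assuming
- the system keeps a set of live session IDs and a mapping from group
- names to subscribed session IDs, and assuming group names never
- collide with session IDs. `resolve` and `undelivery_reply` are
- hypothetical helpers; the failure payload mirrors the `msgq` example
- given later in this document.
```python
from typing import Dict, Optional, Set


def resolve(recipient: str, sessions: Set[str],
            groups: Dict[str, Set[str]]) -> Set[str]:
    """Return the session IDs a message addressed to `recipient` reaches."""
    if recipient in sessions:
        # By session ID: the single session, if it is still alive.
        return {recipient}
    # By group name or by alias. An alias is just a group that should
    # hold at most one session, so both cases share this lookup. A dead
    # session ID, an unknown name or an empty group yields no sessions.
    return set(groups.get(recipient, set()))


def undelivery_reply(recipient: str, want_undelivery: bool,
                     sessions: Set[str],
                     groups: Dict[str, Set[str]]) -> Optional[dict]:
    """Failure reply the system would send back, or None if not needed."""
    if want_undelivery and not resolve(recipient, sessions, groups):
        return {"reply": [-1, "No such recipient"]}
    return None
```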
- The system
- ----------
- The system has these responsibilities:
- * Maintains the open sessions and allows creating new ones.
- * Keeps information about groups and which sessions are subscribed to
- which group.
- * Routes the messages between users.
- Also, the system itself is a user of the system. It can be reached by
- the alias `Msgq` and provides the following high-level services:
- Notifications about sessions::
- When a session is opened to the system or when a session is
- terminated, a notification is sent to interested users. The
- notification contains the session ID of the session in question.
- The termination notification is probably more useful (if a user
- communicated with a given session before, it might want to know that
- the session is no longer available); the opening notification is
- provided mostly for completeness.
- Notifications about group subscriptions::
- When a session subscribes to a group or unsubscribes from a group, a
- notification is sent to interested users. The notification contains
- both the session ID of the session subscribing/unsubscribing and the
- name of the group. This includes notifications about aliases (since
- aliases are groups internally).
- Commands to list sessions::
- There is a command to list the session IDs of all currently open
- sessions and a command to list the session IDs of all sessions
- subscribed to a given group. Note that these lists need to be used
- with some care, as the information might be outdated by the time it
- is delivered to the user.
- A user shows interest in notifications about sessions and group
- subscriptions by subscribing to a group with a well-known name (as
- with any notification).
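- The bookkeeping behind these services might look like the sketch
- below. It is hypothetical throughout: the group names
- `Msgq/SessionEvents` and `Msgq/SubscriptionEvents` and the
- notification names `connected`, `subscribed` and `unsubscribed` are
- invented for illustration. Only the `disconnected` notification with
- an `lname` parameter and the `get-subscriptions` reply shape follow
- examples given later in this document. Each state change returns the
- notifications the system would then route to the watcher groups.
```python
import json
from typing import Any, Dict, List, Set, Tuple

# Hypothetical well-known group names; the real ones are not given here.
SESSION_EVENTS = "Msgq/SessionEvents"
SUBSCRIPTION_EVENTS = "Msgq/SubscriptionEvents"

Outgoing = Tuple[str, str]  # (group to deliver to, JSON payload)


def _notify(group: str, name: str, params: Dict[str, Any]) -> Outgoing:
    return group, json.dumps({"notification": [name, params]})


class Registry:
    """System-side bookkeeping that also reports changes to watchers."""

    def __init__(self) -> None:
        self.sessions: Set[str] = set()
        self.groups: Dict[str, Set[str]] = {}

    def connected(self, lname: str) -> List[Outgoing]:
        self.sessions.add(lname)
        return [_notify(SESSION_EVENTS, "connected", {"lname": lname})]

    def subscribe(self, lname: str, group: str) -> List[Outgoing]:
        self.groups.setdefault(group, set()).add(lname)
        return [_notify(SUBSCRIPTION_EVENTS, "subscribed",
                        {"lname": lname, "group": group})]

    def disconnected(self, lname: str) -> List[Outgoing]:
        out: List[Outgoing] = []
        # A terminated session leaves all its groups (aliases included).
        for group in sorted(self.groups):
            if lname in self.groups[group]:
                self.groups[group].discard(lname)
                out.append(_notify(SUBSCRIPTION_EVENTS, "unsubscribed",
                                   {"lname": lname, "group": group}))
        self.sessions.discard(lname)
        out.append(_notify(SESSION_EVENTS, "disconnected", {"lname": lname}))
        return out

    # The two listing commands; the answer may be stale on arrival.
    def get_sessions(self) -> Dict[str, Any]:
        return {"reply": [0, sorted(self.sessions)]}

    def get_subscriptions(self, group: str) -> Dict[str, Any]:
        return {"reply": [0, sorted(self.groups.get(group, set()))]}
```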
- Note that due to implementation details, the `Msgq` alias is not yet
- available during the early stage of bootstrapping the bind10 system.
- This means some very core services can't rely on the above services
- of the system. The alias is guaranteed to work before the first
- non-core module is started.
- Higher-level services
- ---------------------
- While the system is able to send any kind of data, the payload sent by
- users in bind10 is structured data encoded as JSON. The messages sent
- are of three general types:
- Command::
- A message sent to a single destination, with the undelivery signal
- turned on and expecting an answer. This is a request to perform some
- operation on the recipient (which may or may not have side effects).
- The command is identified by a name and it can have parameters. A
- command with the same name may behave differently (or have different
- parameters) on different receiving users.
- Reply::
- An answer to the `Command`. It is sent directly to the session the
- command originated from, does not expect a further answer, and does
- not request the undelivery signal. It either confirms that the
- command was run successfully and contains an optional result, or
- notifies the sender of a failure to run the command. Success and
- failure differ only in the payload sent through the system, not in
- the way it is sent. The undelivery signal is a failure reply sent by
- the system on behalf of the missing recipient.
- Notification::
- A message sent to any number of destinations (e.g. sent to a group),
- not expecting an answer. It notifies other users about an event or
- change of state.
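- The three payload types can be built and recognised with a handful of
- helpers. This is a sketch, not a library API: the function names are
- hypothetical, and omitting the result from a reply when there is none
- (`{"reply": [0]}`) is an assumption. The shapes otherwise follow the
- examples in the next section, where `0` means success.
```python
import json
from typing import Any, Optional, Tuple


def command(name: str, params: Optional[dict] = None) -> str:
    # {"command": [name]} or {"command": [name, params]}
    body = [name] if params is None else [name, params]
    return json.dumps({"command": body})


def reply(code: int, result: Any = None) -> str:
    # 0 means success; the examples in this document use 1 for a command
    # error and -1 for the system's undelivery reply.
    body = [code] if result is None else [code, result]
    return json.dumps({"reply": body})


def notification(name: str, params: Optional[dict] = None) -> str:
    body = [name] if params is None else [name, params]
    return json.dumps({"notification": body})


def parse(raw: str) -> Tuple[str, list]:
    """Split a payload into its type and body (one top-level key)."""
    (kind, body), = json.loads(raw).items()
    if kind not in ("command", "reply", "notification"):
        raise ValueError(f"unknown message type: {kind}")
    return kind, body


def is_success(reply_body: list) -> bool:
    return reply_body[0] == 0
```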
- Details of the higher-level
- ---------------------------
- While there are libraries implementing the communication in a
- convenient way, it is useful to know what happens inside.
- The notifications are probably the simplest. Users interested in
- receiving notifications of some family subscribe to the corresponding
- group. Then, a client sends a message to the group. For example, if
- clients `receiver-A` and `receiver-B` want to receive notifications
- about changes to zone data, they subscribe to the
- `Notifications/ZoneUpdates` group. Then another client (say `XfrIn`,
- with session ID `s12345`) sends something like:
-     s12345 -> Notifications/ZoneUpdates
-     {"notification": ["zone-update", {
-         "class": "IN",
-         "origin": "example.org.",
-         "serial": 123456
-     }]}
- Both receivers receive the message and know that the `example.org`
- zone is now at serial 123456. Note that multiple users may produce
- the same kind of notification. Also, a single group may be used to
- send multiple notification names (but they should be related; in our
- example, `Notifications/ZoneUpdates` could be used for the
- `zone-update`, `zone-available` and `zone-unavailable` notifications,
- signalling a change in zone data, the configuration of a new zone in
- the system and the removal of a zone from the configuration).
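- On the receiving side, one subscription can serve several related
- notification names by dispatching on the name. A minimal sketch:
- `ZoneWatcher` is a hypothetical receiver, and the parameters of
- `zone-unavailable` are assumed to mirror those of `zone-update`, since
- only the latter is shown in this document. Names without a handler
- (here `zone-available`) are simply ignored.
```python
import json
from typing import Any, Callable, Dict, Tuple


class ZoneWatcher:
    """Hypothetical receiver subscribed to Notifications/ZoneUpdates."""

    def __init__(self) -> None:
        # (class, origin) -> last announced serial
        self.serials: Dict[Tuple[str, str], int] = {}
        self._handlers: Dict[str, Callable[[Dict[str, Any]], None]] = {
            "zone-update": self._on_update,
            "zone-unavailable": self._on_unavailable,
        }

    def handle(self, raw: str) -> None:
        name, *rest = json.loads(raw)["notification"]
        params = rest[0] if rest else {}
        handler = self._handlers.get(name)
        if handler is not None:  # unknown names are ignored
            handler(params)

    def _on_update(self, p: Dict[str, Any]) -> None:
        self.serials[(p["class"], p["origin"])] = p["serial"]

    def _on_unavailable(self, p: Dict[str, Any]) -> None:
        # Assumed to carry the same class/origin keys as zone-update.
        self.serials.pop((p["class"], p["origin"]), None)
```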
- Sending a command to a single recipient is slightly more complex. The
- sending user sends a message to the receiving one, addressed either by
- session ID or by an alias (a group to which at most one session may be
- subscribed). The message contains the name of the command and its
- parameters. It is sent with the undelivery signal turned on.
- The user also starts a timer (with a reasonably long timeout). The
- sender also subscribes to notifications about terminated sessions or
- unsubscriptions from the alias group.
- The receiving user gets the message, runs the command and sends a
- response back with the result. The response has the undelivery
- signal turned off and is marked as a response to the message
- containing the command. The sending user receives the answer and pairs
- it with the command by its sequence number.
- There are several things that may go wrong.
- * There might be an error on the receiving user (bad parameters, the
- operation failed, the recipient doesn't know a command of that name).
- The receiving side sends the response as before; the only
- difference is the content of the payload. The sending user is
- notified about it without delay.
- * The recipient user doesn't exist (either the session ID is wrong or
- the session has already terminated, or the alias is empty). The system
- sends a failure response and the sending user knows immediately that
- the command failed.
- * The recipient disconnects while processing the command (possibly
- crashes). The sender gets a notification about the disconnection or
- the unsubscription from the alias group and knows the answer won't
- come.
- * The recipient ``blackholes'' the command: it receives it, but never
- answers. The sender's timer expires. As this is a serious programmer
- error in the recipient and should be rare, the sender should at least
- log an error to notify about the case.
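- The sender-side bookkeeping for these four cases can be sketched as
- below. It is a minimal sketch under assumptions: `CommandTracker` and
- its methods are hypothetical, time is passed in explicitly (for
- example from `time.monotonic()`) to keep the logic testable, and the
- 30-second default is an arbitrary placeholder for "reasonably long".
- A recipient's reply and the system's undelivery failure reply both
- arrive as replies paired by sequence number, so they share one path.
```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, Tuple

log = logging.getLogger("ipc-example")


@dataclass
class Pending:
    target: str      # session ID or alias the command was sent to
    deadline: float  # absolute time after which we give up


class CommandTracker:
    def __init__(self, timeout: float = 30.0) -> None:  # placeholder
        self.timeout = timeout
        self.pending: Dict[int, Pending] = {}
        self.results: Dict[int, Tuple[str, Any]] = {}

    def sent(self, seq: int, target: str, now: float) -> None:
        self.pending[seq] = Pending(target, now + self.timeout)

    def on_reply(self, reply_to: int, body: list) -> None:
        # Covers both a real reply and the system's undelivery reply;
        # replies to commands no longer pending (e.g. late) are ignored.
        if self.pending.pop(reply_to, None) is not None:
            status = "ok" if body[0] == 0 else "error"
            self.results[reply_to] = (status, body)

    def on_target_gone(self, target: str) -> None:
        # Called on a "disconnected" notification for a session ID, or
        # when the session holding an alias unsubscribes from it.
        for seq, p in list(self.pending.items()):
            if p.target == target:
                del self.pending[seq]
                self.results[seq] = ("gone", target)

    def expire(self, now: float) -> None:
        # The recipient blackholed the command: log it as an error.
        for seq, p in list(self.pending.items()):
            if now >= p.deadline:
                del self.pending[seq]
                log.error("command %d to %s timed out", seq, p.target)
                self.results[seq] = ("timeout", None)
```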
- One example would be asking the question of life, the universe and
- everything (all the examples assume the sending user is already
- subscribed to the notifications):
-     s12345 -> DeepThought
-     {"command": ["question", {
-         "what": ["Life", "Universe", "*"]
-     }]}
-     s23456 -> s12345
-     {"reply": [0, 42]}
- DeepThought was addressed by its alias, but the answer is sent from
- its session ID. The `0` in the reply means ``success''.
- Another example might be asking for some data at a bureau and getting
- an error:
-     s12345 -> Bureau
-     {"command": ["provide-information", {
-         "about": "me",
-         "topic": "taxes"
-     }]}
-     s23456 -> s12345
-     {"reply": [1, "You need to fill in another form"]}
- In this example, the sender is trying to reach a non-existent
- session. The `msgq` here is not the alias `Msgq`, but a special
- ``phantom'' session ID that is not listed anywhere.
-     s12345 -> s0
-     {"command": ["ping"]}
-     msgq -> s12345
-     {"reply": [-1, "No such recipient"]}
- Finally, an example where the other user disconnects while processing
- the command:
-     s12345 -> s23456
-     {"command": ["shutdown"]}
-     msgq -> s12345
-     {"notification": ["disconnected", {
-         "lname": "s23456"
-     }]}
- The system does not support sending a command to multiple users
- directly. It can be accomplished like this:
- * The sending user calls a command on the system to get the list of
- sessions in a given group. This is a command sent to an alias, so it
- can be done in the way described above.
- * After receiving the list of session IDs, multiple copies of the
- command are sent by the sending user, one to each of the session
- IDs.
- * Successes and failures are handled the same as above, since these
- are just single-recipient commands.
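- The three steps map onto a few small helpers. This sketch only builds
- and interprets payloads; the helper names are hypothetical and the
- actual sending is left to whatever session object is in use. The
- command and reply shapes follow the war council example below.
```python
import json
from typing import Dict, List, Tuple


def members_request(group: str) -> Tuple[str, str]:
    # Step 1: ask the system (alias Msgq) who is subscribed to the group.
    return "Msgq", json.dumps(
        {"command": ["get-subscriptions", {"group": group}]})


def fan_out(members_reply: dict, name: str,
            params: dict) -> List[Tuple[str, str]]:
    # Step 2: one copy of the command per session ID in the reply.
    code, members = members_reply["reply"]
    if code != 0:
        raise RuntimeError(f"get-subscriptions failed: {members}")
    payload = json.dumps({"command": [name, params]})
    return [(lname, payload) for lname in members]


def summarize(replies: Dict[str, dict]) -> Dict[str, List[str]]:
    # Step 3: each reply is an ordinary single-recipient result.
    out: Dict[str, List[str]] = {"ok": [], "error": []}
    for lname, msg in sorted(replies.items()):
        out["ok" if msg["reply"][0] == 0 else "error"].append(lname)
    return out
```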
- So, this is an example with an unhelpful war council:
-     s12345 -> Msgq
-     {"command": ["get-subscriptions", {
-         "group": "WarCouncil"
-     }]}
-     msgq -> s12345
-     {"reply": [0, ["s1", "s2", "s3"]]}
-     s12345 -> s1
-     {"command": ["advice", {
-         "topic": "Should we attack?"
-     }]}
-     s12345 -> s2
-     {"command": ["advice", {
-         "topic": "Should we attack?"
-     }]}
-     s12345 -> s3
-     {"command": ["advice", {
-         "topic": "Should we attack?"
-     }]}
-     s1 -> s12345
-     {"reply": [0, true]}
-     s2 -> s12345
-     {"reply": [0, false]}
-     s3 -> s12345
-     {"reply": [1, "Advice feature not implemented"]}
- Users
- -----
- While there's a lot of flexibility in the behaviour of a user, it
- usually comes down to something like this (during the lifetime of
- the user):
- * The user starts up.
- * Then it creates one or more sessions (there may be technical reasons
- to have more than one session, such as threads, but it is not
- required by the system).
- * It subscribes to some groups to receive notifications in future.
- * It binds to some aliases if it wants to be reachable by others by a
- nice name.
- * It invokes some start-up commands (to get the configuration, for
- example).
- * During the lifetime, it listens for notifications and answers
- commands. It also invokes remote commands and sends notifications
- about things that are happening.
- * Eventually, the user terminates, closing all the sessions it had
- opened.
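- The typical lifecycle above can be sketched as follows.
- `RecordingSession` is a hypothetical stand-in that merely records the
- calls it receives, and the `ConfigManager`/`get-config` start-up
- command is an invented example. The sketch only shows the order of the
- steps, binding an alias by subscribing to its group, and closing the
- session on the way out even if the user fails midway.
```python
from contextlib import contextmanager
from typing import Iterable, Iterator, List


class RecordingSession:
    """Hypothetical stand-in that only records what it is asked to do."""

    def __init__(self, trace: List[str]) -> None:
        self.trace = trace
        trace.append("open")

    def subscribe(self, group: str) -> None:
        self.trace.append(f"subscribe {group}")

    def command(self, target: str, name: str) -> None:
        self.trace.append(f"command {target} {name}")

    def close(self) -> None:
        self.trace.append("close")


@contextmanager
def session(trace: List[str]) -> Iterator[RecordingSession]:
    s = RecordingSession(trace)
    try:
        yield s
    finally:
        s.close()  # closed even if the user fails midway


def run_user(trace: List[str], events: Iterable[str]) -> None:
    with session(trace) as s:
        s.subscribe("Notifications/ZoneUpdates")  # future notifications
        s.subscribe("DeepThought")                 # binding an alias
        s.command("ConfigManager", "get-config")   # hypothetical start-up
        for event in events:                       # main loop
            trace.append(f"handle {event}")
```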
- Known limitations
- -----------------
- It is meant mostly as a signalling protocol. Sending millions of
- messages or messages of several tens of megabytes is probably a bad
- idea. While there's no architectural limitation with regard to the
- number of transferred messages, and the maximum size of a message is
- 4GB, the code is not optimised and it would probably be very slow.
- We currently expect the system not to be under heavy load, and
- therefore expect it to keep up with users sending messages. The
- libraries write in blocking mode. This is no problem if the
- expectation holds, as the write buffers will generally be empty and
- the write won't block, but if it turns out not to be the case, we
- might need to reconsider.