Merge #2671

Clean up doc/design/cc-protocol.txt

Conflicts:
	doc/design/cc-protocol.txt
Michal 'vorner' Vaner 12 years ago
commit 4c3b2b24d8
1 changed file with 133 additions and 244 deletions

doc/design/cc-protocol.txt

@@ -1,296 +1,185 @@
-protocol version 0x536b616e
+The CC protocol
+===============
 
-DATA        0x01
-HASH        0x02
-LIST        0x03
-NULL        0x04
-TYPE_MASK   0x0f
+We use our home-grown protocol for IPC between modules. There's a
+central daemon routing the messages.
 
-LENGTH_32   0x00
-LENGTH_16   0x10
-LENGTH_8    0x20
-LENGTH_MASK 0xf0
-
-
-MESSAGE ENCODING
-----------------
-
-When decoding, the entire message length must be known.  If this is
-transmitted over a raw stream such as TCP, this is usually encoded
-with a 4-byte length followed by the message itself.  If some other
-wrapping is used (say as part of a different message structure) the
-length of the message must be preserved and included for decoding.
-
-The first 4 bytes of the message is the protocol version encoded
-directly as a 4-byte value.  Immediately following this is a HASH
-element.  The length of the hash element is the remainder of the
-message after subtracting 4 bytes for the protocol version.
-
-This initial HASH is intended to be used by the message routing system
-if one is in use.
-
-
-ITEM TYPES
+Addressing
 ----------
 
-There are four basic types encoded in this protocol.  A simple data
-blob (DATA), a tag-value series (HASH), an ordered list (LIST), and
-a NULL type (which is used internally to encode DATA types which are
-empty and can be used to indicate existence without data in a hash.)
-
-Each item can be of any type, so a hash of hashes and hashes of lists
-are typical.
-
-All multi-byte integers which are encoded in binary are in network
-byte order.
-
-
-ITEM ENCODING
--------------
-
-Each item is preceded by a single byte which describes that item.
-This byte contains the item type and item length encoding:
-
-    Thing             Length    Description
-    ----------------  --------  ------------------------------------
-    TyLen             1 byte    Item type and length encoding
-    Length            variable  Item data blob length
-    Item Data         variable  Item data blob
-
-The TyLen field includes both the item data type and the item's
-length.  The length bytes are encoded depending on the length of data
-portion, and the smallest data encoding type supported should be
-used.  Note that this length compression is used just for data
-compactness.  It is wasteful to encode the most common length (8-bit
-length) as 4 bytes, so this method allows one byte to be used rather
-than 4, three of which are nearly always zero.
-
-
-HASH
-----
-
-This is a tag/value pair where each tag is an opaque unique blob and
-the data elements are of any type.  Hashes are not encoded in any
-specific tag or item order.
-
-The length of the HASH's data area is processed for tag/value pairs
-until the entire area is consumed.  Running out of data prematurely
-indicates an incorrectly encoded message.
-
-The data area consists of repeated items:
-
-    Thing             Length    Description
-    ----------------  --------  ------------------------------------
-    Tag Length       1 byte    The length of the tag.
-    Tag              Variable  The tag name
-    Item             Variable  Encoded item
-
-The Tag Length field is always one byte, which limits the tag name to
-255 bytes maximum.  A tag length of zero is invalid.
-
-
-LIST
-----
-
-A LIST is a list of items encoded and decoded in a specific order.
-The order is chosen entirely by the source during encoding.
-
-The length of the LIST's data is consumed by the ITEMs it contains.
-Running out of room prematurely indicates an incorrectly encoded
-message.
-
-The data area consists of repeated items:
+Each connected client gets a unique address, called ``l-name''. A
+message can be sent directly to such an l-name, if it is known to the
+sender.
 
-     Thing           Length    Description
-     --------------  ------    ----------------------------------------
-     Item	     Variable  Encoded item
+A client may subscribe to a communication group. A message can be
+broadcast to a whole group instead of a single client. There's also
+an instance parameter to addressing, but we haven't found any actual
+use for it. It is left at the default `*` in most of our code and
+should be left so in any new code. Removing it hasn't been a priority
+yet.
 
+Wire format
+-----------
 
-DATA
-----
+Each message on the wire looks like this:
 
-A DATA item is a simple blob of data.  No further processing of this
-data is performed by this protocol on these elements.
+  <message length><header length><header><body>
 
-The data blob is the entire data area.  The data area can be 0 or more
-bytes long.
+The message length is a 4-byte unsigned integer in network byte order,
+specifying the number of bytes of the rest of the message (i.e. header
+length, header and body put together).
 
-It is typical to encode integers as strings rather than binary
-integers.  However, so long as both sender and recipient agree on the
-format of the data blob itself, any blob encoding may be used.
+The header length is a 2-byte unsigned integer in network byte order,
+specifying the length of the header in bytes.
 
+The header is a string representation of a single JSON object. It
+specifies the type of message and routing information.
 
-NULL
-----
+The body is the payload of the message. It takes up the whole rest of
+the message (so its length is message length - 2 - header
+length). The content is not examined by the routing daemon, but the
+clients expect it to be a valid JSON object.
 
-This data element indicates no data is actually present.  This can be
-used to indicate that a tag is present in a HASH but no data is
-actually at that location, or in a LIST to indicate empty item
-positions.
+The body may be empty when the message is not to be routed to a
+client but is an instruction for the routing daemon. See message
+types below.
 
-There is no data portion of this type, and the encoded length is
-ignored and is always zero.
+The message is sent in this format to the routing daemon; the daemon
+optionally modifies the headers and delivers it in the same format to
+the recipient(s).
 
-Note that this is different than a DATA element with a zero length.
+The headers
+-----------
 
+The header object can contain the following information:
 
-EXAMPLE
--------
-
-This is Ruby syntax, but should be clear enough for anyone to read.
-
-Example data encoding:
-
-{
-  "from" => "sender@host",
-  "to" => "recipient@host",
-  "seq" => 1234,
-  "data" => {
-    "list" => [ 1, 2, nil, "this" ],
-    "description" => "Fun for all",
-  },
-}
-
-
-Wire-format:
-
-In this format, strings are not shown in hex, but are included "like
-this."  Descriptions are written (like this.)
-
-Message Length: 0x64 (100 bytes)
-Protocol Version:  0x53 0x6b 0x61 0x6e
-(remaining length: 96 bytes)
-
-0x04 "from" 0x21 0x0b "sender@host"
-0x02 "to" 0x21 0x0e "recipient@host"
-0x03 "seq" 0x21 0x04 "1234"
-0x04 "data" 0x22
-  0x04 "list" 0x23 
-    0x21 0x01 "1"
-    0x21 0x01 "2"
-    0x04
-    0x21 0x04 "this"
-  0x0b "description" 0x0b "Fun for all"
-
-
-MESSAGE ROUTING
----------------
-
-The message routing daemon uses the top-level hash to contain routing
-instructions and additional control data.  Not all of these are
-required for various control message types; see the individual
-descriptions for more information.
-
-    Tag      Description
-    -------  ----------------------------------------
-    msg      Sender-supplied data
-    from     sender's identity
-    group    Group name this message is being sent to
-    instance Instance in this group
-    repl     if present, this message is a reply.
-    seq	     sequence number, used in replies
-    to	     recipient or "*" for no specific receiver
-    type     "send" for a channel message
-
-
-"type" is a DATA element, which indicates to the message routing
-system what the purpose of this message is.
+|====================================================================================================
+|Name       |type  |Description
+|====================================================================================================
+|from       |string|Sender's l-name
+|type       |string|Type of the message. The routed message is "send".
+|group      |string|The group to deliver to.
+|instance   |string|Instance in the group. Purpose lost in history. Defaults to "*".
+|to         |string|Override recipient (group/instance ignored).
+|seq        |int   |Tracking number of the message.
+|reply      |int   |If present, contains the seq number of the message this is a reply to.
+|want_answer|bool  |If present and true, the daemon generates an error if there's no matching recipient.
+|====================================================================================================
 
+Types of messages
+-----------------
 
 Get Local Name (type "getlname")
---------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Upon connection, this is the first message to be sent to the control
-daemon.  It will return the local name of this client.  Each
-connection gets its own unique local name, and local names are never
-repeated.  They should be considered opaque strings, in a format
-useful only to the message routing system.  They are used in replies
-or to send to a specific destination.
+Upon connection, this is the first message to be sent to the daemon.
+It will return the local name of this client.  Each connection gets
+its own unique local name, and local names are never repeated.  They
+should be considered opaque strings, in a format useful only to the
+message routing system.  They are used in replies or to send to a
+specific destination.
 
 To request the local name, the only element included is the
-  "type" => "getlname"
+  {"type": "getlname"}
 tuple.  The response is also a simple, single tuple:
-  "lname" => "UTF-8 encoded local name blob"
+  {"lname": "Opaque utf-8 string"}
 
 Until this message is sent, no other types of messages may be sent on
 this connection.
 
-
 Regular Group Messages (type "send")
-------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-When sending a message:
+A message routed to other clients. This one expects the body to be
+non-empty.
 
-"msg" is the sender supplied data.  It is encoded as per its type.
-It is a required field, but may be the NULL type if not needed.
-In OpenReg, this was another wire format message, stored as an
-ITEM_DATA.  This was done to make it easy to decode the routing
-information without having to decode arbitrary application-supplied
-data, but rather treat this application data as an opaque blob.
+Expected headers are:
 
-"from" is a DATA element, and its value is a UTF-8 encoded sender
-identity.  It MUST be the "local name" supplied by the message
-routing system upon connection.  The message routing system will
-enforce this, but will not add it.  It is a required field.
+* from
+* group
+* instance (set to "*" if no specific instance desired)
+* seq (should be unique for the sender)
+* to (set to "*" if not directed to a specific client)
+* reply (optional, only if it is a reply)
+* want_answer (optional, only when not a reply)
 
-"group" is a DATA element, and its value is the UTF-8 encoded group
-name this message is being transmitted to.  It is a required field for
-all messages of type "send".
+A client does not see its own transmissions.
 
-"instance" is a DATA element, and its value is the UTF-8 encoded
-instance name, with "*" meaning all instances.
+Group Subscriptions (type "subscribe")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-"repl" is the sequence number being replied to, if this is a reply.
+Indicates the sender wants to be included in the given group.
 
-"seq" is a unique identity per client.  That is, the <lname, seq>
-tuple must be unique over the lifetime of the connection, or at least
-over the lifetime of the expected reply duration.
+Expected headers are:
 
-"to" is a DATA element, and its value is a UTF-8 encoded recipient
-identity.  This must be a specific recipient name or "*" to indicate
-"all listeners on this channel."  It is a required field.
+* group
+* instance (leave at "*" for default)
 
-When a message of type "send" is received by the client, all the data
-is used as above.  This indicates a message of the given type was
-received.
+There is no response to this message; the client is simply subscribed
+to the given group and instance.
 
-A client does not see its own transmissions. (XXXMLG Need to check this)
+The group can be any UTF-8 string and the group doesn't have to exist
+beforehand (it is created when at least one client is in it). A client
+may be subscribed to multiple groups.
 
+Group Unsubscribe (type "unsubscribe")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Group Subscriptions (type "subscribe")
---------------------------------------
+The headers to be included are "group" and "instance" and have the same
+meaning as in a "subscribe" message, except that the client is removed
+from the group.
 
-A subscription requires the "group", "instance", and a flag to
-indicate the subscription type ("subtype").  If instance is "*" the
-instance name will be ignored when deciding to forward a message to
-this client or not.
+Transmitted messages
+--------------------
 
-"subtype" is a DATA element, and contains "normal" for normal channel
-subscriptions, "meonly" for only those messages on a channel with the
-recipient specified exactly as the local name, or "promisc" to receive
-all channel messages regardless of other filters.  As its name
-implies, "normal" is for typical subscriptions, and "promisc" is
-intended for channel message debugging.
+These are the messages generally transmitted in the body of the
+message.
 
-There is no response to this message.
+Command
+~~~~~~~
 
+It is a command from one process to another, to do something or send
+some information. It is identified by a name and can optionally have
+parameters. It'd look like this:
 
-Group Unsubscribe (type "unsubscribe")
--------------------------------
+  {"command": ["name", <parameters>]}
+
+The parameters may be omitted (then the array is 1 element long). If
+present, they may be any JSON element. However, the most usual form is
+an object with named parameter values.
+
+It is usually transmitted with the `want_answer` header turned on to
+cope with the situation where the remote end doesn't exist, and sent
+to a group (i.e. `to` with the value `*`).
+
+Success reply
+~~~~~~~~~~~~~
+
+When the command is successful, the other side answers with a reply in
+the following format:
+
+  {"result": [0, <result>]}
+
+The result is the return value of the command. It may be any JSON
+element and it may be omitted (for the case of a ``void'' function).
 
-The fields to be included are "group" and "instance" and have the same
-meaning as a "subscribe" message.
+This is transmitted with the `reply` header set to the `seq` number of
+the original command. It is sent with the `to` header set.
 
-There is no response to this message.
+Error reply
+~~~~~~~~~~~
 
+In case something goes wrong, an error reply is sent. This is similar
+to throwing an exception from a local function. The format is similar:
 
-Statistics (type "stats")
--------------------------
+  {"result": [ecode, "Error description"]}
 
-Request statistics from the message router.  No other fields are
-included in the request.
+The `ecode` is a non-zero error code. Most of the current code uses `1`
+for all errors. The string after that is mandatory and must contain a
+human-readable description of the error.
 
-The response contains a single element "stats" which is an opaque
-element.  This is used mostly for debugging, and its format is
-specific to the message router.  In general, some method to simply
-dump raw messages would produce something useful during debugging.
+Negative error codes are reserved for errors from the daemon.
+Currently, only `-1` is used. It is generated when a message is sent
+without the `reply` header and with the `want_answer` header set to
+`true`, and there's no recipient to deliver the message to. This
+usually means a command was sent to a non-existent recipient.
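
The wire format and reply conventions described in the new text can be sketched in a few lines of Python. This is only an illustration, not code from the project: the function names `encode_message`, `decode_message` and `unwrap_result` are hypothetical, and representing an empty body as `None` is an assumption made here for convenience.

```python
import json
import struct


def encode_message(header, body=None):
    """Frame a message as <message length><header length><header><body>."""
    header_bytes = json.dumps(header).encode("utf-8")
    # Assumption: an empty body (daemon instructions) is passed as None.
    body_bytes = b"" if body is None else json.dumps(body).encode("utf-8")
    if len(header_bytes) > 0xFFFF:
        raise ValueError("header does not fit in a 2-byte length field")
    # The message length counts everything after itself:
    # the 2-byte header length, the header and the body.
    message_length = 2 + len(header_bytes) + len(body_bytes)
    # "!" selects network byte order; I = 4-byte unsigned, H = 2-byte unsigned.
    prefix = struct.pack("!IH", message_length, len(header_bytes))
    return prefix + header_bytes + body_bytes


def decode_message(frame):
    """Split one complete frame back into (header, body)."""
    (message_length,) = struct.unpack_from("!I", frame, 0)
    if len(frame) != 4 + message_length:
        raise ValueError("frame size does not match the message length")
    (header_length,) = struct.unpack_from("!H", frame, 4)
    if header_length > message_length - 2:
        raise ValueError("header length exceeds the message length")
    header = json.loads(frame[6:6 + header_length].decode("utf-8"))
    # Body length is message length - 2 - header length, i.e. the rest.
    body_bytes = frame[6 + header_length:]
    body = json.loads(body_bytes.decode("utf-8")) if body_bytes else None
    return header, body


def unwrap_result(body):
    """Return the result of a success reply, or raise on an error reply."""
    result = body["result"]
    if result[0] != 0:
        # Error replies carry [ecode, "Error description"].
        raise RuntimeError("remote error %d: %s" % (result[0], result[1]))
    # The result value may be omitted for ``void'' commands.
    return result[1] if len(result) > 1 else None
```

For example, `encode_message({"type": "getlname"})` would produce the first frame a client sends, and `decode_message` applied to that frame returns the same header with a body of `None`.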