12 years ago · 4c3b2b24d8
--- a/doc/design/cc-protocol.txt
+++ b/doc/design/cc-protocol.txt
@@ -1,296 +1,185 @@
 
				-protocol version 0x536b616e
			
 
				+The CC protocol
			
 
				+===============
			
 
				 
			
 
				-DATA        0x01
			
 
				-HASH        0x02
			
 
				-LIST        0x03
			
 
				-NULL        0x04
			
 
				-TYPE_MASK   0x0f
			
 
				+We use our home-grown protocol for IPC between modules. There's a
			
 
				+central daemon routing the messages.
			
 
				 
			
 
				-LENGTH_32   0x00
			
 
				-LENGTH_16   0x10
			
 
				-LENGTH_8    0x20
			
 
				-LENGTH_MASK 0xf0
			
 
				-
			
 
				-
			
 
				-MESSAGE ENCODING
			
 
				-----------------
			
 
				-
			
 
				-When decoding, the entire message length must be known.  If this is
			
 
				-transmitted over a raw stream such as TCP, this is usually encoded
			
 
				-with a 4-byte length followed by the message itself.  If some other
			
 
				-wrapping is used (say as part of a different message structure) the
			
 
				-length of the message must be preserved and included for decoding.
			
 
				-
			
 
				-The first 4 bytes of the message is the protocol version encoded
			
 
				-directly as a 4-byte value.  Immediately following this is a HASH
			
 
				-element.  The length of the hash element is the remainder of the
			
 
				-message after subtracting 4 bytes for the protocol version.
			
 
				-
			
 
				-This initial HASH is intended to be used by the message routing system
			
 
				-if one is in use.
			
 
				-
			
 
				-
			
 
				-ITEM TYPES
			
 
				+Addressing
			
 
				 ----------
			
 
				 
			
 
				-There are four basic types encoded in this protocol.  A simple data
			
 
				-blob (DATA), a tag-value series (HASH), an ordered list (LIST), and
			
 
				-a NULL type (which is used internally to encode DATA types which are
			
 
				-empty and can be used to indicate existance without data in a hash.)
			
 
				-
			
 
				-Each item can be of any type, so a hash of hashes and hashes of lists
			
 
				-are typical.
			
 
				-
			
 
				-All multi-byte integers which are encoded in binary are in network
			
 
				-byte order.
			
 
				-
			
 
				-
			
 
				-ITEM ENCODING
			
 
				--------------
			
 
				-
			
 
				-Each item is preceeded by a single byte which describes that item.
			
 
				-This byte contains the item type and item length encoding:
			
 
				-
			
 
				-    Thing             Length    Description
			
 
				-    ----------------  --------  ------------------------------------
			
 
				-    TyLen             1 byte    Item type and length encoding
			
 
				-    Length            variable  Item data blob length
			
 
				-    Item Data         variable  Item data blob
			
 
				-
			
 
				-The TyLen field includes both the item data type and the item's
			
 
				-length.  The length bytes are encoded depending on the length of data
			
 
				-portion, and the smallest data encoding type supported should be
			
 
				-used.  Note that this length compression is used just for data
			
 
				-compactness.  It is wasteful to encode the most common length (8-bit
			
 
				-length) as 4 bytes, so this method allows one byte to be used rather
			
 
				-than 4, three of which are nearly always zero.
			
 
				-
			
 
				-
			
 
				-HASH
			
 
				-----
			
 
				-
			
 
				-This is a tag/value pair where each tag is an opaque unique blob and
			
 
				-the data elements are of any type.  Hashes are not encoded in any
			
 
				-specific tag or item order.
			
 
				-
			
 
				-The length of the HASH's data area is processed for tag/value pairs
			
 
				-until the entire area is consumed.  Running out of data prematurely
			
 
				-indicates an incorrectly encoded message.
			
 
				-
			
 
				-The data area consists of repeated items:
			
 
				-
			
 
				-    Thing             Length    Description
			
 
				-    ----------------  --------  ------------------------------------
			
 
				-    Tag Length       1 byte    The length of the tag.
			
 
				-    Tag              Variable  The tag name
			
 
				-    Item             Variable  Encoded item
			
 
				-
			
 
				-The Tag Length field is always one byte, which limits the tag name to
			
 
				-255 bytes maximum.  A tag length of zero is invalid.
			
 
				-
			
 
				-
			
 
				-LIST
			
 
				-----
			
 
				-
			
 
				-A LIST is a list of items encoded and decoded in a specific order.
			
 
				-The order is chosen entirely by the source curing encoding.
			
 
				-
			
 
				-The length of the LIST's data is consumed by the ITEMs it contains.
			
 
				-Running out of room prematurely indicates an incorrectly encoded
			
 
				-message.
			
 
				-
			
 
				-The data area consists of repeated items:
			
 
				+Each connected client gets a unique address, called ``l-name''. A
			
 
				+message can be sent directly to such an l-name, if it is known to the
			
 
				+sender.
			
 
				 
			
 
				-     Thing           Length    Description
			
 
				-     --------------  ------    ----------------------------------------
			
 
				-     Item	     Variable  Encoded item
			
 
				+A client may subscribe to a communication group. A message can be
			
 
				+broadcast to a whole group instead of a single client. There's also
			
 
				+an instance parameter to addressing, but we never found an actual use
			
 
				+for it and nothing depends on it. Most of our code leaves it at the
			
 
				+default `*`, and new code should do the same. Removing it has not been
			
 
				+a priority yet.
			
 
				 
			
 
				+Wire format
			
 
				+-----------
			
 
				 
			
 
				-DATA
			
 
				-----
			
 
				+Each message on the wire looks like this:
			
 
				 
			
 
				-A DATA item is a simple blob of data.  No further processing of this
			
 
				-data is performed by this protocol on these elements.
			
 
				+  <message length><header length><header><body>
			
 
				 
			
 
				-The data blob is the entire data area.  The data area can be 0 or more
			
 
				-bytes long.
			
 
				+The message length is a 4-byte unsigned integer in network byte order,
			
 
				+specifying the number of bytes in the rest of the message (i.e. header
			
 
				+length, header and body put together).
			
 
				 
			
 
				-It is typical to encode integers as strings rather than binary
			
 
				-integers.  However, so long as both sender and recipient agree on the
			
 
				-format of the data blob itself, any blob encoding may be used.
			
 
				+The header length is a 2-byte unsigned integer in network byte order,
			
 
				+specifying the length of the header in bytes.
			
 
				 
			
 
				+The header is a string representation of a single JSON object. It
			
 
				+specifies the type of message and routing information.
			
 
				 
			
 
				-NULL
			
 
				-----
			
 
				+The body is the payload of the message. It takes up the whole rest of
			
 
				+the message (so its length is message length - 2 - header
			
 
				+length). The content is not examined by the routing daemon, but the
			
 
				+clients expect it to be a valid JSON object.
			
 
				 
			
 
				-This data element indicates no data is actually present.  This can be
			
 
				-used to indicate that a tag is present in a HASH but no data is
			
 
				-actually at that location, or in a LIST to indicate empty item
			
 
				-positions.
			
 
				+The body may be empty when the message is not to be routed to a
			
 
				+client, but is an instruction for the routing daemon itself. See message
			
 
				+types below.
			
 
				 
			
 
				-There is no data portion of this type, and the encoded length is
			
 
				-ignored and is always zero.
			
 
				+The message is sent in this format to the routing daemon. The daemon
			
 
				+optionally modifies the headers and delivers it in the same format to
			
 
				+the recipient(s).
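
The framing described above can be sketched in Python roughly as follows. This is only an illustration, not the project's actual implementation: the names `encode_message` and `decode_message` are made up for this example, and UTF-8 is assumed for the JSON text, which the description above does not state explicitly.

```python
import json
import struct


def encode_message(header, body=None):
    """Frame a header dict and an optional body as one wire message."""
    # Assumption: JSON text is encoded as UTF-8.
    header_bytes = json.dumps(header).encode("utf-8")
    # An empty body is allowed for messages addressed to the daemon itself.
    body_bytes = b"" if body is None else json.dumps(body).encode("utf-8")
    if len(header_bytes) > 0xFFFF:
        raise ValueError("header does not fit the 2-byte length field")
    # The message length counts everything after itself:
    # 2 bytes of header length + header + body.
    message_length = 2 + len(header_bytes) + len(body_bytes)
    # "!" selects network byte order, "I" is 4 bytes, "H" is 2 bytes.
    prefix = struct.pack("!IH", message_length, len(header_bytes))
    return prefix + header_bytes + body_bytes


def decode_message(frame):
    """Split one complete frame back into (header, body)."""
    message_length, header_length = struct.unpack_from("!IH", frame, 0)
    if message_length != len(frame) - 4:
        raise ValueError("message length does not match the frame size")
    if header_length > message_length - 2:
        raise ValueError("header length exceeds the message length")
    header = json.loads(frame[6:6 + header_length].decode("utf-8"))
    # Body length is message length - 2 - header length, i.e. the rest.
    body_bytes = frame[6 + header_length:]
    body = json.loads(body_bytes.decode("utf-8")) if body_bytes else None
    return header, body
```

A caller reading from a stream socket would first read 4 bytes, then read exactly `message length` more bytes before handing the whole frame to `decode_message`.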
			
 
				 
			
 
				-Note that this is different than a DATA element with a zero length.
			
 
				+The headers
			
 
				+-----------
			
 
				 
			
 
				+The header object can contain the following information:
			
 
				 
			
 
				-EXAMPLE
			
 
				--------
			
 
				-
			
 
				-This is Ruby syntax, but should be clear enough for anyone to read.
			
 
				-
			
 
				-Example data encoding:
			
 
				-
			
 
				-{
			
 
				-  "from" => "sender@host",
			
 
				-  "to" => "recipient@host",
			
 
				-  "seq" => 1234,
			
 
				-  "data" => {
			
 
				-    "list" => [ 1, 2, nil, "this" ],
			
 
				-    "description" => "Fun for all",
			
 
				-  },
			
 
				-}
			
 
				-
			
 
				-
			
 
				-Wire-format:
			
 
				-
			
 
				-In this format, strings are not shown in hex, but are included "like
			
 
				-this."  Descriptions are written (like this.)
			
 
				-
			
 
				-Message Length: 0x64 (100 bytes)
			
 
				-Protocol Version:  0x53 0x6b 0x61 0x6e
			
 
				-(remaining length: 96 bytes)
			
 
				-
			
 
				-0x04 "from" 0x21 0x0b "sender@host"
			
 
				-0x02 "to" 0x21 0x0e "recipient@host"
			
 
				-0x03 "seq" 0x21 0x04 "1234"
			
 
				-0x04 "data" 0x22
			
 
				-  0x04 "list" 0x23 
			
 
				-    0x21 0x01 "1"
			
 
				-    0x21 0x01 "2"
			
 
				-    0x04
			
 
				-    0x21 0x04 "this"
			
 
				-  0x0b "description" 0x0b "Fun for all"
			
 
				-
			
 
				-
			
 
				-MESSAGE ROUTING
			
 
				----------------
			
 
				-
			
 
				-The message routing daemon uses the top-level hash to contain routing
			
 
				-instructions and additional control data.  Not all of these are
			
 
				-required for various control message types; see the individual
			
 
				-descriptions for more information.
			
 
				-
			
 
				-    Tag      Description
			
 
				-    -------  ----------------------------------------
			
 
				-    msg      Sender-supplied data
			
 
				-    from     sender's identity
			
 
				-    group    Group name this message is being sent to
			
 
				-    instance Instance in this group
			
 
				-    repl     if present, this message is a reply.
			
 
				-    seq	     sequence number, used in replies
			
 
				-    to	     recipient or "*" for no specific receiver
			
 
				-    type     "send" for a channel message
			
 
				-
			
 
				-
			
 
				-"type" is a DATA element, which indicates to the message routing
			
 
				-system what the purpose of this message is.
			
 
				+|====================================================================================================
			
 
				+|Name       |type  |Description
			
 
				+|====================================================================================================
			
 
				+|from       |string|Sender's l-name
			
 
				+|type       |string|Type of the message. The routed message is "send".
			
 
				+|group      |string|The group to deliver to.
			
 
				+|instance   |string|Instance in the group. Purpose lost in history. Defaults to "*".
			
 
				+|to         |string|Override recipient (group/instance ignored).
			
 
				+|seq        |int   |Tracking number of the message.
			
 
				+|reply      |int   |If present, contains the seq number of the message this is a reply to.
			
 
				+|want_answer|bool  |If present and true, the daemon generates an error if there's no matching recipient.
			
 
				+|====================================================================================================
			
 
				 
			
 
				+Types of messages
			
 
				+-----------------
			
 
				 
			
 
				 Get Local Name (type "getlname")
			
 
				---------------------------------
			
 
				+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
			
 
				 
			
 
				-Upon connection, this is the first message to be sent to the control
			
 
				-daemon.  It will return the local name of this client.  Each
			
 
				-connection gets its own unique local name, and local names are never
			
 
				-repeated.  They should be considered opaque strings, in a format
			
 
				-useful only to the message routing system.  They are used in replies
			
 
				-or to send to a specific destination.
			
 
				+Upon connection, this is the first message to be sent to the daemon.
			
 
				+It will return the local name of this client.  Each connection gets
			
 
				+its own unique local name, and local names are never repeated.  They
			
 
				+should be considered opaque strings, in a format useful only to the
			
 
				+message routing system.  They are used in replies or to send to a
			
 
				+specific destination.
			
 
				 
			
 
				 To request the local name, the only element included is the
			
 
				-  "type" => "getlname"
			
 
				+  {"type": "getlname"}
			
 
				 tuple.  The response is also a simple, single tuple:
			
 
				-  "lname" => "UTF-8 encoded local name blob"
			
 
				+  {"lname": "Opaque UTF-8 string"}
			
 
				 
			
 
				 Until this message is sent, no other types of messages may be sent on
			
 
				 this connection.
			
 
				 
			
 
				-
			
 
				 Regular Group Messages (type "send")
			
 
				-------------------------------------
			
 
				+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
			
 
				 
			
 
				-When sending a message:
			
 
				+A message routed to another client. This type expects the body to be
			
 
				+non-empty.
			
 
				 
			
 
				-"msg" is the sender supplied data.  It is encoded as per its type.
			
 
				-It is a required field, but may be the NULL type if not needed.
			
 
				-In OpenReg, this was another wire format message, stored as an
			
 
				-ITEM_DATA.  This was done to make it easy to decode the routing
			
 
				-information without having to decode arbitrary application-supplied
			
 
				-data, but rather treat this application data as an opaque blob.
			
 
				+Expected headers are:
			
 
				 
			
 
				-"from" is a DATA element, and its value is a UTF-8 encoded sender
			
 
				-identity.  It MUST be the "local name" supplied by the message
			
 
				-routing system upon connection.  The message routing system will
			
 
				-enforce this, but will not add it.  It is a required field.
			
 
				+* from
			
 
				+* group
			
 
				+* instance (set to "*" if no specific instance desired)
			
 
				+* seq (should be unique for the sender)
			
 
				+* to (set to "*" if not directed to a specific client)
			
 
				+* reply (optional, only if it is a reply)
			
 
				+* want_answer (optional, only when not a reply)
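
As a rough sketch, the header list above could be assembled like this. The helper `send_header` and the module-level counter are hypothetical and exist only for this example; the only facts taken from the text are the header names, the `"*"` defaults and the rule that `want_answer` is not used on replies.

```python
import itertools

# Hypothetical per-connection counter; any scheme that keeps seq unique
# for the sender would do.
_seq = itertools.count(1)


def send_header(lname, group, to="*", instance="*", reply=None,
                want_answer=False):
    """Build the header object for a message of type "send"."""
    header = {
        "type": "send",
        "from": lname,         # our own l-name, as returned by getlname
        "group": group,
        "instance": instance,  # "*" unless a specific instance is desired
        "to": to,              # "*" unless directed to a specific client
        "seq": next(_seq),
    }
    if reply is not None:
        # A reply carries the seq of the message it answers.
        header["reply"] = reply
    elif want_answer:
        # want_answer only makes sense when this is not a reply.
        header["want_answer"] = True
    return header
```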
			
 
				 
			
 
				-"group" is a DATA element, and its value is the UTF-8 encoded group
			
 
				-name this message is being transmitted to.  It is a required field for
			
 
				-all messages of type "send".
			
 
				+A client does not see its own transmissions.
			
 
				 
			
 
				-"instance" is a DATA element, and its value is the UTF-8 encoded
			
 
				-instance name, with "*" meaning all instances.
			
 
				+Group Subscriptions (type "subscribe")
			
 
				+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
			
 
				 
			
 
				-"repl" is the sequence number being replied to, if this is a reply.
			
 
				+Indicates the sender wants to be included in the given group.
			
 
				 
			
 
				-"seq" is a unique identity per client.  That is, the <lname, seq>
			
 
				-tuple must be unique over the lifetime of the connection, or at least
			
 
				-over the lifetime of the expected reply duration.
			
 
				+Expected headers are:
			
 
				 
			
 
				-"to" is a DATA element, and its value is a UTF-8 encoded recipient
			
 
				-identity.  This must be a specific recipient name or "*" to indicate
			
 
				-"all listeners on this channel."  It is a required field.
			
 
				+* group
			
 
				+* instance (leave at "*" for default)
			
 
				 
			
 
				-When a message of type "send" is received by the client, all the data
			
 
				-is used as above.  This indicates a message of the given type was
			
 
				-received.
			
 
				+There is no response to this message and the client is subscribed to
			
 
				+the given group and instance.
			
 
				 
			
 
				-A client does not see its own transmissions. (XXXMLG Need to check this)
			
 
				+The group can be any UTF-8 string and doesn't have to exist
			
 
				+beforehand (it is created when at least one client is in it). A client may
			
 
				+be subscribed to multiple groups.
			
 
				 
			
 
				+Group Unsubscribe (type "unsubscribe")
			
 
				+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
			
 
				 
			
 
				-Group Subscriptions (type "subscribe")
			
 
				---------------------------------------
			
 
				+The headers to be included are "group" and "instance" and have the same
			
 
				+meaning as in a "subscribe" message, except that the client is removed
			
 
				+from the group.
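
Both subscription messages carry only headers, so a sketch of them is short. The helper name `subscription_header` is an assumption made for this example.

```python
def subscription_header(action, group, instance="*"):
    """Build the header for a "subscribe" or "unsubscribe" message."""
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError("action must be 'subscribe' or 'unsubscribe'")
    # Both message types use the same two headers; only the type differs.
    return {"type": action, "group": group, "instance": instance}
```

Since these are instructions for the daemon rather than routed messages, they would be sent with an empty body.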
			
 
				 
			
 
				-A subscription requires the "group", "instance", and a flag to
			
 
				-indicate the subscription type ("subtype").  If instance is "*" the
			
 
				-instance name will be ignored when deciding to forward a message to
			
 
				-this client or not.
			
 
				+Transmitted messages
			
 
				+--------------------
			
 
				 
			
 
				-"subtype" is a DATA element, and contains "normal" for normal channel
			
 
				-subscriptions, "meonly" for only those messages on a channel with the
			
 
				-recipient specified exactly as the local name, or "promisc" to receive
			
 
				-all channel messages regardless of other filters.  As its name
			
 
				-implies, "normal" is for typical subscriptions, and "promisc" is
			
 
				-intended for channel message debugging.
			
 
				+These are the messages generally transmitted in the body of the
			
 
				+message.
			
 
				 
			
 
				-There is no response to this message.
			
 
				+Command
			
 
				+~~~~~~~
			
 
				 
			
 
				+A command is a request from one process to another to do something or
			
 
				+send some information. It is identified by a name and can optionally have
			
 
				+parameters. It looks like this:
			
 
				 
			
 
				-Group Unsubscribe (type "unsubscribe")
			
 
				--------------------------------
			
 
				+  {"command": ["name", <parameters>]}
			
 
				+
			
 
				+The parameters may be omitted (then the array is 1 element long). If
			
 
				+present, they may be any JSON element. The most common form is an
			
 
				+object with named parameter values.
			
 
				+
			
 
				+It is usually transmitted with the `want_answer` header turned on, to
			
 
				+cope with the situation where the remote end doesn't exist, and sent to a
			
 
				+group (i.e. `to` with value of `*`).
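
A command body could be built along these lines. The helper `command_body` and the command names `shutdown` and `get_config` are invented for this illustration and are not taken from any real module.

```python
import json

# Sentinel, so that JSON null (None) can still be passed as a parameter.
_NO_PARAMS = object()


def command_body(name, params=_NO_PARAMS):
    """Build the body of a command message."""
    if params is _NO_PARAMS:
        # Parameters omitted: the array is 1 element long.
        return {"command": [name]}
    return {"command": [name, params]}


print(json.dumps(command_body("shutdown")))
# {"command": ["shutdown"]}
print(json.dumps(command_body("get_config", {"module": "example"})))
# {"command": ["get_config", {"module": "example"}]}
```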
			
 
				+
			
 
				+Success reply
			
 
				+~~~~~~~~~~~~~
			
 
				+
			
 
				+When the command is successful, the other side answers with a reply of
			
 
				+the following format:
			
 
				+
			
 
				+  {"result": [0, <result>]}
			
 
				+
			
 
				+The result is the return value of the command. It may be any JSON
			
 
				+element and it may be omitted (for the case of a ``void'' function).
			
 
				 
			
 
				-The fields to be included are "group" and "instance" and have the same
			
 
				-meaning as a "subscribe" message.
			
 
				+This is transmitted with the `reply` header set to the `seq` number of
			
 
				+the original command. It is sent with the `to` header set.
			
 
				 
			
 
				-There is no response to this message.
			
 
				+Error reply
			
 
				+~~~~~~~~~~~
			
 
				 
			
 
				+In case something goes wrong, an error reply is sent. This is similar
			
 
				+to throwing an exception from a local function. The format is similar:
			
 
				 
			
 
				-Statistics (type "stats")
			
 
				--------------------------
			
 
				+  {"result": [ecode, "Error description"]}
			
 
				 
			
 
				-Request statistics from the message router.  No other fields are
			
 
				-inclued in the request.
			
 
				+The `ecode` is a non-zero error code. Most of the current code uses `1`
			
 
				+for all errors. The string after that is mandatory and must contain a
			
 
				+human-readable description of the error.
			
 
				 
			
 
				-The response contains a single element "stats" which is an opaque
			
 
				-element.  This is used mostly for debugging, and its format is
			
 
				-specific to the message router.  In general, some method to simply
			
 
				-dump raw messages would produce something useful during debugging.
			
 
				+The negative error codes are reserved for errors from the daemon.
			
 
				+Currently, only `-1` is used. It is generated when a message without
			
 
				+a `reply` header is sent with the `want_answer` header set to
			
 
				+`true` and there's no recipient to deliver the message to. This
			
 
				+usually means a command was sent to a non-existent recipient.
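
On the receiving side, the success and error replies can be told apart by the first array element. The sketch below mirrors the exception analogy from the text; the names `CommandError` and `parse_reply` are assumptions made for this example.

```python
class CommandError(Exception):
    """Raised when a reply carries a non-zero result code."""

    def __init__(self, code, description):
        super().__init__(description)
        self.code = code
        # Negative codes are reserved for errors from the daemon.
        self.from_daemon = code < 0


def parse_reply(body):
    """Return the result of a success reply, or raise CommandError."""
    result = body["result"]
    code = result[0]
    if code == 0:
        # The result value may be omitted (the "void" case).
        return result[1] if len(result) > 1 else None
    # For errors, the description string is mandatory.
    raise CommandError(code, result[1])
```

A caller would match the reply to its command through the `reply` header before passing the body to `parse_reply`.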