protocol version 0x536b616e

DATA            0x01
HASH            0x02
LIST            0x03
NULL            0x04
TYPE_MASK       0x0f

LENGTH_32       0x00
LENGTH_16       0x10
LENGTH_8        0x20
LENGTH_MASK     0xf0


MESSAGE ENCODING
----------------

When decoding, the entire message length must be known.  If the
message is transmitted over a raw stream such as TCP, it is usually
sent as a 4-byte length followed by the message itself.  If some other
wrapping is used (say, as part of a different message structure), the
length of the message must be preserved and supplied for decoding.

The first 4 bytes of the message are the protocol version, encoded
directly as a 4-byte value.

Immediately following this is a HASH element.  The length of the hash
element is the remainder of the message after subtracting 4 bytes for
the protocol version.

This initial HASH is intended to be used by the message routing system
if one is in use.


ITEM TYPES
----------

There are four basic types encoded in this protocol: a simple data
blob (DATA), a tag-value series (HASH), an ordered list (LIST), and a
NULL type (which is used internally to encode DATA types which are
empty, and can be used to indicate existence without data in a hash).

Each item can be of any type, so hashes of hashes and hashes of lists
are typical.

All multi-byte integers which are encoded in binary are in network
byte order.


ITEM ENCODING
-------------

Each item is preceded by a single byte which describes that item.
This byte contains the item type and the item length encoding:

    Thing             Length    Description
    ----------------  --------  ------------------------------------
    TyLen             1 byte    Item type and length encoding
    Length            variable  Item data blob length
    Item Data         variable  Item data blob

The TyLen field includes both the item data type and the encoding of
the item's length.  The length bytes are encoded depending on the
length of the data portion (LENGTH_8, LENGTH_16, or LENGTH_32), and the
smallest length encoding that can hold the length should be used.
Note that this length compression is used just for data compactness.
It is wasteful to encode the most common length (8-bit length) as 4
bytes, so this method allows one byte to be used rather than 4, three
of which are nearly always zero.


HASH
----

A HASH is a series of tag/value pairs where each tag is an opaque,
unique blob and the values are items of any type.  Hashes are not
encoded in any specific tag or item order.

The HASH's data area is processed as tag/value pairs until the entire
area is consumed.  Running out of data prematurely indicates an
incorrectly encoded message.

The data area consists of repeated items:

    Thing             Length    Description
    ----------------  --------  ------------------------------------
    Tag Length        1 byte    The length of the tag
    Tag               variable  The tag name
    Item              variable  Encoded item

The Tag Length field is always one byte, which limits the tag name to
255 bytes maximum.  A tag length of zero is invalid.


LIST
----

A LIST is a list of items encoded and decoded in a specific order.
The order is chosen entirely by the source during encoding.

The LIST's data area is consumed by the items it contains.  Running
out of room prematurely indicates an incorrectly encoded message.

The data area consists of repeated items:

    Thing             Length    Description
    ----------------  --------  ------------------------------------
    Item              variable  Encoded item


DATA
----

A DATA item is a simple blob of data.  No further processing of this
data is performed by this protocol on these elements.

The data blob is the entire data area.  The data area can be 0 or more
bytes long.

It is typical to encode integers as strings rather than binary
integers.  However, so long as both sender and recipient agree on the
format of the data blob itself, any blob encoding may be used.


NULL
----

This data element indicates that no data is actually present.  This
can be used to indicate that a tag is present in a HASH but no data is
actually at that location, or in a LIST to indicate empty item
positions.
There is no data portion of this type, and the encoded length is
ignored and is always zero.  Note that this is different from a DATA
element with a zero length.


EXAMPLE
-------

This is Ruby syntax, but should be clear enough for anyone to read.

Example data encoding:

    {
      "from" => "sender@host",
      "to" => "recipient@host",
      "seq" => 1234,
      "data" => {
        "list" => [ 1, 2, nil, "this" ],
        "description" => "Fun for all",
      },
    }

Wire-format:

In this format, strings are not shown in hex, but are included "like
this."  Descriptions are written (like this.)  Every HASH, LIST, and
DATA item carries its TyLen byte followed by its length byte.

    Message Length: 0x67 (103 bytes)
    Protocol Version:  0x53 0x6b 0x61 0x6e
    (remaining length: 99 bytes)

    0x04 "from" 0x21 0x0b "sender@host"
    0x02 "to" 0x21 0x0e "recipient@host"
    0x03 "seq" 0x21 0x04 "1234"
    0x04 "data" 0x22 0x2d (HASH, 45 bytes)
      0x04 "list" 0x23 0x0d (LIST, 13 bytes)
        0x21 0x01 "1"
        0x21 0x01 "2"
        0x04 (NULL)
        0x21 0x04 "this"
      0x0b "description" 0x21 0x0b "Fun for all"


MESSAGE ROUTING
---------------

The message routing daemon uses the top-level hash to contain routing
instructions and additional control data.  Not all of these are
required for various control message types; see the individual
descriptions for more information.

    Tag       Description
    --------  -----------------------------------------
    msg       Sender-supplied data
    from      Sender's identity
    group     Group name this message is being sent to
    instance  Instance in this group
    repl      If present, this message is a reply
    seq       Sequence number, used in replies
    to        Recipient or "*" for no specific receiver
    type      "send" for a channel message

"type" is a DATA element, which indicates to the message routing
system what the purpose of this message is.


Get Local Name (type "getlname")
--------------------------------

Upon connection, this is the first message to be sent to the control
daemon.  It will return the local name of this client.  Each
connection gets its own unique local name, and local names are never
repeated.  They should be considered opaque strings, in a format
useful only to the message routing system.
They are used in replies or to send to a specific destination.

To request the local name, the only element included is the
    "type" => "getlname"
tuple.  The response is also a simple, single tuple:
    "lname" => "UTF-8 encoded local name blob"

Until this message is sent, no other types of messages may be sent on
this connection.


Regular Group Messages (type "send")
------------------------------------

When sending a message:

"msg" is the sender-supplied data.  It is encoded as per its type.
It is a required field, but may be the NULL type if not needed.
In OpenReg, this was another wire format message, stored as an
ITEM_DATA.  This was done to make it easy to decode the routing
information without having to decode arbitrary application-supplied
data, and to treat this application data as an opaque blob instead.

"from" is a DATA element, and its value is a UTF-8 encoded sender
identity.  It MUST be the "local name" supplied by the message routing
system upon connection.  The message routing system will enforce this,
but will not add it.  It is a required field.

"group" is a DATA element, and its value is the UTF-8 encoded group
name this message is being transmitted to.  It is a required field for
all messages of type "send".

"instance" is a DATA element, and its value is the UTF-8 encoded
instance name, with "*" meaning all instances.

"repl" is the sequence number being replied to, if this is a reply.

"seq" is a unique identity per client.  That is, the (local name,
"seq") tuple must be unique over the lifetime of the connection, or at
least over the lifetime of the expected reply duration.

"to" is a DATA element, and its value is a UTF-8 encoded recipient
identity.  This must be a specific recipient name or "*" to indicate
"all listeners on this channel."  It is a required field.

When a message of type "send" is received by the client, all the data
is used as above.  This indicates a message of the given type was
received.

A client does not see its own transmissions.
(XXXMLG Need to check this)


Group Subscriptions (type "subscribe")
--------------------------------------

A subscription requires the "group", the "instance", and a flag to
indicate the subscription type ("subtype").  If "instance" is "*", the
instance name will be ignored when deciding whether to forward a
message to this client.

"subtype" is a DATA element, and contains "normal" for normal channel
subscriptions, "meonly" for only those messages on a channel with the
recipient specified exactly as the local name, or "promisc" to receive
all channel messages regardless of other filters.  As its name
implies, "normal" is for typical subscriptions, and "promisc" is
intended for channel message debugging.

There is no response to this message.


Group Unsubscribe (type "unsubscribe")
--------------------------------------

The fields to be included are "group" and "instance", and they have
the same meaning as in a "subscribe" message.

There is no response to this message.


Statistics (type "stats")
-------------------------

Request statistics from the message router.  No other fields are
included in the request.

The response contains a single element "stats" which is an opaque
element.  This is used mostly for debugging, and its format is
specific to the message router.  In general, some method to simply
dump raw messages would produce something useful during debugging.
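

ENCODING SKETCH
---------------

The item, HASH, LIST, and message encoding rules described in this
document can be sketched in a few lines of Python.  This is only an
illustration and not the message router's actual code.  All function
names are hypothetical.  It assumes that NULL is written as a bare
type byte with no length field (as in the example section), that tags
are UTF-8 text, that integers and strings are sent as text DATA blobs,
and that the optional 4-byte stream length prefix does not count
itself.

```python
import struct

PROTOCOL_VERSION = 0x536B616E
DATA, HASH, LIST, NULL = 0x01, 0x02, 0x03, 0x04
TYPE_MASK, LENGTH_MASK = 0x0F, 0xF0
LENGTH_32, LENGTH_16, LENGTH_8 = 0x00, 0x10, 0x20
LENGTH_SIZES = {LENGTH_8: 1, LENGTH_16: 2, LENGTH_32: 4}


def _header(kind, length):
    """TyLen byte followed by the smallest length field that fits."""
    if length < 0x100:
        return struct.pack(">BB", kind | LENGTH_8, length)
    if length < 0x10000:
        return struct.pack(">BH", kind | LENGTH_16, length)
    return struct.pack(">BI", kind | LENGTH_32, length)


def _hash_body(mapping):
    """Repeated (tag length, tag, item) triples with no outer header."""
    out = bytearray()
    for tag, value in mapping.items():
        raw = tag.encode("utf-8")  # assumption: tags are UTF-8 text
        if not 1 <= len(raw) <= 255:
            raise ValueError("tag must be 1..255 bytes")
        out += bytes([len(raw)]) + raw + encode_item(value)
    return bytes(out)


def encode_item(value):
    if value is None:
        return bytes([NULL])  # assumption: bare type byte, no length field
    if isinstance(value, dict):
        kind, body = HASH, _hash_body(value)
    elif isinstance(value, (list, tuple)):
        kind, body = LIST, b"".join(encode_item(v) for v in value)
    elif isinstance(value, bytes):
        kind, body = DATA, value
    else:
        kind, body = DATA, str(value).encode("utf-8")  # integers go as text
    return _header(kind, len(body)) + body


def encode_message(mapping):
    """Protocol version followed by the body of the top-level HASH."""
    return struct.pack(">I", PROTOCOL_VERSION) + _hash_body(mapping)


def decode_item(buf, pos):
    """Decode one item starting at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated item")
    tylen, pos = buf[pos], pos + 1
    kind = tylen & TYPE_MASK
    if kind == NULL:
        return None, pos
    size = LENGTH_SIZES.get(tylen & LENGTH_MASK)
    if size is None or pos + size > len(buf):
        raise ValueError("bad or truncated length field")
    length = int.from_bytes(buf[pos:pos + size], "big")
    pos += size
    end = pos + length
    if end > len(buf):
        raise ValueError("item runs past the end of its container")
    body = buf[pos:end]
    if kind == DATA:
        return body, end
    if kind == HASH:
        return _decode_hash(body), end
    if kind == LIST:
        items, p = [], 0
        while p < len(body):
            item, p = decode_item(body, p)
            items.append(item)
        return items, end
    raise ValueError("unknown item type 0x%02x" % kind)


def _decode_hash(buf):
    result, pos = {}, 0
    while pos < len(buf):
        tag_end = pos + 1 + buf[pos]
        if buf[pos] == 0 or tag_end > len(buf):
            raise ValueError("invalid tag")
        tag = buf[pos + 1:tag_end].decode("utf-8")
        result[tag], pos = decode_item(buf, tag_end)
    return result


def decode_message(buf):
    if len(buf) < 4 or struct.unpack(">I", buf[:4])[0] != PROTOCOL_VERSION:
        raise ValueError("missing or unknown protocol version")
    return _decode_hash(buf[4:])


example = {
    "from": "sender@host",
    "to": "recipient@host",
    "seq": 1234,
    "data": {"list": [1, 2, None, "this"], "description": "Fun for all"},
}
wire = encode_message(example)
framed = struct.pack(">I", len(wire)) + wire  # 4-byte length prefix for TCP
decoded = decode_message(wire)
getlname = encode_message({"type": "getlname"})
```

Here "example" mirrors the Ruby structure from the EXAMPLE section,
and "getlname" is the request described under Get Local Name.  The
decoder returns DATA items as raw bytes, since the protocol itself
does not interpret them, and it raises ValueError when an item runs
out of data before its container ends.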