Cap'n Proto is a data serialization format and Remote Procedure Call (RPC) framework for exchanging data between computer programs. The high-level design focuses on speed and security, making it suitable for network as well as inter-process communication. Cap'n Proto was created by the former maintainer of Google's popular Protocol Buffers framework (Kenton Varda) and was designed to avoid some of its perceived shortcomings.
Technical overview
IDL Schema
Like most RPC frameworks dating as far back as Sun RPC and OSF DCE RPC (and their object-based descendants CORBA and DCOM), Cap'n Proto uses an Interface Description Language (IDL) to generate RPC libraries in a variety of programming languages, automating many low-level details such as handling network requests and converting between data types. The Cap'n Proto interface schema uses a C-like syntax and supports common primitive data types (booleans, integers, floats, etc.), compound types (structs, lists, enums), as well as generics and dynamic types. Cap'n Proto also supports object-oriented features such as multiple inheritance, which has been criticized for its complexity.
@0xa558ef006c0c123; # Unique identifiers are manually or automatically assigned to files and compound types
struct Date @0x5c5a558ef006c0c1
struct Contact @0xf032a54bcb3667e0
Values in Cap'n Proto messages are represented in binary, as opposed to the text encoding used by "human-readable" formats such as JSON or XML. Cap'n Proto tries to make the storage/network protocol appropriate as an in-memory format, so that no translation step is needed when reading data into memory or writing data out of memory. (Unlike Apache Arrow, Cap'n Proto's in-memory values are not suited for sharing mutable data.) For example, the representation of numbers (endianness) was chosen to match the representation used by the most popular CPU architectures. When the in-memory and wire-protocol representations match, Cap'n Proto can avoid copying and encoding data when creating or reading a message and instead point to the location of the value in memory. Cap'n Proto also supports random access to data, meaning that any field can be read without having to read the entire message.
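The idea of reading a field in place can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-byte layout, the `DateReader` class and the `build_date` helper are hypothetical and do not reproduce Cap'n Proto's actual wire format or any of its library APIs. The point is that a little-endian field at a known offset can be read straight from the received buffer, without copying or decoding the rest of the message.

```python
import struct

# Hypothetical fixed layout (NOT Cap'n Proto's real wire format):
#   offset 0: year  (int16, little-endian)
#   offset 2: month (uint8)
#   offset 3: day   (uint8)
YEAR = struct.Struct("<h")
MONTH_OFFSET = 2
DAY_OFFSET = 3


class DateReader:
    """Reads fields directly from a buffer without decoding the whole message."""

    def __init__(self, buffer):
        # memoryview wraps the existing bytes without copying them.
        self._view = memoryview(buffer)

    @property
    def year(self):
        return YEAR.unpack_from(self._view, 0)[0]

    @property
    def month(self):
        return self._view[MONTH_OFFSET]

    @property
    def day(self):
        return self._view[DAY_OFFSET]


def build_date(year, month, day):
    """Writes each field at its fixed offset; the buffer can be sent as-is."""
    buf = bytearray(4)
    YEAR.pack_into(buf, 0, year)
    buf[MONTH_OFFSET] = month
    buf[DAY_OFFSET] = day
    return buf


message = build_date(2020, 10, 31)
reader = DateReader(message)
print(reader.day)  # reads a single byte at offset 3; no other field is touched
```

Because every accessor goes to a fixed offset, reading `day` costs the same whether the message holds one field or thousands, which is the random-access property described above.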
Unlike some other serialization protocols such as XMI, Cap'n Proto considers fine-grained data validation at the RPC level an anti-feature that limits a protocol's ability to evolve. This was informed by experiences at Google, where simply changing a field from mandatory to optional would cause complex operational failures. (Marking a field as required was removed from Protocol Buffers 3.) Cap'n Proto schemas are designed to be as flexible as possible and push data validation to the application level, allowing arbitrary renaming of fields, adding new fields, and making concrete types generic. Cap'n Proto does, however, validate pointer bounds and type check individual values when they are first accessed.
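A rough Python sketch of this division of labour follows. Everything in it is an assumption made for illustration: the `StructReader` class, the field offsets and the `positive_quantity` rule are hypothetical and are not taken from any Cap'n Proto implementation. It shows a reader that checks bounds only when a field is accessed, falls back to a default when an older writer did not include a newer field, and leaves business rules to application code.

```python
import struct


class StructReader:
    """Toy reader: fields live at fixed byte offsets in a data section."""

    def __init__(self, data):
        self._view = memoryview(data)

    def read_uint16(self, offset, default=0):
        # A field that starts past the end of the data section was not known
        # to the (older) writer, so the schema default is returned instead.
        if offset >= len(self._view):
            return default
        # Bounds are checked only now, when the field is actually accessed.
        if offset + 2 > len(self._view):
            raise ValueError("field at offset %d overruns the message" % offset)
        return struct.unpack_from("<H", self._view, offset)[0]


def positive_quantity(reader):
    """Application-level validation; the toy schema imposes no such rule."""
    quantity = reader.read_uint16(2, default=1)
    if quantity == 0:
        raise ValueError("quantity must be positive")
    return quantity


old_message = struct.pack("<H", 7)      # written before 'quantity' was added
new_message = struct.pack("<HH", 7, 3)  # written with the newer schema

print(positive_quantity(StructReader(old_message)))
print(positive_quantity(StructReader(new_message)))
```

Adding the hypothetical `quantity` field does not break readers of old messages, because absence is treated as the default rather than as a validation failure.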
Enforcing complex schema constraints would also incur significant overhead, negating the benefits of reusing in-memory data structures and preventing random access to data. (Assuming the data has already been allocated, e.g. in network buffers or read from disk, access becomes O(1); additional serialization/deserialization steps, as required to inspect values, would limit performance to O(n).) The Cap'n Proto protocol is theoretically suitable for very fast inter-process communication (IPC) via immutable shared memory, but as of October 2020 none of the implementations support data passing via shared memory. However, Cap'n Proto is still generally considered faster than Protocol Buffers and similar RPC libraries.
Networking
Cap'n Proto RPC is network aware, supporting both handling of disconnects and promise pipelining, wherein a server pipes the output of one function into another function. This saves the client a round trip per successive call to the server without having to provide a dedicated API for every possible call graph. Cap'n Proto can be layered on top of TLS, and support for the Noise Protocol Framework is on the roadmap. Cap'n Proto RPC is transport agnostic, with the mainline implementation supporting WebSockets, HTTP, TCP, and UDP.
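The round-trip saving from promise pipelining can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the Cap'n Proto RPC protocol: the `Server` class, its two methods and the `("result_of", i)` placeholder are all hypothetical, and the calls are grouped into one batch purely to make the round trips countable.

```python
class Server:
    """Stand-in for a remote service; each handle() call is one round trip."""

    def __init__(self):
        self.round_trips = 0
        self._methods = {
            "get_user_id": lambda name: len(name),
            "get_balance": lambda user_id: user_id * 100,
        }

    def handle(self, calls):
        # `calls` is a list of (method, argument) pairs. An argument of the
        # form ("result_of", i) means "use the result of call i in this batch".
        self.round_trips += 1
        results = []
        for method, arg in calls:
            if isinstance(arg, tuple) and arg[0] == "result_of":
                arg = results[arg[1]]
            results.append(self._methods[method](arg))
        return results


def without_pipelining(server, name):
    user_id = server.handle([("get_user_id", name)])[0]  # round trip 1
    return server.handle([("get_balance", user_id)])[0]  # round trip 2


def with_pipelining(server, name):
    # The second call refers to the first call's not-yet-known result, so the
    # server can chain them and the client pays for a single round trip.
    results = server.handle([
        ("get_user_id", name),
        ("get_balance", ("result_of", 0)),
    ])
    return results[1]


plain, pipelined = Server(), Server()
print(without_pipelining(plain, "alice"), plain.round_trips)
print(with_pipelining(pipelined, "alice"), pipelined.round_trips)
```

The server needs no dedicated "get balance by name" method: the client composes the two existing calls itself, which is the property the paragraph above describes.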
Capability security
The Cap'n Proto RPC standard has a rich capability security model based on the CapTP protocol used by the E programming language. As of October 2020, the reference implementation only supports level 2 of the RPC protocol.
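The general idea behind capability security, that holding a reference to an object is itself the permission to use it, can be sketched as follows. The `Counter` and `ReadOnlyCounter` classes are hypothetical and unrelated to Cap'n Proto's API, and Python does not enforce the encapsulation this relies on; in a capability-based RPC system the isolation is expected to come from the protocol, since a peer can only call objects it has been given references to.

```python
class Counter:
    """A service object; holding a reference to it is the authority to call it."""

    def __init__(self):
        self._value = 0

    def increment(self):
        self._value += 1
        return self._value

    def read(self):
        return self._value


class ReadOnlyCounter:
    """Attenuated capability: forwards read() and offers no way to increment."""

    def __init__(self, counter):
        # Keep only the bound method that this capability is meant to grant.
        self._read = counter.read

    def read(self):
        return self._read()


def untrusted_report(view):
    # This code was handed only the read-only capability.
    return "count=%d" % view.read()


counter = Counter()
counter.increment()
print(untrusted_report(ReadOnlyCounter(counter)))
```

Access is restricted by choosing which reference to hand out, rather than by checking the caller's identity against an access-control list.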
Comparison to other serialization formats
Cap'n Proto is often compared to other zero-copy serialization formats, such as Google's FlatBuffers and Simple Binary Encoding (SBE).
Adoption
Cap'n Proto was originally created for Sandstorm.io, a startup offering a web application hosting platform with capability-based security. After Sandstorm.io failed commercially, the development team was acqui-hired by Cloudflare, which uses Cap'n Proto internally.