JSON streaming comprises communications protocols to delimit JSON objects built upon lower-level stream-oriented protocols (such as TCP), that ensure individual JSON objects are recognized when the server and clients use the same one (e.g. implicitly coded in). This is necessary as JSON is a non-concatenative protocol (the concatenation of two JSON objects does not produce a valid JSON object).
Introduction
JSON is a popular format for exchanging object data between systems. Frequently there's a need for a stream of objects to be sent over a single connection, such as a stock ticker or application log records. In these cases there's a need to identify where one JSON encoded object ends and the next begins. Technically this is known as framing.
There are four common ways to achieve this:
* Send the JSON objects ''formatted without newlines'' and use a newline as the delimiter.
* Send the JSON objects concatenated with a record separator control character as the delimiter.
* Send the JSON objects concatenated with no delimiters and rely on a streaming parser to extract them.
* Send the JSON objects prefixed with their length and rely on a streaming parser to extract them.
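The need for framing can be illustrated with a minimal Python sketch. It uses only the standard `json` module; the helper name `parses_as_single_json` and the sample objects are illustrative assumptions, not part of any specification. Each text parses on its own, but their plain concatenation is rejected by a parser that expects exactly one JSON text.

```python
import json


def parses_as_single_json(text: str) -> bool:
    """Return True if `text` is exactly one valid JSON text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Two made-up sample objects, serialized separately.
first = json.dumps({"some": "thing"})
second = json.dumps({"other": "thing"})

each_is_valid = parses_as_single_json(first) and parses_as_single_json(second)
concatenation_is_valid = parses_as_single_json(first + second)
```

A receiver that simply hands the incoming bytes to an ordinary parser therefore fails as soon as a second object arrives, which is why one of the four framing approaches listed above is needed.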
Comparison
Line-delimited JSON works very well with traditional line-oriented tools.
Concatenated JSON works with pretty-printed JSON but requires more effort and complexity to parse. It doesn't work well with traditional line-oriented tools. Concatenated JSON streaming is a superset of line-delimited JSON streaming.
Length-prefixed JSON works with pretty-printed JSON. It doesn't work well with traditional line-oriented tools, but may offer performance advantages over line-delimited or concatenated streaming. It can also be simpler to parse.
Newline-Delimited JSON
Two terms for equivalent formats of line-delimited JSON are:
* Newline delimited JSON (NDJSON). The old name was Line delimited JSON (LDJSON).
* JSON lines (JSONL)
Streaming makes use of the fact that the JSON format does not allow return and newline characters within primitive values (in strings those must be escaped as \r and \n, respectively) and that most JSON formatters default to not including any whitespace, including returns and newlines. These features allow the newline character or return and newline character sequence to be used as a delimiter.
This example shows two JSON objects (the implicit newline characters at the end of each line are not shown):
The use of a newline as a delimiter enables this format to work very well with traditional line-oriented Unix tools.
A log file, for example, might look like:
which is very easy to sort by date, grep for usernames, actions, IP addresses, etc.
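As a rough sketch of how a producer and consumer might handle this format, the following Python code writes one compact JSON text per line and reads the stream back line by line. The function names `write_ndjson` and `read_ndjson` and the sample records are assumptions made for illustration; only the standard library is used.

```python
import io
import json
from typing import Any, Iterable, Iterator, TextIO


def write_ndjson(records: Iterable[Any], stream: TextIO) -> None:
    """Write each record as one compact JSON text followed by a newline."""
    for record in records:
        # Without `indent`, json.dumps emits no raw newlines; a newline
        # inside a string value is escaped as the two characters \n.
        stream.write(json.dumps(record, separators=(",", ":")))
        stream.write("\n")


def read_ndjson(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed value per non-empty line."""
    for line in stream:
        line = line.strip()  # also drops a trailing \r from \r\n endings
        if line:  # assumption: blank lines are tolerated and skipped
            yield json.loads(line)


# Made-up sample records; the first contains a newline inside a string.
records = [
    {"some": "thing\n"},
    {"may": {"include": "nested", "objects": ["and", "arrays"]}},
]

buffer = io.StringIO()
write_ndjson(records, buffer)
encoded = buffer.getvalue()
decoded = list(read_ndjson(io.StringIO(encoded)))
```

Because the newline inside the first string is escaped by the serializer, the only raw newline characters in `encoded` are the two delimiters, so splitting on lines recovers the original records.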
Compatibility
Line-delimited JSON can be read by a parser that can handle concatenated JSON. Concatenated JSON that contains newlines ''within'' a JSON object can't be read by a line-delimited JSON parser.
The terms "line-delimited JSON" and "newline-delimited JSON" are often used without clarifying if embedded newlines are supported.
In the past the NDJ specification ("newline-delimited JSON") allowed comments to be embedded if the first two characters of a given line were "//". This could not be used with standard JSON parsers if comments were included. The current version of the specification ("NDJSON - Newline delimited JSON") no longer includes comments.
Concatenated JSON can be converted into line-delimited JSON by a suitable JSON utility such as jq. For example:
jq --compact-output . < concatenated.json > lines.json
Record separator-delimited JSON
Record separator-delimited JSON streaming allows JSON text sequences to be delimited without the requirement that the JSON formatter exclude whitespace. Since JSON texts cannot contain unescaped control characters, a record separator character can be used to delimit the sequences. In addition, it is suggested that each JSON text sequence be followed by a line feed character to allow proper handling of top-level JSON values that are not self-delimiting (numbers, true, false, and null).
This format is also known as JSON Text Sequences or MIME type application/json-seq, and is formally described in IETF RFC 7464.
The example below shows two JSON objects with ␞ representing the record separator control character and ␊ representing the line feed character:
␞␊
␞␊
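A minimal Python sketch of this framing is given below. It is an illustration under stated assumptions rather than a complete RFC 7464 implementation: the function names `encode_json_seq` and `decode_json_seq` are hypothetical, the whole sequence is held in memory, and no attempt is made to recover from truncated or malformed texts.

```python
import json
from typing import Any, Iterable, Iterator

RS = "\x1e"  # ASCII record separator (U+001E)
LF = "\n"    # line feed


def encode_json_seq(values: Iterable[Any]) -> str:
    """Prefix each JSON text with RS and terminate it with LF."""
    # indent=2 is used deliberately: embedded whitespace and newlines are
    # harmless here because RS, not the newline, marks the boundaries.
    return "".join(f"{RS}{json.dumps(value, indent=2)}{LF}" for value in values)


def decode_json_seq(text: str) -> Iterator[Any]:
    """Split on RS and parse every non-empty chunk as one JSON text."""
    for chunk in text.split(RS):
        chunk = chunk.strip()
        if chunk:
            yield json.loads(chunk)


# Made-up sample values, including top-level values that are not
# self-delimiting (a number and null).
values = [{"some": "thing\n"}, 42, None]
sequence = encode_json_seq(values)
decoded = list(decode_json_seq(sequence))
```

The serializer escapes any control character that appears inside a string value, so a raw record separator can only occur at a boundary between texts.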
Concatenated JSON
Concatenated JSON streaming allows the sender to simply write each JSON object into the stream with no delimiters. It relies on the receiver using a parser that can recognize and ''emit'' each JSON object as the terminating character is parsed. Concatenated JSON isn't a new format; it's simply a name for streaming multiple JSON objects without any delimiters.
The advantage of this format is that it can handle JSON objects that have been formatted with embedded newline characters, e.g., pretty-printed for human readability. For example, these two inputs are both valid and produce the same output:
Implementations that rely on line-based input may require a newline character after each JSON object in order for the object to be emitted by the parser in a timely manner. (Otherwise the line may remain in the input buffer without being passed to the parser.) This is rarely recognised as an issue because terminating JSON objects with a newline character is very common.
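One way to extract concatenated objects in Python is sketched below, using `json.JSONDecoder.raw_decode` from the standard library, which parses one JSON text and reports where it ended. The function name `iter_concatenated` and the sample objects are illustrative assumptions. The sketch works on a complete in-memory string; a true streaming parser would consume input incrementally.

```python
import json
from typing import Any, Iterator


def iter_concatenated(text: str) -> Iterator[Any]:
    """Yield each JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        # (str.isspace accepts slightly more than JSON whitespace.)
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            return
        # Parse one value starting at `pos`; `pos` becomes the index
        # just past the end of that value.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Made-up sample objects, serialized compactly and pretty-printed.
objects = [
    {"some": "thing"},
    {"may": {"include": "nested", "objects": ["and", "arrays"]}},
]
compact = "".join(json.dumps(obj) for obj in objects)
pretty = "\n".join(json.dumps(obj, indent=2) for obj in objects)

from_compact = list(iter_concatenated(compact))
from_pretty = list(iter_concatenated(pretty))
```

Both inputs yield the same two objects, which shows that embedded newlines and indentation do not affect a parser that tracks where each value ends.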
Length-prefixed JSON
Length-prefixed or framed JSON streaming allows the sender to explicitly state the length of each message. It relies on the receiver using a parser that can recognize each length ''n'' and then read the following ''n'' bytes to parse as JSON.
The advantage of this format is that it can speed up parsing, because the exact length of each message is explicitly stated and the parser does not have to search for delimiters. Length-prefixed JSON is also well-suited for TCP applications, where a single "message" may be divided into arbitrary chunks, because the prefixed length tells the parser exactly how many bytes to expect before attempting to parse a JSON string.
This example shows two length-prefixed JSON objects (with each length being the byte-length of the following JSON string):
1855
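The following Python sketch shows how a receiver might reassemble length-prefixed messages from arbitrarily sized chunks. It assumes one particular wire format that is not mandated by any standard: an ASCII decimal byte length immediately followed by a UTF-8 JSON payload that starts with a non-digit character (such as an object or an array). The names `encode_frame` and `LengthPrefixedDecoder` are hypothetical.

```python
import json
from typing import Any, List


def encode_frame(value: Any) -> bytes:
    """Serialize `value` and prefix it with its byte length in ASCII digits."""
    payload = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return str(len(payload)).encode("ascii") + payload


class LengthPrefixedDecoder:
    """Incremental decoder: feed it chunks, get back completed messages."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[Any]:
        self._buffer.extend(chunk)
        messages: List[Any] = []
        while self._buffer:
            # Count the leading ASCII digits that form the length prefix.
            digits = 0
            while digits < len(self._buffer) and 48 <= self._buffer[digits] <= 57:
                digits += 1
            if digits == 0:
                raise ValueError("expected a decimal length prefix")
            if digits == len(self._buffer):
                break  # the prefix itself may still be incomplete
            length = int(self._buffer[:digits].decode("ascii"))
            if len(self._buffer) < digits + length:
                break  # wait until the whole payload has arrived
            payload = bytes(self._buffer[digits:digits + length])
            del self._buffer[:digits + length]
            messages.append(json.loads(payload.decode("utf-8")))
        return messages


# Made-up sample objects, sent as one byte stream split into 7-byte chunks
# to imitate arbitrary TCP segmentation.
objects = [
    {"some": "thing\n"},
    {"may": {"include": "nested", "objects": ["and", "arrays"]}},
]
stream = b"".join(encode_frame(obj) for obj in objects)

decoder = LengthPrefixedDecoder()
received: List[Any] = []
for start in range(0, len(stream), 7):
    received.extend(decoder.feed(stream[start:start + 7]))
```

The decoder never scans the payload for delimiters; it only waits until the announced number of bytes is available. Real protocols often use a fixed-width binary length or a separator after the digits instead, which also removes the restriction on payloads that begin with a digit.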
Applications and tools
Line-delimited JSON
* jq can both create and read line-delimited JSON texts.
* Jackson (API) can read and write line-delimited JSON texts.
* logstash includes a json_lines codec.
* ldjson-stream: module for Node.js
* ld-jsonstream: dependency free module for Node.js that is compliant with the ''NDJSON – Newline-delimited JSON Specification'' (http://specs.okfnlabs.org/ndjson/index.html)
* ArduinoJson is a C++ library that supports line-delimited JSON.
* RecordStream: a set of tools to manipulate line-delimited JSON (generate, transform, collect statistics, and format results).
* JSON Golang Library: a library for Go to read and write JSONL
Record separator-delimited JSON
* jq can both create and read record separator-delimited JSON texts.
Concatenated JSON
* concatjson: concatenated JSON streaming parser/serializer module for Node.js
* Jackson (API) can read and write concatenated JSON content.
* jq: lightweight flexible command-line JSON processor
* Noggit: Solr's streaming JSON parser for Java
* Yajl (Yet Another JSON Library): a small event-driven (SAX-style) JSON parser written in ANSI C, and a small validating JSON generator.
* ArduinoJson is a C++ library that supports concatenated JSON.
* GSON: JsonStreamParser.java can read concatenated JSON.
* json-stream is a streaming JSON parser for Python.
Length-prefixed JSON
* missive: fast, lightweight library for encoding and decoding length-prefixed JSON messages over streams
* Native messaging: WebExtensions Native Messaging