
SMPTE ST 2117-1, informally known as VC-6, is a
video coding format
.
Overview
The VC-6
codec
is optimized for intermediate, mezzanine or contribution coding applications.
Typically, these applications involve compressing finished compositions for editing, contribution, primary distribution, archiving and other applications where it is necessary to preserve image quality as close to the original as possible, whilst reducing
bitrates and optimizing processing, power and storage requirements. VC-6, like other codecs in this category, uses only
intra-frame
compression, where each frame is stored independently and can be decoded with no dependencies on any other frame. The codec implements
lossless
and
lossy
compression, depending on the encoding parameters that have been selected. It was standardized in 2020. Earlier variants of the codec have been deployed by
V-Nova
since 2015 under the trade name Perseus. The codec is based on hierarchical data structures called s-trees, and does not involve
DCT or
wavelet transform compression. The compression mechanism is independent of the data being compressed, and can be applied to
pixels
as well as other non-image data.
Unlike
DCT-based codecs, VC-6 is based on hierarchical, repeatable s-tree structures that are similar to modified
quadtrees. These simple structures provide intrinsic capabilities, such as massive parallelism and the ability to choose the type of filtering used to reconstruct higher-resolution images from lower-resolution images. In the VC-6 standard,
an upsampler developed with an in-loop
Convolutional Neural Network
is provided to optimize the detail in the reconstructed image, without requiring a large computational overhead. The ability to navigate spatially within the VC-6 bitstream at multiple levels
also lets decoding devices apply more resources to different regions of the image, allowing
region-of-interest applications to operate on compressed bitstreams without decoding the full-resolution image.
History
At the
NAB Show
in 2015,
V-Nova
claimed "2x–3x average compression gains, at all quality levels, under practical real-time operating scenarios versus
H.264
,
HEVC
and
JPEG2000
."
Making this announcement on 1 April before a major trade show attracted the attention of many compression experts. Since then,
V-Nova
have deployed and licensed the technology, known at the time as Perseus,
in both contribution and distribution applications around the world including
Sky Italia
, Fast Filmz,
Harmonic Inc
, and others. A variant of the technology optimized for enhancing distribution codecs has since been standardized as
MPEG-5 Part 2 LCEVC.
Core concepts
Planes
The standard
describes a compression algorithm that is applied to independent planes of data. These planes might be
RGB
or
RGBA
pixels originating in a camera,
YCbCr
pixels from a conventional TV-centric video source or some other planes of data. There may be up to 255 independent planes of data, and each plane can have a grid of data values of dimensions up to 65535 x 65535. The
SMPTE ST 2117-1 standard focuses on compressing planes of data values, typically pixels. To compress and decompress the data in each plane, VC-6 uses hierarchical representations of small tree-like structures that carry metadata used to predict other trees. There are three fundamental structures repeated in each plane.
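The limits quoted above can be illustrated with a minimal sketch. The `PlaneSpec` type and `validate_planes` helper are hypothetical names introduced here for illustration and are not part of the standard; the lower bound of one plane and one sample per axis is also an assumption.

```python
from dataclasses import dataclass

MAX_PLANES = 255       # limit quoted in the text
MAX_DIMENSION = 65535  # per-axis limit quoted in the text


@dataclass(frozen=True)
class PlaneSpec:
    """Hypothetical descriptor for one independent plane of data values."""
    name: str
    width: int
    height: int


def validate_planes(planes):
    """Return True if the set of planes fits the quoted limits.

    The lower bounds (at least one plane, at least 1 x 1 values)
    are assumptions made for this sketch.
    """
    if not 1 <= len(planes) <= MAX_PLANES:
        return False
    return all(
        1 <= p.width <= MAX_DIMENSION and 1 <= p.height <= MAX_DIMENSION
        for p in planes
    )
```

For example, a YCbCr source with subsampled chroma would be described as three planes of different dimensions, each checked independently.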
S-tree
The core compression structure in VC-6 is the s-tree. It is similar to the
quadtree
structure common in other schemes. An s-tree comprises nodes arranged in a tree structure, where each node links to 4 nodes in the next layer. The total number of layers above the root node is known as the rise of the s-tree. Compression is achieved in an s-tree by using metadata to signal whether levels can be predicted, with selective carrying of enhancement data in the bitstream. The more data that can be predicted, the less information is sent, and the better the
compression ratio
.
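The idea of a four-way tree whose metadata marks which branches carry data can be sketched as follows. This is an illustrative quadtree-style model, not the node syntax of the standard: the `Node` layout, the `carried` flag and the rule that an all-zero region is treated as fully predicted are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the sketch tree (hypothetical layout)."""
    carried: bool                # metadata: is anything below this node sent?
    value: Optional[int] = None  # enhancement value, set only on carried leaves
    children: List["Node"] = field(default_factory=list)


def build(grid, x=0, y=0, size=None):
    """Build a tree over a square grid whose side is a power of two."""
    if size is None:
        size = len(grid)
    if size == 1:
        v = grid[y][x]
        return Node(carried=v != 0, value=v if v != 0 else None)
    half = size // 2
    kids = [build(grid, x + dx, y + dy, half)
            for dy in (0, half) for dx in (0, half)]
    if not any(k.carried for k in kids):
        return Node(carried=False)  # region fully predicted: prune the subtree
    return Node(carried=True, children=kids)


def rise_for(side):
    """Layers above the root needed to cover a side x side grid (power of two)."""
    return side.bit_length() - 1


def count_nodes(node):
    """Number of nodes that remain after pruning."""
    return 1 + sum(count_nodes(c) for c in node.children)


def carried_values(node):
    """Enhancement values that would actually be sent, in traversal order."""
    if not node.children:
        return [node.value] if node.carried else []
    return [v for c in node.children for v in carried_values(c)]
```

With a 4 x 4 grid that holds a single non-zero value, the rise is 2 and only 9 of the 21 nodes of the full tree survive pruning, which illustrates why more predictable data leads to less transmitted information.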
Tableau
The standard
defines a tableau as the root node, or the highest layer of an s-tree, that contains nodes for another s-tree. Like the generic s-trees from which they are constructed, tableaux are arranged in layers with metadata in the nodes indicating whether or not higher layers are predicted or transmitted in the bitstream.
Echelon
The hierarchical s-tree and tableau structures in the standard
are used to carry enhancements (called resid-vals) and other metadata to reduce the amount of raw data that needs to be carried in the bitstream payload. The final hierarchical tool is an ability to arrange the tableaux, so that data from each plane (i.e. pixels) can be dequantized at different resolutions and used as predictors for higher resolutions. Each of these resolutions is defined by the standard
as an echelon. Each echelon within a plane is identified by an index, where a more negative index indicates a lower resolution and a larger, more positive index indicates a higher resolution.
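The relationship between echelon index and resolution can be sketched as below. The sketch assumes that index 0 is the full-resolution echelon and that each step down in index halves both dimensions, rounding up; the halving factor and the `echelon_sizes` helper are assumptions for illustration rather than rules taken from the standard.

```python
def echelon_sizes(width, height, lowest_index):
    """Map each echelon index to an assumed (width, height).

    Index 0 is taken as full resolution; each more negative index is
    assumed to halve both dimensions, rounding up.
    """
    if lowest_index > 0:
        raise ValueError("lowest_index must be zero or negative")
    sizes = {}
    w, h = width, height
    for index in range(0, lowest_index - 1, -1):
        sizes[index] = (w, h)
        w, h = (w + 1) // 2, (h + 1) // 2
    return sizes
```

Under these assumptions, a 1920 x 1080 plane with a lowest index of -3 yields echelons of 240 x 135, 480 x 270, 960 x 540 and 1920 x 1080.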
Bitstream overview
VC-6 is an example of
intra-frame coding
, where each picture is coded without referencing other pictures. It is also intra-plane, where no information from one plane is used to predict another plane. As a result, the VC-6 bitstream contains all of the information for all of the planes of a single image.
An image sequence is created by concatenating the bitstreams for multiple images, or by packaging them in a container such as
MXF,
QuickTime
or
Matroska
.
The VC-6 bitstream is defined in the standard by pseudocode, and a reference decoder has been demonstrated based on that definition. The primary header is the only fixed structure defined by the standard.
The secondary header contains marker and sizing information depending on the values in the primary header. The tertiary header is entirely calculated, and then the payload structure is derived from the parameters calculated during header decoding.
Decoding overview
The standard
defines a process called plane reconstruction for decoding images from a bitstream. The process starts with the echelon having the lowest index. No predictions are used for this echelon. First, the bitstream rules are used to reconstruct residuals. Next, desparsification and
entropy
decoding processes are performed to fill the grid with data values at each coordinate. These values are then dequantised to create full-range values that can be used as predictions for the echelon with the next highest index. Each subsequent echelon uses the upsampler specified in the header to create a predicted plane from the echelon below. This prediction is added to the residual grid of the current echelon, and the result can in turn be upsampled as a prediction for the next echelon.
The final, full-resolution, echelon, defined by the standard, is at index 0, and its results are displayed, rather than used for another echelon.
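The reconstruction loop described above can be sketched in a few lines. The sketch assumes that entropy decoding and desparsification have already produced one residual grid per echelon, that dequantisation is a uniform multiplication by a single step size, and that each echelon doubles both dimensions of the one below. The function names are hypothetical, and a nearest-neighbour upsampler stands in for whichever upsampler the header specifies.

```python
def upsample_nearest(grid):
    """2x nearest-neighbour upsampling: repeat each value in both axes."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def dequantise(grid, step):
    """Assumed uniform dequantisation: scale every value by one step size."""
    return [[v * step for v in row] for row in grid]


def reconstruct_plane(echelons, step=1, upsample=upsample_nearest):
    """Rebuild a plane from residual grids keyed by echelon index.

    `echelons` maps an index (most negative = lowest resolution, 0 = full
    resolution) to a grid of already entropy-decoded residuals.
    """
    plane = None
    for index in sorted(echelons):          # lowest index first
        residuals = dequantise(echelons[index], step)
        if plane is None:
            plane = residuals               # lowest echelon: no prediction
        else:
            prediction = upsample(plane)    # predict from the echelon below
            plane = [[p + r for p, r in zip(prow, rrow)]
                     for prow, rrow in zip(prediction, residuals)]
    return plane
```

Passing a different function as `upsample` changes only how the prediction is formed; the residuals are added in the same way regardless of the upsampler chosen.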
Upsampler options
Basic options
The standard
defines a number of basic upsamplers
to create higher-resolution predictions from lower-resolution echelons. There are two linear upsamplers, bicubic and sharp, and a nearest-neighbour upsampler.
Convolutional Neural Network Upsampler
Six different non-linear upsamplers are defined
by a set of processes and coefficients that are provided in
JSON
format.
These coefficients were generated using Convolutional Neural Network
techniques.