
CRI ADX is a lossy proprietary audio storage and compression format developed by CRI Middleware specifically for use in video games; it is derived from ADPCM. Its most notable feature is a looping function that has proved useful for background sounds in various games that have adopted the format, including many games for the Sega Dreamcast as well as some PlayStation 2, GameCube and Wii games. One of the first games to use ADX was ''Burning Rangers'', on the Sega Saturn. Notably, the ''Sonic the Hedgehog'' series from the Dreamcast generation up to at least ''Shadow the Hedgehog'' has used this format for sound and voice recordings. ''Jet Set Radio Future'' for the original Xbox also used this format.

On top of the main ADPCM encoding, the ADX toolkit also includes a sibling format, AHX, which uses a variant of MPEG-2 audio intended specifically for voice recordings, and a packaging archive, AFS, for bundling multiple CRI ADX and AHX tracks into a single container file.

Version 2 of the format (ADX2) uses the HCA and HCA-MX extensions, which are usually bundled into a container file with the extensions ACB and AWB. The AWB extension is not to be confused with the audio format with the same extension; it mostly contains the binary data for the HCA files.


General overview

CRI ADX is a compressed audio format, but unlike MP3 and similar formats, it does not apply a psychoacoustic model to the sound to reduce its complexity. The ADPCM model instead stores samples by recording the ''error'' relative to a prediction function, which means more of the original signal survives the encoding process. ADPCM compression trades accuracy of the representation for size by using relatively small sample sizes, usually 4 bits. The human auditory system's tolerance for the noise this causes makes the loss of accuracy barely noticeable.

Like other encoding formats, CRI ADX supports multiple sampling frequencies such as 22050 Hz, 44100 Hz and 48000 Hz; however, the output sample depth is locked at 16 bits, generally due to the lack of precision already mentioned. It supports multiple channels, but there seems to be an implicit limitation to stereo (2 channel) audio, although the file format itself can represent up to 255 channels.

The only particularly distinctive feature that sets CRI ADX apart from alternatives like IMA ADPCM (other than having a different prediction function) is the integrated looping functionality. This enables an audio player to optionally skip backwards after reaching a single specified point in the track to create a coherent loop. Hypothetically, this functionality could be used to skip forwards as well, but that would be redundant since the audio could simply be clipped with an editing program instead.

For playback there are a few plugins for Winamp and a convert-to-wave tool (see the references section). The open source program and library FFmpeg also has CRI ADX support implemented; however, its decoder is hard-coded and so can only properly decode 44100 Hz ADX files.


Technical description

The CRI ADX specification is not freely available; however, the most important elements of the structure have been reverse engineered and documented in various places on the web. The information here may be incomplete but should be sufficient to build a working codec or transcoder.

As a side note, the AFS archive files that CRI ADX files are sometimes packed in are a simple variant of a tarball which uses numerical indices to identify the contents rather than names. Source code for an extractor can be found in the ADX archive at.


File header

The ADX disk format is defined in big-endian. The identified sections of the main header are outlined below. Fields labelled "Unknown" contain either unknown data or are apparently just reserved (i.e. filled with null bytes). Fields labelled with 'v3' or 'v4' but not both are considered "Unknown" in the version they are not marked with. This header may be as short as 20 bytes (0x14), as determined by the copyright offset, which implicitly removes support for a loop since those fields are not present.

The "Encoding Type" field should contain one of:
* 0x02 for CRI ADX with pre-set prediction coefficients
* 0x03 for Standard CRI ADX
* 0x04 for CRI ADX with an exponential scale
* 0x10 or 0x11 for AHX

The "Version" field should contain one of:
* 0x03 for CRI ADX 'version 3'
* 0x04 for CRI ADX 'version 4'
* 0x05 for a variant of CRI ADX 4 without looping support

When decoding AHX audio, the version field does not appear to have any meaning and can be safely ignored.

Files with encoding type '2' use one of 4 possible sets of pre-set prediction coefficients, selected for each block by that block's predictor index.
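As a rough illustration of the 20-byte minimum header mentioned above, the sketch below reads the fixed fields from a big-endian byte buffer. The field order, the offsets and the leading 0x8000 signature are assumptions taken from commonly circulated reverse-engineered descriptions, not from an official specification, and the names `adx_basic_header`, `read_be16`, `read_be32` and `adx_parse_basic_header` are hypothetical. The version-dependent loop fields are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fixed 20-byte part of the header.
   Field order and offsets are assumptions, not an official layout. */
typedef struct {
    uint16_t copyright_offset;   /* assumed offset 0x02 */
    uint8_t  encoding_type;      /* assumed offset 0x04 */
    uint8_t  block_size;         /* assumed offset 0x05 */
    uint8_t  sample_bitdepth;    /* assumed offset 0x06 */
    uint8_t  channel_count;      /* assumed offset 0x07 */
    uint32_t sample_rate;        /* assumed offset 0x08 */
    uint32_t total_samples;      /* assumed offset 0x0C */
    uint16_t highpass_frequency; /* assumed offset 0x10 */
    uint8_t  version;            /* assumed offset 0x12 */
    uint8_t  flags;              /* assumed offset 0x13 */
} adx_basic_header;

/* Read a big-endian 16-bit value regardless of host byte order. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value regardless of host byte order. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is shorter than 20 bytes
   or does not start with the assumed 0x8000 signature. */
int adx_parse_basic_header(const uint8_t *data, size_t size,
                           adx_basic_header *out)
{
    assert(data != NULL && out != NULL);
    if (size < 20 || read_be16(data) != 0x8000)
        return -1;
    out->copyright_offset   = read_be16(data + 0x02);
    out->encoding_type      = data[0x04];
    out->block_size         = data[0x05];
    out->sample_bitdepth    = data[0x06];
    out->channel_count      = data[0x07];
    out->sample_rate        = read_be32(data + 0x08);
    out->total_samples      = read_be32(data + 0x0C);
    out->highpass_frequency = read_be16(data + 0x10);
    out->version            = data[0x12];
    out->flags              = data[0x13];
    return 0;
}
```

Reading the bytes individually, rather than casting the buffer to a struct, avoids any dependence on host endianness or struct padding.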


Sample format

CRI ADX encoded audio data is broken into a series of 'blocks', each containing data for only one channel. The blocks are then laid out in 'frames', which consist of one block from every channel in ascending order. For example, in a stereo (2 channel) stream this would consist of Frame 1: left channel block, right channel block; Frame 2: left, right; etc.

Blocks are usually 18 bytes in size and contain 4-bit samples, though other sizes are technically possible. Such a block consists of a 2-byte 'Predictor/Scale' integer followed by 16 bytes holding 32 4-bit samples.

The first 3 bits of the 'Predictor/Scale' integer contain the predictor index. The scale is contained in the remaining 13 bits. The predictor index is a 3-bit integer that specifies which prediction coefficient set should be used to decode that block. This is only used in files with encoding type '2'. The scale is a 13-bit unsigned integer (big-endian like the header) which is essentially the amplification of all the samples in that block.

Each sample in the block must be decoded in bit-stream order, that is, most significant bit first. For example, when the sample size is 4 bits, the upper four bits of each byte hold the first sample and the lower four bits hold the second. The samples themselves are not in reverse, so there is no need to fiddle with them once they are extracted. Each sample is signed so, for this example, the value can range between -8 and +7 (which will be multiplied by the scale during decoding).

As an aside, although any bit-depth between 1 and 255 is made possible by the header, it is unlikely that one-bit samples would ever occur as they can only represent the values {0, 1}, {-1, 0} or {-1, 1}, all of which are not particularly useful for encoding songs. If they were to occur then it is unclear which of the three possibilities is the correct interpretation.
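The block layout described above can be illustrated with a small unpacking routine. This is only a sketch for the usual case of 18-byte blocks with 4-bit samples; the names `adx_sign_extend_nibble` and `adx_unpack_block` are hypothetical, and other block sizes or bit depths are not handled.

```c
#include <assert.h>
#include <stdint.h>

enum { ADX_BLOCK_BYTES = 18, ADX_SAMPLES_PER_BLOCK = 32 };

/* Sign-extend a 4-bit two's complement value (0..15) to -8..+7. */
int adx_sign_extend_nibble(unsigned nibble)
{
    assert(nibble < 16);
    return (nibble & 8u) ? (int)nibble - 16 : (int)nibble;
}

/* Split one 18-byte block into its predictor index, its scale and
   its 32 raw (not yet scaled) sample values. */
void adx_unpack_block(const uint8_t block[ADX_BLOCK_BYTES],
                      unsigned *predictor_index, unsigned *scale,
                      int samples[ADX_SAMPLES_PER_BLOCK])
{
    /* The first two bytes form a big-endian 16-bit integer. */
    unsigned word = ((unsigned)block[0] << 8) | block[1];

    *predictor_index = word >> 13;   /* first 3 bits */
    *scale = word & 0x1FFFu;         /* remaining 13 bits */

    /* Most significant bit first: the upper nibble of each byte is
       the earlier sample, the lower nibble the later one. */
    for (int i = 0; i < 16; ++i) {
        uint8_t byte = block[2 + i];
        samples[2 * i]     = adx_sign_extend_nibble(byte >> 4);
        samples[2 * i + 1] = adx_sign_extend_nibble(byte & 0x0Fu);
    }
}
```

With this layout, 18 bytes carry 32 samples that would occupy 64 bytes as 16-bit PCM, a ratio of roughly 3.6 to 1.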


CRI ADX decoding

This section walks through decoding CRI ADX 'version 3' or 'version 4' when "Encoding Type" is "Standard CRI ADX" (0x03). An encoder can also be built by simply flipping the code to run in reverse. All code samples in this section are written in C99.

Before a 'standard' CRI ADX can be either encoded or decoded, the set of prediction coefficients must be calculated. This is generally best done in the initialisation stage:

    #ifndef M_PI
    #define M_PI acos(-1.0)
    #endif
    double a, b, c;
    a = sqrt(2.0) - cos(2.0 * M_PI * ((double)adx_header->highpass_frequency / adx_header->sample_rate));
    b = sqrt(2.0) - 1.0;
    c = (a - sqrt((a + b) * (a - b))) / b; // (a+b)*(a-b) = a*a-b*b, however the simpler formula loses accuracy in floating point

    // double coefficient[2];
    coefficient[0] = c * 2.0;
    coefficient[1] = -(c * c);

This code calculates prediction coefficients for predicting the current sample from the 2 previous samples. These coefficients also form a first order Finite Impulse Response high-pass filter.

Once the decoding coefficients are known, the decoder can start decoding the stream:

    static int32_t*      past_samples;     // Previously decoded samples from each channel, zeroed at start (size = 2*channel_count)
    static uint_fast32_t sample_index = 0; // sample_index is the index of sample set that needs to be decoded next
    static ADX_header*   adx_header;

    // buffer is where the decoded samples will be put
    // samples_needed states how many sample 'sets' (one sample from every channel) need to be decoded to fill the buffer
    // looping_enabled is a boolean flag to control use of the built-in loop
    // Returns the number of sample 'sets' in the buffer that could not be filled (EOS)
    unsigned decode_adx_standard( int16_t* buffer, unsigned samples_needed, bool looping_enabled )

Most of the above code should be straightforward enough for anyone versed in C. The 'ADX_header' pointer refers to the data extracted from the header as outlined earlier; it is assumed to have already been converted to host endianness. This implementation is not intended to be optimal, and external concerns have been ignored, such as the specific method for sign extension and the method of acquiring a bitstream from a file or network source.

Once it completes, there will be ''samples_needed'' sets of samples in the output ''buffer'' (pairs, if the stream is stereo). The decoded samples will be in host-endian standard interleaved PCM format, i.e. left 16-bit, right 16-bit, left, right, etc. Finally, if looping is not enabled, or not supported, then the function will return the number of sample spaces that were not used in the buffer. The caller can test whether this value is non-zero to detect the end of the stream and drop or write silence into the unused spaces if necessary.
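The per-sample step described above (predict from the two previous samples, add the scaled error, clamp to 16 bits) can be sketched for a single block of a single channel. This is a minimal illustration under stated assumptions, not the reference decoder: it handles only 18-byte blocks with 4-bit samples, takes the two prediction coefficients as an input, reads the scale from the low 13 bits of the block prefix, and stores the unclamped sample in the history. The function name `adx_decode_block` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 18-byte, 4-bit block of one channel into 32 PCM samples.
   coefficient[0..1] are the prediction coefficients for the stream.
   history[0] is the most recent decoded sample, history[1] the one
   before it; both are updated so the next block can continue. */
void adx_decode_block(const uint8_t block[18], const double coefficient[2],
                      int32_t history[2], int16_t out[32])
{
    assert(block && coefficient && history && out);

    /* Assumption: the scale is the low 13 bits of the big-endian prefix. */
    int32_t scale = (int32_t)((((unsigned)block[0] << 8) | block[1]) & 0x1FFFu);

    for (int i = 0; i < 32; ++i) {
        /* Upper nibble first, then lower nibble, sign-extended to -8..+7. */
        uint8_t byte = block[2 + i / 2];
        int nibble = (i % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
        int32_t error = (nibble & 8) ? nibble - 16 : nibble;

        /* Predict the sample from the two previous ones. */
        double prediction = coefficient[0] * history[0]
                          + coefficient[1] * history[1];

        /* Combine the prediction with the scaled error correction. */
        int32_t sample = error * scale + (int32_t)prediction;

        /* Assumption: the history keeps the value before clamping. */
        history[1] = history[0];
        history[0] = sample;

        /* Clamp to the valid range of a 16-bit integer. */
        if (sample > 32767)
            sample = 32767;
        else if (sample < -32768)
            sample = -32768;
        out[i] = (int16_t)sample;
    }
}
```

A stream decoder would call such a routine once per channel block in each frame, keeping a separate two-entry history per channel and interleaving the results into the output buffer.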


Encryption

CRI ADX supports a simple encryption scheme which XORs values from a linear congruential pseudorandom number generator with the block scale values. This method is computationally inexpensive to decrypt (in keeping with CRI ADX's real-time decoding) yet renders the encrypted files unusable without the key. The encryption is active when the "Flags" value in the header is 0x08. As XOR is symmetric, the same method is used to decrypt as to encrypt. The encryption key is a set of three 16-bit values: the multiplier, increment, and start values for the linear congruential generator (the modulus is 0x8000 to keep the values in the 15-bit range of valid block scales). Typically all ADX files from a single game will use the same key.

The encryption method is vulnerable to known-plaintext attacks. If an unencrypted version of the same audio is known, the random number stream can be easily retrieved and from it the key parameters can be determined, rendering every CRI ADX encrypted with that same key decryptable. The encryption method attempts to make this more difficult by not encrypting silent blocks (with all sample nybbles equal to 0), as their scale is known to be 0.

Even if the encrypted CRI ADX is the only sample available, it is possible to determine a key by assuming that the scale values of the decrypted CRI ADX must fall within a "low range". This method does not necessarily find the key used to encrypt the file, however. While it can always determine keys that produce an apparently correct output, errors may exist undetected. This is due to the increasingly random distribution of the lower bits of the scale values, which becomes impossible to separate from the randomness added by the encryption.
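The keystream idea described above can be sketched as follows. This is an illustration under several assumptions rather than the actual scheme: the generator is stepped once per block, the start value is applied to the first block, and the special treatment of silent blocks is omitted. The `adx_key` structure and the `adx_xor_scales` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key layout: the three 16-bit values named in the text. */
typedef struct {
    uint16_t start;
    uint16_t multiplier;
    uint16_t increment;
} adx_key;

/* XOR each block scale with the next output of a linear congruential
   generator with modulus 0x8000. Because XOR is symmetric, running the
   function twice with the same key restores the original scales.
   Assumption: one generator step per block, silent blocks not skipped. */
void adx_xor_scales(uint16_t *scales, size_t count, adx_key key)
{
    assert(scales != NULL || count == 0);

    uint32_t state = key.start % 0x8000u;
    for (size_t i = 0; i < count; ++i) {
        scales[i] ^= (uint16_t)state;
        /* The product stays below 2^31, so 32-bit arithmetic is enough. */
        state = (state * key.multiplier + key.increment) % 0x8000u;
    }
}
```

Since a silent block has a known scale of 0, XORing it would expose the raw generator output at that position, which is the reason the text gives for leaving such blocks unencrypted.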


AHX decoding

As noted earlier, AHX is just an implementation of MPEG-2 audio and the decoding method is basically the same as the standard. It is possible simply to demux the stream from the CRI ADX container and feed it through a standard MPEG audio decoder like mpg123. The CRI ADX header's "sample rate" and "total samples" are usually correct if a decoder needs them (so they should be set by encoder/muxer implementations), but most of the other fields such as the "block size" and "sample bitdepth" will usually be zero. As noted above, the looping functionality is also unavailable.


References


External links


ADX product page at CRI Middleware website




CRI ADX Description from multimedia.cx Wiki (2009-10-24)

ADX technical description on vgmstream Wiki