Audio-to-video synchronization (AV synchronization, also known as lip sync, or by the lack of it: lip-sync error, lip flap) refers to the relative timing of audio (sound) and video (image) parts during creation, post-production (mixing), transmission, reception and playback processing. AV synchronization is relevant in television, videoconferencing, and film.
In industry terminology, lip-sync error is expressed as the amount of time by which the audio departs from perfect synchronization with the video: a positive value indicates that the audio leads the video, and a negative value indicates that the audio lags the video.
This sign convention for numeric lip-sync error is used in the professional broadcast industry, as evidenced by various professional papers and by standards such as ITU-R BT.1359-1.
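The sign convention can be captured in a minimal sketch. It assumes that the same event, for example a clapperboard strike, has been located in both the audio and the video, and that both times are known in milliseconds. The function name is illustrative and not taken from any standard.

```python
def lip_sync_error_ms(audio_event_ms: float, video_event_ms: float) -> float:
    """Return the lip-sync error using the broadcast sign convention.

    Positive: audio leads video (the sound is heard before the image is seen).
    Negative: audio lags video (the sound is heard after the image is seen).
    """
    return video_event_ms - audio_event_ms


# The clap is heard at 1000 ms but seen at 1040 ms: audio leads by 40 ms.
print(lip_sync_error_ms(1000.0, 1040.0))
# The clap is seen at 1000 ms but heard at 1060 ms: audio lags by 60 ms.
print(lip_sync_error_ms(1060.0, 1000.0))
```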
Digital or analog audio/video streams or video files usually contain some sort of synchronization mechanism, either in the form of interleaved video and audio data or by explicit relative timestamping of data.
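The sketch below illustrates the timestamping approach by merging separately timestamped audio and video packets into a single presentation order. The `Packet` type and its fields are hypothetical; real containers and transport streams define their own structures.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    kind: str            # "audio" or "video" (hypothetical labels)
    timestamp_ms: int    # presentation time relative to the start of the stream
    payload: bytes = b""


def interleave(audio: list[Packet], video: list[Packet]) -> list[Packet]:
    """Merge two streams, each already sorted by timestamp, into presentation order.

    Because every packet carries its own timestamp, the player does not depend
    on the order in which packets arrived to keep sound and image aligned.
    """
    return list(heapq.merge(audio, video, key=lambda p: p.timestamp_ms))


# Audio frames of roughly 21.3 ms (e.g. 1024 samples at 48 kHz), video at 25 fps.
audio = [Packet("audio", t) for t in (0, 21, 43, 64)]
video = [Packet("video", t) for t in (0, 40)]
for p in interleave(audio, video):
    print(p.kind, p.timestamp_ms)
```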
Sources of error
AV-sync errors may accumulate across different stages, for a variety of reasons.
During creation, AV-sync errors may occur internally because of different signal processing delays between image and sound in the video camera and microphone. This AV-sync delay is normally fixed. External AV-sync errors can occur when a microphone is placed far away from the sound source: the audio will be out of sync because the speed of sound is much lower than the speed of light. If the sound source is 340 meters from the microphone, the sound arrives approximately 1 second later than the light, and the AV-sync delay increases with distance. During mixing of video clips, normally either the audio or the video needs to be delayed so that they are synchronized. This AV-sync delay is static but can vary from clip to clip.
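The acoustic delay is the distance divided by the speed of sound; light's travel time is negligible by comparison. Here is a short sketch, assuming about 343 m/s for dry air at roughly 20 °C. Passing 340 m/s reproduces the round 1-second figure above. Because the sound arrives late, the resulting lip-sync error is negative (audio lags).

```python
SPEED_OF_SOUND_M_PER_S = 343.0       # dry air at about 20 °C; varies with temperature
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def acoustic_delay_ms(distance_m: float,
                      speed_of_sound: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Extra time, in ms, that sound needs compared with light to cover distance_m."""
    sound_s = distance_m / speed_of_sound
    light_s = distance_m / SPEED_OF_LIGHT_M_PER_S   # microseconds at most; kept for completeness
    return (sound_s - light_s) * 1000.0


def delay_in_frames(delay_ms: float, fps: float) -> float:
    """Express a delay as a (possibly fractional) number of video frames."""
    return delay_ms * fps / 1000.0


d = acoustic_delay_ms(340.0, speed_of_sound=340.0)
print(round(d))                        # about one second, in ms
print(round(delay_in_frames(d, 25)))   # the same delay in frames at 25 fps
print(round(acoustic_delay_ms(10.0), 1))  # a boom 10 m away at 343 m/s
```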
Video editing effects can delay the video, causing it to lag the audio.
Transmission (broadcasting), reception, and playback may also introduce AV-sync errors. A video camera with built-in microphones or line-in may not delay the sound and video paths by the same amount. Solid-state video cameras (e.g. those with charge-coupled device (CCD) or CMOS image sensors) can delay the video signal by one or more frames. Television systems contain audio and video signal processing circuitry with significant (and potentially non-constant) delays. Frame synchronizers, digital video effects processors, video noise reduction, format converters, and compression systems are examples of widely used signal-processing elements that may contribute significant video delay.
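Delays along a signal path add up, so a simple compensation is to sum the video-path delays and delay the audio by the difference. The sketch below performs that arithmetic. The element names and frame counts are hypothetical examples, not measured values for any real equipment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    video_delay_frames: float     # delay added to the picture, in frames
    audio_delay_ms: float = 0.0   # delay (if any) already added to the sound


def path_delays_ms(stages: list[Stage], fps: float) -> tuple[float, float]:
    """Return the total (video, audio) delay in milliseconds for stages in series."""
    frame_ms = 1000.0 / fps
    video = sum(s.video_delay_frames * frame_ms for s in stages)
    audio = sum(s.audio_delay_ms for s in stages)
    return video, audio


def audio_compensation_ms(stages: list[Stage], fps: float) -> float:
    """Extra audio delay needed so sound and picture leave the chain together.

    If the audio is already later than the video, the video would need delaying
    instead; this sketch returns 0.0 in that case.
    """
    video, audio = path_delays_ms(stages, fps)
    return max(0.0, video - audio)


# Hypothetical chain at 25 fps (40 ms per frame).
chain = [
    Stage("frame synchronizer", 1),
    Stage("noise reducer", 1),
    Stage("format converter", 2, audio_delay_ms=20.0),
]
print(audio_compensation_ms(chain, fps=25))
```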
Format-conversion and deinterlacing circuits in video monitors can add one or more frames of video delay. A video monitor with built-in speakers or line-out may not delay the sound and video paths equally. Some video monitors contain internal user-adjustable audio delays to help correct such errors.
Some transmission protocols, such as RTP, require an out-of-band method for synchronizing media streams. In some RTP systems, each media stream has its own timestamp, using an independent clock rate and a per-stream randomized starting value. An RTCP Sender Report (SR) may be needed for each stream in order to synchronize the streams.
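Per RFC 3550, each RTCP Sender Report pairs an NTP wall-clock timestamp with the RTP timestamp for the same instant. That pairing lets a receiver place packets from independently clocked streams on one timeline. This is a minimal sketch under two assumptions: the NTP timestamp has already been converted to seconds, and each stream's clock rate is known from the session description. The class and function names are illustrative.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 1 << 32   # RTP timestamps are 32-bit and wrap around


@dataclass(frozen=True)
class SenderReport:
    ntp_seconds: float   # SR wall-clock time, already converted to seconds
    rtp_timestamp: int   # RTP timestamp corresponding to that same instant
    clock_rate: int      # media clock rate in Hz (e.g. 90000 video, 48000 audio)


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport) -> float:
    """Map an RTP timestamp of one stream onto the sender's wall clock."""
    # Signed difference modulo 2**32, so timestamps just before or after a
    # wraparound still map to nearby times.
    diff = (rtp_ts - sr.rtp_timestamp) % RTP_TS_MODULUS
    if diff >= RTP_TS_MODULUS // 2:
        diff -= RTP_TS_MODULUS
    return sr.ntp_seconds + diff / sr.clock_rate


# Hypothetical reports: the streams start at unrelated RTP timestamps.
audio_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=123_456, clock_rate=48_000)
video_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=3_000_000_000, clock_rate=90_000)

audio_t = rtp_to_wallclock(123_456 + 48_000, audio_sr)
video_t = rtp_to_wallclock(3_000_000_000 + 90_000, video_sr)
print(audio_t, video_t)   # equal wall-clock times: present these together
```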
Effect of no explicit AV-sync timing
When a digital or analog AV stream does not have a synchronization mechanism, the stream may fall out of sync. In film, these timing errors are most commonly caused by worn prints skipping over the movie projector sprockets because of torn sprocket holes. Errors can also be caused by the projectionist misthreading the film in the projector.
Synchronization errors have become a significant problem in the digital television industry because of the large amounts of video signal processing used in television production, television broadcasting, and pixelated television displays such as LCD, DLP and plasma displays. Pixelated displays use complex video signal processing to convert the resolution of the incoming video signal to the native resolution of the display, for example converting standard-definition video for display on a high-definition panel. Synchronization problems are commonly caused when significant amounts of video processing are performed on the video part of a television program. Typical sources of significant video delay in the television field include video synchronizers and video compression encoders and decoders. Particularly troublesome are the encoders and decoders used in MPEG compression systems for broadcasting digital television and for storing television programs on consumer and professional recording and playback devices.
In broadcast television, it is not unusual for lip-sync error to vary by over 100 ms (several video frames) from time to time. AV sync is commonly corrected and maintained with an audio synchronizer. Television industry standards organizations have established acceptable amounts of audio and video timing error and suggested practices for maintaining acceptable timing.
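At its core, an audio synchronizer delays the sound by an adjustable amount. The sketch below shows only that delay element: a ring buffer holding a fixed number of samples, assuming mono samples at 48 kHz. A real synchronizer must also measure the error and change the delay smoothly, which this sketch does not do.

```python
from collections import deque


class AudioDelayLine:
    """Delay a stream of audio samples by a fixed number of samples."""

    def __init__(self, delay_ms: float, sample_rate: int = 48_000):
        self.delay_samples = round(delay_ms * sample_rate / 1000)
        # Pre-fill with silence so that the first outputs come out delayed.
        self._buffer = deque([0.0] * self.delay_samples)

    def process(self, samples: list[float]) -> list[float]:
        """Push a block of samples in and return an equally long delayed block.

        State carries over between calls, so audio can be fed in chunks.
        """
        out = []
        for s in samples:
            self._buffer.append(s)
            out.append(self._buffer.popleft())
        return out


# 100 ms of correction at 48 kHz is 4800 samples.
line = AudioDelayLine(delay_ms=100.0)
print(line.delay_samples)
```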
The EBU Recommendation R37 "The relative timing of the sound and vision components of a television signal" states that end-to-end audio/video sync should be within +40 ms and -60 ms (audio before/after video, respectively) and that each stage should be within +5 ms and -15 ms.
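With the sign convention used here (positive means audio leads), the R37 limits can be checked mechanically. This is a minimal sketch, and the per-stage measurements are made-up numbers. It assumes stage errors in series simply add. It also shows why the per-stage limits are tighter: four stages each at the 15 ms lag limit already use up the whole 60 ms end-to-end lag budget.

```python
# (maximum audio lead, maximum audio lag) in ms, from EBU R37 as quoted above.
EBU_R37_END_TO_END = (40.0, -60.0)
EBU_R37_PER_STAGE = (5.0, -15.0)


def within(error_ms: float, limits: tuple[float, float]) -> bool:
    """True if error_ms (positive = audio leads) lies inside the (lead, lag) window."""
    max_lead, max_lag = limits
    return max_lag <= error_ms <= max_lead


def check_chain(stage_errors_ms: list[float]) -> dict:
    """Check each stage and the accumulated end-to-end error."""
    total = sum(stage_errors_ms)   # errors from stages in series add up
    return {
        "stages_ok": all(within(e, EBU_R37_PER_STAGE) for e in stage_errors_ms),
        "end_to_end_ok": within(total, EBU_R37_END_TO_END),
        "total_ms": total,
    }


print(check_chain([-10.0, 3.0, -12.0, -8.0]))
# Every stage is individually compliant, yet the total exceeds the lag limit.
print(check_chain([-15.0, -15.0, -15.0, -15.0, -5.0]))
```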
Viewer experience of incorrectly synchronized AV-sync
The result typically leaves a filmed or televised character's mouth movements mismatched with the spoken dialog, hence the terms lip flap and lip-sync error. The resulting audio-video sync error can annoy viewers and may even cause them not to enjoy the program, decrease the program's effectiveness, or lead viewers to form a negative perception of the speaker. The potential loss of effectiveness is of particular concern for product commercials and political candidates. Television industry standards organizations, such as the Advanced Television Systems Committee, have become involved in setting standards for audio-video sync errors.
Because of these annoyances, AV-sync error is a concern to the television programming industry, including television stations, networks, advertisers and program production companies. The advent of high-definition flat-panel display technologies (LCD, DLP and plasma), which can delay video more than audio, has moved the problem into the viewer's home and beyond the control of the television programming industry alone. Consumer product companies now offer audio-delay adjustments in TVs, soundbars and A/V receivers to compensate for video-delay changes, and several companies manufacture dedicated digital audio delays made exclusively for lip-sync error correction.
Recommendations
For television applications, the Advanced Television Systems Committee recommends that audio should lead video by no more than and audio should lag video by no more than 45 ms.
However, the ITU performed strictly controlled tests with expert viewers and found that the threshold for detectability is 45 ms lead to 125 ms lag.
For film, acceptable lip sync is considered to be no more than 22 milliseconds in either direction.
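These windows are easier to interpret when expressed in frames. The sketch below uses the ITU detectability figures and the film tolerance quoted above, with positive meaning audio leads. It shows that a single-frame offset at 24 fps (about 41.7 ms) is within the ITU detectability window but outside the film tolerance. The dictionary keys and function names are illustrative.

```python
# (maximum audio lead, maximum audio lag) in ms; positive error = audio leads.
THRESHOLDS_MS = {
    "below ITU detectability threshold": (45.0, -125.0),
    "film tolerance": (22.0, -22.0),
}


def frames_to_ms(frames: int, fps: float) -> float:
    """Convert a whole number of frames to milliseconds."""
    return frames * 1000.0 / fps


def acceptable_under(error_ms: float) -> list[str]:
    """Names of the windows that contain error_ms."""
    return [name for name, (lead, lag) in THRESHOLDS_MS.items() if lag <= error_ms <= lead]


one_frame_24 = frames_to_ms(1, 24)       # about 41.7 ms
print(acceptable_under(one_frame_24))    # audio one frame early at 24 fps
print(acceptable_under(-one_frame_24))   # audio one frame late at 24 fps
```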
The Consumer Electronics Association has published a set of recommendations for how digital television receivers should implement A/V sync.
SMPTE ST2064
The SMPTE standard ST2064, published in 2015, provides technology to reduce or eliminate lip-sync errors in digital television. The standard uses audio and video fingerprints taken from a television program; the fingerprints can be recovered and used to correct the accumulated lip-sync error. When fingerprints have been generated for a TV program and the required technology is incorporated, the viewer's television set can continuously measure and correct lip-sync errors.
Timestamps
Presentation time stamps (PTS) are embedded in MPEG transport streams to signal precisely when each audio and video segment is to be presented, and so to avoid AV-sync errors. However, these timestamps are often added after the video undergoes frame synchronization, format conversion and preprocessing. Lip-sync errors created by those operations are therefore not corrected by adding and using timestamps.
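In MPEG-2 systems, PTS values are 33-bit counts of a 90 kHz clock. They therefore wrap around roughly every 26.5 hours, and differences must be taken modulo 2^33. A decoder presents each unit when its PTS matches the system time clock it has recovered from the stream. The sketch below shows that comparison. The 20 ms tolerance and the wait/present/drop policy are illustrative assumptions, not taken from the standard. Note that the comparison only keeps the decoder faithful to the timestamps; delays introduced before the timestamps were applied are invisible to it.

```python
PTS_CLOCK_HZ = 90_000
PTS_MODULUS = 1 << 33   # PTS is a 33-bit field


def pts_diff_ticks(later: int, earlier: int) -> int:
    """Signed difference later - earlier in 90 kHz ticks, tolerant of wraparound."""
    d = (later - earlier) % PTS_MODULUS
    return d - PTS_MODULUS if d >= PTS_MODULUS // 2 else d


def ticks_to_ms(ticks: int) -> float:
    """Convert 90 kHz ticks to milliseconds."""
    return ticks * 1000.0 / PTS_CLOCK_HZ


def presentation_action(pts: int, stc: int, tolerance_ticks: int = 1800) -> str:
    """Decide what to do with a decoded unit, given the current system time clock.

    tolerance_ticks = 1800 is 20 ms at 90 kHz (an arbitrary example value).
    """
    early_by = pts_diff_ticks(pts, stc)
    if early_by > tolerance_ticks:
        return "wait"      # not yet time to present this unit
    if early_by < -tolerance_ticks:
        return "drop"      # too late to present in sync
    return "present"


# A video PTS just after wraparound versus an audio PTS just before it: 40 ms apart.
print(ticks_to_ms(pts_diff_ticks(2700, 2**33 - 900)))
```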
The Real-time Transport Protocol clocks media using origination timestamps on an arbitrary timeline. A real-time clock, such as one delivered by the Network Time Protocol or the Precision Time Protocol and described in the Session Description Protocol data associated with the media, may be used to synchronize media. A server may then be used for synchronization between multiple receivers.
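One way to tie RTP timestamps to a shared NTP or PTP clock is a fixed linear mapping, of the kind described by the `a=mediaclk:direct` attribute of RFC 7273 and used in profiles such as AES67. Under that mapping, the RTP timestamp equals an offset plus the elapsed reference-clock time multiplied by the media clock rate, modulo 2^32. Receivers that share the reference clock can then compute the same presentation instant for a given RTP timestamp. This is a sketch under that assumption: it ignores network delay and receiver buffering, and the function names are illustrative.

```python
RTP_TS_MODULUS = 1 << 32   # RTP timestamps are 32-bit and wrap around


def rtp_timestamp_at(ref_time_s: float, clock_rate: int, offset: int = 0) -> int:
    """RTP timestamp for a reference-clock time (seconds since the clock's epoch)."""
    return (offset + round(ref_time_s * clock_rate)) % RTP_TS_MODULUS


def reference_time_of(rtp_ts: int, clock_rate: int, offset: int, now_s: float) -> float:
    """Reference-clock time nearest to now_s that corresponds to rtp_ts.

    Because the RTP timestamp wraps every 2**32 ticks, the receiver's current
    reference time is needed to pick the right cycle.
    """
    ticks_now = round(now_s * clock_rate)
    expected = (offset + ticks_now) % RTP_TS_MODULUS
    d = (rtp_ts - expected) % RTP_TS_MODULUS
    if d >= RTP_TS_MODULUS // 2:
        d -= RTP_TS_MODULUS
    return (ticks_now + d) / clock_rate


# Two receivers locked to the same reference clock agree on when to play a sample.
ts = rtp_timestamp_at(100.25, 48_000)
print(reference_time_of(ts, 48_000, 0, now_s=100.0))
print(reference_time_of(ts, 48_000, 0, now_s=100.1))
```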
See also
* Clapperboard
* Dubbing
* Lag (video games)
References
Further reading
* Sieranoja, S.; Sahidullah, Md; Kinnunen, T.; Komulainen, J.; Hadid, A. (July 2018). "Audiovisual Synchrony Detection with Optimized Audio Features". 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP). pp. 377–381. doi:10.1109/SIPROCESS.2018.8600424. ISBN 978-1-5386-6396-7. http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/audiovisual_synchrony_2018.pdf