Lip sync error

Audio-to-video synchronization (AV synchronization, also known as lip sync, or by the lack of it: lip-sync error, lip flap) refers to the relative timing of audio (sound) and video (image) parts during creation, post-production (mixing), transmission, reception and playback processing. AV synchronization can be an issue in television, videoconferencing, or film.

In industry terminology, the lip-sync error is expressed as the amount of time by which the audio departs from perfect synchronization with the video: a positive number indicates that the audio leads the video, and a negative number indicates that the audio lags the video. This terminology and the standardized numeric lip-sync error are used in the professional broadcast industry, as evidenced by professional papers and standards such as ITU-R BT.1359-1.

Digital or analog audio-video streams and video files usually contain some sort of synchronization mechanism, either in the form of interleaved video and audio data or by explicit relative timestamping of data. The processing of data must respect the relative data timing, e.g. by stretching or interpolating the received data. If the processing does not respect the relative timing, the AV-sync error will increase whenever data is lost because of transmission errors or because of missing or mistimed processing.
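The sign convention described above can be illustrated with a minimal sketch. The function names are hypothetical, and the sketch assumes that the presentation times of the same event (for example a clap) on the audio and video paths have already been measured against a common clock.

```python
def lip_sync_error_ms(audio_time_ms, video_time_ms):
    """Return the lip-sync error in milliseconds.

    Both arguments are the times at which the same event is presented
    on the audio path and on the video path, measured on one clock.
    Positive result: audio leads video. Negative result: audio lags video.
    """
    return video_time_ms - audio_time_ms


def describe(error_ms):
    """Turn a signed lip-sync error into a readable description."""
    if error_ms > 0:
        return f"audio leads video by {error_ms:g} ms"
    if error_ms < 0:
        return f"audio lags video by {-error_ms:g} ms"
    return "audio and video are in sync"


# The clap is heard at 1000 ms but seen at 1040 ms: the audio is early.
print(describe(lip_sync_error_ms(1000, 1040)))
```

With the audio presented 40 ms before the picture, the error is +40 ms; swapping the two times gives -40 ms, an audio lag.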


Incorrectly synchronized

There are different ways in which audio and video can become incorrectly synchronized:
* During creation, AV-sync errors happen because of:
** Internal AV-sync error: different signal processing delays between image and sound in the video camera and microphone. The AV-sync delay is normally fixed.
** External AV-sync error: if a microphone is placed far away from the sound source, the audio will be out of sync because the speed of sound is much lower than the speed of light. If the sound source is 340 meters from the microphone, then the sound arrives approximately 1 second later than the light. The AV-sync delay increases with distance.
* During mixing of video clips, normally either the audio or the video needs to be delayed so that they are synchronized. The AV-sync delay is static but can vary with the individual clip.
* Video editing effects.

Examples of transmission (broadcasting), reception and playback effects that can cause incorrect synchronization:
* A video camera with built-in microphones or line-in may not delay the sound and video paths by the same number of milliseconds. A video camera should have some sort of explicit AV-sync timing put into the video and audio streams. Solid-state video cameras (e.g. charge-coupled device (CCD) and CMOS image sensors) can delay the video signal by one or more frames.
* An AV stream may get corrupted during transmission because of electrical glitches (wired) or wireless interruptions, which may cause it to go out of sync. The AV-sync delay normally increases with time.
* Television systems make extensive use of audio and video signal processing circuitry with significant (and often non-constant) delays. Widely used video signal processing circuitry that contributes significant video delay includes frame synchronizers, digital video effects processors, video noise reduction, format converters and compression systems.
* The video monitor processing circuit may delay the video stream. Pixelated displays require video format conversion and deinterlace processing, which can add one or more frames of video delay.
* A video monitor with built-in speakers or line-out may not delay the sound and video paths by the same number of milliseconds. Some video monitors contain internal user-adjustable audio delays to aid in correction of errors.
* Some transmission protocols, like RTP, require an out-of-band method for synchronizing media streams. In RTP's case, each media stream has its own timestamp using an independent clock rate and a per-stream randomized starting value. An RTCP Sender Report (SR) is needed ''for each stream'' in order to synchronize streams. The necessary RTCP packets might be lost (since RTP/RTCP does not guarantee delivery) or not sent until at least several seconds after the stream has begun. Many software clients do not send RTCP at all or send non-compliant data.
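The RTP case in the last item can be sketched as follows. A Sender Report pairs a wall-clock time with the RTP timestamp of the same instant, so a receiver can place the timestamps of each stream on a shared timeline. The class and function names are hypothetical, the wall-clock value is simplified to seconds as a float, and the clock rates (48 kHz audio, 90 kHz video) are example values, not something the text above prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenderReport:
    """Simplified view of the fields of an RTCP SR used for synchronization."""
    wallclock_s: float   # wall-clock time carried in the report, in seconds
    rtp_timestamp: int   # RTP timestamp of the same instant (32-bit)
    clock_rate: int      # RTP clock rate of this stream, in Hz


def rtp_to_wallclock(rtp_timestamp, report):
    """Map an RTP timestamp of one stream onto the shared wall-clock timeline."""
    # Signed 32-bit difference, so timestamp wraparound is handled.
    delta = (rtp_timestamp - report.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return report.wallclock_s + delta / report.clock_rate


# Each stream has its own clock rate and its own random starting value.
audio_report = SenderReport(wallclock_s=100.0, rtp_timestamp=1_000_000, clock_rate=48_000)
video_report = SenderReport(wallclock_s=100.5, rtp_timestamp=4_294_960_000, clock_rate=90_000)

audio_time = rtp_to_wallclock(1_048_000, audio_report)  # one second after the audio report
video_time = rtp_to_wallclock(37_704, video_report)     # half a second later, after a wrap
print(audio_time, video_time)
```

Both packets map to the same instant on the shared timeline, so a receiver would present them together. Until a report has arrived for each stream, no such mapping exists, which is why lost or late RTCP packets leave the streams unsynchronized.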


Effect of no explicit AV-sync timing

When a digital or analog audio-video stream does not have some sort of explicit AV-sync timing, the following effects will cause the stream to become out of sync:
* In film, these timing errors are most commonly caused by worn films skipping over the movie projector sprockets because the film has torn sprocket holes.
* Errors can also be caused by the projectionist misthreading the film in the projector, although this is rare with competent projectionists.
* AV-sync is commonly corrected and maintained with an audio synchronizer. Television industry standards organizations have established acceptable amounts of audio and video timing error and suggested practices related to maintaining acceptable timing.
* AV-sync errors are becoming a significant problem in the digital television industry because of the use of large amounts of video signal processing in television production, television broadcasting and pixelated television displays such as LCD, DLP and plasma displays.
* In the television field, audio-video sync problems are commonly caused when significant amounts of video processing are performed on the video part of the television program.
* Typical sources of significant video delay in the television field include video synchronizers and video compression encoders and decoders. Particularly troublesome encoders and decoders are used in MPEG compression systems utilized for broadcasting digital television and storing television programs on consumer and professional recording and playback devices.
* A source of significant video delay is found in pixelated television displays (LCD, DLP and plasma), which use complex video signal processing to convert the resolution of the incoming video signal to the native resolution of the display, for example converting standard-definition video for display on a high-definition screen. "Lip flap" may exceed 200 ms at times.
* In broadcast television, it is not unusual for lip-sync error to vary by over 100 ms (several video frames) from time to time.
* EBU Recommendation R37, "The relative timing of the sound and vision components of a television signal", states that end-to-end audio/video sync should be within +40 ms and -60 ms (audio before and after video, respectively) and that each stage should be within +5 ms and -15 ms.
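The two EBU R37 windows quoted in the last item interact: every stage can be inside its own window while the chain as a whole is not. The sketch below checks both, under the simplifying assumption that per-stage errors add linearly. The stage names and error values are invented for illustration.

```python
# Windows from EBU R37 as (lowest allowed, highest allowed) error in ms.
# Positive values mean the audio leads the video.
END_TO_END_MS = (-60.0, 40.0)
PER_STAGE_MS = (-15.0, 5.0)


def within(error_ms, window):
    """Return True if the error lies inside the window, bounds included."""
    low, high = window
    return low <= error_ms <= high


def check_chain(stage_errors_ms):
    """Return (total error, end-to-end ok?, names of stages outside their window)."""
    total = sum(stage_errors_ms.values())
    bad_stages = [name for name, err in stage_errors_ms.items()
                  if not within(err, PER_STAGE_MS)]
    return total, within(total, END_TO_END_MS), bad_stages


# Hypothetical chain in which the video is delayed a little at every stage.
chain = {
    "frame synchronizer": -12.0,
    "format converter": -10.0,
    "encoder": -15.0,
    "decoder": -14.0,
    "display": -15.0,
}
total, ok, bad_stages = check_chain(chain)
print(total, ok, bad_stages)
```

Here no single stage breaks the per-stage limit, yet the accumulated audio lag of 66 ms falls outside the end-to-end window.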


Viewer experience of incorrectly synchronized AV-sync

The result typically leaves a filmed or televised character moving his or her mouth when there is no spoken dialog to accompany it, hence the term "lip flap" or "lip-sync error". The resulting audio-video sync error can annoy the viewer and may even cause the viewer to not enjoy the program, decrease the effectiveness of the program, or lead to a negative perception of the speaker. The potential loss of effectiveness is of particular concern for product commercials and political candidates. Television industry standards organizations, such as the Advanced Television Systems Committee, have become involved in setting standards for audio-video sync errors.

Because of these annoyances, AV-sync error is a concern to the television programming industry, including television stations, networks, advertisers and program production companies. The advent of high-definition flat-panel display technologies (LCD, DLP and plasma), which can delay video more than audio, has moved the problem into the viewer's home and beyond the control of the television programming industry alone. Consumer product companies now offer audio-delay adjustments to compensate for video-delay changes in TVs and A/V receivers, and several companies manufacture dedicated digital audio delays made exclusively for lip-sync error correction.


Recommendations

For television applications, the Advanced Television Systems Committee recommends that audio should lead video by no more than 15 milliseconds and lag video by no more than 45 milliseconds. However, the ITU performed strictly controlled tests with expert viewers and found that the threshold for detectability is -125 ms to +45 ms. For film, acceptable lip sync is considered to be no more than 22 milliseconds in either direction. The Consumer Electronics Association has published a set of recommendations for how digital television receivers should implement A/V sync.
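The figures above can be compared side by side with a small sketch. The dictionary keys and the function name are hypothetical labels. The windows use the article's sign convention (positive means the audio leads the video), and the ITU entry is a detectability threshold rather than a recommendation.

```python
# (lowest allowed, highest allowed) lip-sync error in ms for each figure
# quoted above; positive values mean the audio leads the video.
WINDOWS_MS = {
    "ATSC recommendation": (-45.0, 15.0),
    "ITU detectability threshold": (-125.0, 45.0),
    "film": (-22.0, 22.0),
}


def evaluate(error_ms):
    """Report, for each window, whether the measured error falls inside it."""
    return {name: low <= error_ms <= high
            for name, (low, high) in WINDOWS_MS.items()}


# An audio lead of 30 ms exceeds the ATSC and film limits but stays
# below the ITU detectability threshold.
for name, inside in evaluate(30.0).items():
    print(f"{name}: {'inside' if inside else 'outside'}")
```

The asymmetry of the windows reflects the text: a given amount of audio lead is tolerated less than the same amount of audio lag.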


SMPTE ST2064

SMPTE standard ST2064, published in 2015, provides technology to reduce or eliminate lip-sync errors in digital television. The standard uses audio and video fingerprints taken from a television program. The fingerprints can be recovered downstream and used to correct the accumulated lip-sync error. When fingerprints have been generated for a TV program and the required technology is incorporated, the viewer's display device can continuously measure and correct lip-sync errors.
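The general idea of measuring lip-sync error from fingerprints can be sketched as below. This is not the ST2064 algorithm or its fingerprint format. It is a simplified illustration that assumes one numeric fingerprint value per fixed period for both audio and video, and finds the delay at which the received sequence best matches the reference. All names are hypothetical.

```python
def best_delay(reference, received, max_delay):
    """Return the delay d (in fingerprint periods, 0..max_delay) at which
    received[d:] best matches reference, by mean absolute difference."""
    best, best_score = 0, None
    for d in range(max_delay + 1):
        pairs = list(zip(reference, received[d:]))
        if not pairs:
            break
        score = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if best_score is None or score < best_score:
            best, best_score = d, score
    return best


def lip_sync_error_ms(ref_audio, rx_audio, ref_video, rx_video,
                      period_ms, max_delay):
    """Positive result: audio leads video (the video was delayed more)."""
    audio_delay = best_delay(ref_audio, rx_audio, max_delay)
    video_delay = best_delay(ref_video, rx_video, max_delay)
    return (video_delay - audio_delay) * period_ms


# Toy fingerprints: the received video is 3 periods late, the audio 1 period.
reference = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
rx_video = [0, 0, 0] + reference
rx_audio = [0] + reference
print(lip_sync_error_ms(reference, rx_audio, reference, rx_video,
                        period_ms=40, max_delay=5))
```

A device that knows this difference could delay its audio by the same amount to restore synchronization.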


Timestamps

Presentation time stamps (PTS) are embedded in MPEG transport streams to precisely signal when each audio and video segment is to be presented, to avoid AV-sync errors. However, these timestamps are often added after the video undergoes frame synchronization, format conversion and preprocessing, and thus the lip-sync errors created by these operations will not be corrected by the addition and use of timestamps.

The Real-time Transport Protocol clocks media using origination timestamps on an arbitrary timeline. A real-time clock, such as one delivered by the Network Time Protocol and described in the Session Description Protocol associated with the media, may be used to synchronize media. A server may then be used for final synchronization to remove any residual offset.
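As a minimal sketch of how presentation time stamps are compared: in MPEG systems the PTS is a 33-bit counter in units of a 90 kHz clock, so a difference between two stamps has to allow for wraparound before it is converted to milliseconds. The function names and sample values are hypothetical.

```python
PTS_CLOCK_HZ = 90_000      # PTS values count ticks of a 90 kHz clock
PTS_MODULUS = 1 << 33      # the PTS field is 33 bits wide and wraps around


def pts_diff(a, b):
    """Signed difference a - b in ticks, allowing for 33-bit wraparound."""
    d = (a - b) % PTS_MODULUS
    if d >= PTS_MODULUS // 2:
        d -= PTS_MODULUS
    return d


def ticks_to_ms(ticks):
    """Convert 90 kHz ticks to milliseconds."""
    return ticks * 1000 / PTS_CLOCK_HZ


# A video frame stamped just before the counter wraps, and an audio
# frame stamped just after it.
video_pts = 8_589_934_000
audio_pts = 3_008
print(ticks_to_ms(pts_diff(audio_pts, video_pts)))  # scheduled gap in ms
```

The audio frame is scheduled 40 ms after the video frame. As noted above, this only describes timing from the point where the stamps were added; delays introduced earlier in the chain are invisible to it.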


See also

* Audio synchronizer
* Clapperboard
* Dubbing (filmmaking)
* Input lag
* Lip sync


Further reading

* Sieranoja, S.; Sahidullah, Md; Kinnunen, T.; Komulainen, J.; Hadid, A. (July 2018). "Audiovisual Synchrony Detection with Optimized Audio Features". IEEE 3rd Int. Conference on Signal and Image Processing (ICSIP 2018). pp. 377–381. doi:10.1109/SIPROCESS.2018.8600424. ISBN 978-1-5386-6396-7. http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/audiovisual_synchrony_2018.pdf