Shot Transition Detection
   HOME

TheInfoList



OR:

Shot transition detection (or simply ''shot detection'') also called cut detection is a field of research of
video processing In electronics engineering, video processing is a particular case of signal processing, in particular image processing, which often employs filter (video), video filters and where the input and output Signal (electrical engineering), signals are vid ...
. Its subject is the automated detection of transitions between ''shots'' in
digital video Digital video is an electronic representation of moving visual images (video) in the form of encoded digital data. This is in contrast to analog video, which represents moving visual images in the form of analog signals. Digital video comprises ...
with the purpose of temporal segmentation of videos.


Use

Shot transition detection is used to split up a film into basic temporal units called ''shots''; a ''shot'' is a series of interrelated consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space. This operation is of great use in software for post-production of videos. It is also a fundamental step of automated indexing and content-based video retrieval or summarization applications which provide an efficient access to huge video archives, e.g. an application may choose a representative picture from each scene to create a visual overview of the whole film and, by processing such indexes, a search engine can process search items like "show me all films where there's a scene with a lion in it." Cut detection can do nothing that a human editor couldn't do manually, however it is advantageous as it saves time. Furthermore, due to the increase in the use of digital video and, consequently, in the importance of the aforementioned indexing applications, the automatic cut detection is very important nowadays.


Basic technical terms

In simple terms cut detection is about finding the positions in a video in that one scene is replaced by another one with different visual content. Technically speaking the following terms are used: A digital video consists of frames that are presented to the viewer's eye in rapid succession to create the impression of movement. "Digital" in this context means both that a single frame consists of
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a Raster graphics, raster image, or the smallest addressable element in a dot matrix display device. In most digital display devices, p ...
s and the data is present as
binary data Binary data is data whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the binary numeral system and Boolean algebra. Binary data occurs in many different technical and scientific fields, wh ...
, such that it can be processed with a computer. Each frame within a digital video can be uniquely identified by its frame index, a serial number. A shot is a sequence of frames shot uninterruptedly by one camera. There are several
film transition A film, also known as a movie or motion picture, is a work of visual art that simulates experiences and otherwise communicates ideas, stories, perceptions, emotions, or atmosphere through the use of moving images that are generally, sinc ...
s usually used in film editing to juxtapose adjacent shots; In the context of shot transition detection they are usually group into two types: * Abrupt Transitions - This is a sudden transition from one shot to another, i. e. one frame belongs to the first shot, the next frame belongs to the second shot. They are also known as hard cuts or simply cuts. *Gradual Transitions - In this kind of transitions the two shots are combined using chromatic, spatial or spatial-chromatic effects which gradually replace one shot by another. These are also often known as soft transitions and can be of various types, e.g., wipes, dissolves, fades... "Detecting a cut" means that the position of a cut is gained; more precisely a hard cut is gained as "hard cut between frame i and frame i+1", a soft cut as "soft cut from frame i to frame j". A transition that is detected correctly is called a hit, a cut that is there but was not detected is called a missed hit and a position in that the software assumes a cut, but where actually no cut is present, is called a false hit. ''An introduction to film editing and an exhaustive list of shot transition techniques can be found at
film editing Film editing is both a creative and a technical part of the post-production process of filmmaking. The term is derived from the traditional process of working with film stock, film which increasingly involves the use Digital cinema, of digital ...
.''


Vastness of the problem

Although cut detection appears to be a simple task for a human being, it is a non-trivial task for computers. Cut detection would be a trivial problem if each frame of a video was enriched with additional information about ''when'' and ''by which camera'' it was taken. Possibly no algorithm for cut detection will ever be able to detect all cuts with certainty, unless it is provided with powerful artificial intelligence. While most algorithms achieve good results with hard cuts, many fail with recognizing soft cuts. Hard cuts usually go together with sudden and extensive changes in the visual content while soft cuts feature slow and gradual changes. A human being can compensate this lack of visual diversity with understanding the meaning of a scene. While a computer assumes a black line wiping a shot away to be "just another regular object moving slowly through the on-going scene", a person understands that the scene ends and is replaced by a black screen.


Methods

Each method for cut detection works on a two-phase-principle: # Scoring – Each pair of consecutive frames of a digital video is given a certain score that represents the similarity/dissimilarity between them. # Decision – All scores calculated previously are evaluated and a cut is detected if the score is considered high. This principle is error prone. First, because even minor exceedings of the threshold value produce a hit, it must be ensured that phase one scatters values widely to maximize the average difference between the score for "cut" and "no cut". Second, the threshold must be chosen with care; usually useful values can be gained with statistical methods.


Scoring

There are many possible scores used to access the differences in the visual content; some of the most common are: *
Sum of absolute differences In digital image processing, the sum of absolute differences (SAD) is a measure of the similarity between image blocks. It is calculated by taking the absolute difference between each pixel in the original block and the corresponding pixel in th ...
(SAD). This is both the most obvious and most simple algorithm of all: The two consecutive frames are compared
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a Raster graphics, raster image, or the smallest addressable element in a dot matrix display device. In most digital display devices, p ...
by pixel, summing up the
absolute value In mathematics, the absolute value or modulus of a real number x, is the non-negative value without regard to its sign. Namely, , x, =x if x is a positive number, and , x, =-x if x is negative (in which case negating x makes -x positive), ...
s of the differences of each two corresponding pixels. The result is a positive number that is used as the score. SAD reacts very sensitively to even minor changes within a scene: fast movements of the camera, explosions or the simple switching on of a light in a previously dark scene result in false hits. On the other hand, SAD hardly reacts to soft cuts at all. Yet, SAD is used often to produce a basic set of "possible hits" as it detects all visible hard cuts with utmost probability. * Histogram differences (HD). Histogram differences is very similar to Sum of absolute differences. The difference is that HD computes the difference between the
histogram A histogram is a visual representation of the frequency distribution, distribution of quantitative data. To construct a histogram, the first step is to Data binning, "bin" (or "bucket") the range of values— divide the entire range of values in ...
s of two consecutive frames; a histogram is a table that contains for each color within a frame the number of pixels that are shaded in that color. HD is not as sensitive to minor changes within a scene as SAD and thus produces less false hits. One major problem of HD is that two images can have exactly the same histograms while the shown content differs extremely, e. g. a picture of the sea and a beach can have the same histogram as one of a corn field and the sky. HD offers no guarantee that it recognizes hard cuts. * Edge change ratio (ECR). The ECR attempts to compare the actual content of two frames. It transforms both frames to ''edge pictures'', i. e. it extracts the probable outlines of objects within the pictures (see
edge detection Edge or EDGE may refer to: Technology Computing * Edge computing, a network load-balancing system * Edge device, an entry point to a computer network * Adobe Edge, a graphical development application * Microsoft Edge, a web browser developed b ...
for details). Afterwards it compares these edge pictures using
dilation wiktionary:dilation, Dilation (or dilatation) may refer to: Physiology or medicine * Cervical dilation, the widening of the cervix in childbirth, miscarriage etc. * Coronary dilation, or coronary reflex * Dilation and curettage, the opening of ...
to compute a probability that the second frame contains the same objects as the first frame. The ECR is one of the best performing algorithms for scoring. It reacts very sensitively to hard cuts and can detect many soft cuts by nature. In its basic form even ECR cannot detect soft cuts such as
wipe Wipe or wiping may refer to: Hygiene * Toilet paper or wet wipes, or their use Arts and media * Wipe (transition), a gradual transition in film editing * Wipe curtain, a kind of theater curtain * ''Wipe'' or ''Screenwipe'', a television series b ...
s as it considers the fading-in objects as regular objects moving through the scene. Yet, ECR can be extended manually to recognize special forms of soft cuts. Finally, a combination of two or more of these scores can improve the performance.


Decision

In the decision phase the following approaches are usually used: * Fixed Threshold – In this approach, the scores are compared to a threshold which was set previously and if the score is higher than the threshold a cut is declared. * Adaptive Threshold – In this approach, the scores are compared to a threshold which considers various scores in the video to adapt the threshold to the properties of the current video. Like in the previous case, if the score is higher than the corresponding threshold a cut is declared. *
Machine Learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
- Machine learning techniques can be applied also to the decision process.


Cost

All of the above algorithms complete in O(n) — that is to say they run in linear time — where ''n'' is the number of frames in the input video. The algorithms differ in a constant factor that is determined mostly by the
image resolution Image resolution is the level of detail of an image. The term applies to digital images, film images, and other types of images. "Higher resolution" means more image detail. Image resolution can be measured in various ways. Resolution quantifies ...
of the video.


Measures for quality

Usually the following three measures are used to measure the quality of a cut detection algorithm: *
Recall Recall may refer to: * Recall (baseball), a baseball term * Recall (bugle call), a signal to stop * Recall (information retrieval), a statistical measure * ReCALL (journal), ''ReCALL'' (journal), an academic journal about computer-assisted langua ...
is the probability that an existing cut will be detected: *
Precision Precision, precise or precisely may refer to: Arts and media * ''Precision'' (march), the official marching music of the Royal Military College of Canada * "Precision" (song), by Big Sean * ''Precisely'' (sketch), a dramatic sketch by the Eng ...
is the probability that an assumed cut is in fact a cut: * F1 is a combined measure that results in high value if, and only if, both precision ''and'' recall result in high values:
The symbols stand for: C, the number of correctly detected cuts ("correct hits"), M, the number of not detected cuts ("missed hits") and F, the number of falsely detected cuts ("false hits"). All of these measures are mathematical measures, i. e. they deliver values in between 0 and 1. The basic rule is: the higher the value, the better performs the algorithm.


Benchmarks


TRECVid SBD Benchmark 2001-2007

Automatic shot transition detection was one of the tracks of activity within the annual TRECVid benchmarking exercise from 2001 to 2007. There were 57 algorithms from different research groups. Сalculations of F score were performed for each algorithm on a dataset, which was replenished annually.


MSU SBD Benchmark 2020-2021

The benchmark has compared 6 methods on more than 120 videos from RAI and MSU CC datasets with different types of scene changes, some of which were added manually. The authors state that the main feature of this benchmark is the complexity of shot transitions in the dataset. To prove it they calculate SI/TI metric of shots and compare it with others publicly available datasets.


References

{{Reflist, refs= {{cite book , author1 = P. Balasubramaniam , author2 = R Uthayakumar , title = Mathematical Modelling and Scientific Computation: International Conference, ICMMSC 2012, Gandhigram, Tamil Nadu, India, March 16-18, 2012 , url = https://books.google.com/books?id=WDyqCAAAQBAJ&pg=PA421 , date = 2 March 2012 , publisher = Springer , isbn = 978-3-642-28926-2 , pages = 421– {{cite book , author1 = Weiming Shen , author2 = Jianming Yong , author3 = Yun Yang , title = Computer Supported Cooperative Work in Design IV: 11th International Conference, CSCWD 2007, Melbourne, Australia, April 26-28, 2007. Revised Selected Papers , url = https://books.google.com/books?id=Ex15BGtT8oUC&pg=PA100 , date = 18 December 2008 , publisher = Springer Science & Business Media , isbn = 978-3-540-92718-1 , pages = 100– {{cite book , author1 = Joan Cabestany , author2 = Ignacio Rojas , author3 = Gonzalo Joya , title = Advances in Computational Intelligence: 11th International Work-Conference on Artificial Neural Networks, IWANN 2011, Torremolinos-Málaga, Spain, June 8-10, 2011, Proceedings , url = https://books.google.com/books?id=iEmt4qx7xVQC&pg=PA521 , date = 30 May 2011 , publisher = Springer Science & Business Media , isbn = 978-3-642-21500-1 , pages = 521– , quote = Shot detection is performed by means of shot transition detection algorithms. Two different types of transitions are used to split a video into shots: – Abrupt transitions, also referred as cuts or straight cuts, occur when a sudden change from one ... Video processing