Scan Statistics

In statistics, a scan statistic or window statistic is a problem relating to the clustering of randomly positioned points. Typical examples are the maximum size of a cluster of points on a line, or the longest series of successes recorded by a moving window of fixed length.
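The two example problems above can be made concrete with a minimal Python sketch. It only computes the observed statistic, not its distribution or a p-value. The function names `max_points_in_window` and `max_successes_in_window` are hypothetical, chosen for illustration, and the second function reads "series of successes" as the number of successes among `m` consecutive 0/1 trials, which is an assumption about the intended meaning.

```python
from typing import Sequence


def max_points_in_window(points: Sequence[float], width: float) -> int:
    """Largest number of points covered by any closed interval of length `width`."""
    xs = sorted(points)
    best = 0
    left = 0
    for right, x in enumerate(xs):
        # Advance the left end until the span xs[left]..x fits inside `width`.
        while x - xs[left] > width:
            left += 1
        best = max(best, right - left + 1)
    return best


def max_successes_in_window(trials: Sequence[int], m: int) -> int:
    """Largest number of successes (1s) among any `m` consecutive 0/1 trials."""
    if m <= 0 or m > len(trials):
        raise ValueError("m must be between 1 and len(trials)")
    current = sum(trials[:m])  # successes in the first window
    best = current
    for i in range(m, len(trials)):
        # Slide the window one trial to the right.
        current += trials[i] - trials[i - m]
        best = max(best, current)
    return best
```

Sorting the points first lets the window be moved with two indices, so the continuous case costs one sort plus a linear pass rather than a comparison of every pair of points.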
Joseph Naus first published on the problem in the 1960s, and has been called the "father of the scan statistic" in honour of his early contributions. The results can be applied in epidemiology, public health and astronomy to find unusual clusters of events. The approach was extended by
Martin Kulldorff to multidimensional settings and varying window sizes in a 1997 paper, which is the most cited article in its journal, Communications in Statistics – Theory and Methods. This work led to the creation of the software SaTScan, a program trademarked by Martin Kulldorff that applies his methods to data.

Recent results have shown that using scale-dependent critical values for the scan statistic makes it possible to attain asymptotically optimal detection simultaneously for all signal lengths, thereby improving on the traditional scan. This procedure has, however, been criticized for losing too much power for short signals. Walther and Perry (2022) considered the problem of detecting an elevated mean on an interval of unknown location and length in the univariate Gaussian sequence model. They explain the discrepancy by showing that these asymptotic optimality results are necessarily too imprecise to discern the performance of scan statistics in a practically relevant way, even in a large-sample context. Instead, they propose assessing performance with a new finite-sample criterion, and they present three new calibration techniques for scan statistics that perform well across a range of relevant signal lengths, with the aim of improving performance for short signals.

Scan-statistic-based methods have also been developed specifically to detect rare-variant associations in the noncoding genome, especially in intergenic regions. Compared with fixed-size sliding-window analysis, these methods scan the genome continuously with a dynamic window whose size adapts to the data, and they increase analysis power by flexibly selecting the locations and sizes of the signal regions. Examples of these methods include Q-SCAN, SCANG and WGScan.
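The difference between a fixed-size window and a window whose size is chosen from the data can be illustrated with a small sketch on a numeric sequence. It is not an implementation of Q-SCAN, SCANG, WGScan or the calibrations of Walther and Perry, and it computes no critical values. The function name `scan_intervals` and the toy input are assumptions made for illustration. Each interval is scored by its sum divided by the square root of its length, a standardization that gives every interval unit variance under independent standard Gaussian noise.

```python
import math
from typing import Sequence, Tuple


def scan_intervals(y: Sequence[float], lengths: Sequence[int]) -> Tuple[float, int, int]:
    """Return (max statistic, start index, length) over intervals of the given lengths.

    The statistic of an interval is sum(y[start:start + length]) / sqrt(length).
    """
    # Prefix sums make every interval sum an O(1) lookup.
    prefix = [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)

    best = (-math.inf, -1, 0)
    for length in lengths:
        for start in range(0, len(y) - length + 1):
            stat = (prefix[start + length] - prefix[start]) / math.sqrt(length)
            if stat > best[0]:
                best = (stat, start, length)
    return best


if __name__ == "__main__":
    # Toy, noise-free input: the mean is elevated on indices 8..11 only.
    y = [0.0] * 20
    for i in range(8, 12):
        y[i] = 2.0

    fixed = scan_intervals(y, [8])            # one fixed window size
    adaptive = scan_intervals(y, range(1, 21))  # every window size
    print("fixed window:   ", fixed)
    print("adaptive window:", adaptive)
```

On this toy input the scan over all lengths selects the interval that starts at index 8 and has length 4, which matches the elevated region, while the fixed window of length 8 dilutes the signal with unaffected positions and scores lower. With real, noisy data the maximum would still have to be compared against suitably calibrated critical values, which this sketch does not attempt.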



External links


SaTScan, free software for spatial, temporal and space-time scan statistics