
Predictive modelling uses statistics to predict outcomes. Most often the event one wants to predict is in the future, but predictive modelling can be applied to any type of unknown event, regardless of when it occurred. For example, predictive models are often used to detect crimes and identify suspects after the crime has taken place.

In many cases, the model is chosen on the basis of detection theory to try to guess the probability of an outcome given a set amount of input data, for example, given an email, determining how likely it is to be spam. Models can use one or more classifiers in trying to determine the probability of a set of data belonging to another set. For example, a model might be used to determine whether an email is spam or "ham" (non-spam).

Depending on definitional boundaries, predictive modelling is synonymous with, or largely overlapping with, the field of machine learning, as it is more commonly referred to in academic or research and development contexts. When deployed commercially, predictive modelling is often referred to as predictive analytics.

Predictive modelling is often contrasted with causal modelling/analysis. In the former, one may be entirely satisfied to make use of indicators of, or proxies for, the outcome of interest. In the latter, one seeks to determine true cause-and-effect relationships. This distinction has given rise to a burgeoning literature in the fields of research methods and statistics and to the common statement that "correlation does not imply causation".
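The spam-versus-ham example can be made concrete with a small sketch. The code below assumes a naive Bayes classifier with Laplace smoothing, which is only one of many possible model choices; the training messages and the function names `train` and `spam_probability` are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set: (message text, label).
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap prize offer now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def train(messages):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_probability(text, model):
    """Return the estimated probability that `text` is spam."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior: the share of training messages in this class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # Words never seen in training carry no evidence.
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Convert log scores to a normalised probability in a numerically safe way.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

model = train(TRAINING)
p_spam_like = spam_probability("win a prize now", model)
p_ham_like = spam_probability("monday meeting", model)
```

With this toy data, `p_spam_like` comes out well above 0.5 and `p_ham_like` well below it. A real filter would need far more data and a held-out evaluation set.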


Models

Nearly any statistical model can be used for prediction purposes. Broadly speaking, there are two classes of predictive models: parametric and non-parametric. A third class, semi-parametric models, includes features of both. Parametric models make "specific assumptions with regard to one or more of the population parameters that characterize the underlying distribution(s)". Non-parametric models "typically involve fewer assumptions of structure and distributional form than parametric models but usually contain strong assumptions about independencies".
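As an illustration of the two classes, the sketch below fits the same toy data with a parametric model (a straight line estimated by ordinary least squares) and a non-parametric one (k-nearest-neighbour averaging). The data and function names are hypothetical, and both techniques are chosen here only as familiar representatives of each class.

```python
def fit_line(xs, ys):
    """Parametric: assume y = slope * x + intercept and estimate both values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_line(params, x):
    """Predict from the two fitted parameters alone; the data is not needed."""
    slope, intercept = params
    return slope * x + intercept

def predict_knn(xs, ys, x, k=2):
    """Non-parametric: average the outcomes of the k closest observations."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical observations lying exactly on y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

params = fit_line(xs, ys)
line_far = predict_line(params, 10)   # extrapolates along the assumed line
knn_far = predict_knn(xs, ys, 10)     # stays within the observed outcomes
```

The line extrapolates to 20 at x = 10 because it trusts its assumed functional form, while the nearest-neighbour estimate stays at 7, the mean of the two closest observed outcomes. Neither behaviour is inherently better; which one is appropriate depends on how much structure can reasonably be assumed.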


Applications


Uplift modelling

Uplift modelling is a technique for modelling the change in probability caused by an action. Typically this is a marketing action such as an offer to buy a product, to use a product more or to re-sign a contract. For example, in a retention campaign you wish to predict the change in probability that a customer will remain a customer if they are contacted. A model of the change in probability allows the retention campaign to be targeted at those customers on whom the change in probability will be beneficial. This lets the retention programme avoid triggering unnecessary churn or customer attrition, and avoid wasting money on contacting people who would act anyway.
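A minimal sketch of the idea, assuming data from a past campaign in which some customers were contacted and others were not: the uplift for a customer segment is estimated as the retention rate among contacted customers minus the rate among uncontacted ones. The segments, records and function names are hypothetical, and production uplift models are usually fitted per customer rather than per segment.

```python
from collections import defaultdict

def retention_rates(records):
    """Map (segment, contacted) to the observed share of retained customers."""
    counts = defaultdict(lambda: [0, 0])  # key -> [retained, total]
    for segment, contacted, retained in records:
        entry = counts[(segment, contacted)]
        entry[0] += int(retained)
        entry[1] += 1
    return {key: kept / total for key, (kept, total) in counts.items()}

def uplift_by_segment(records):
    """Estimate P(retained | contacted) - P(retained | not contacted)."""
    rates = retention_rates(records)
    segments = {segment for segment, _ in rates}
    return {
        s: rates[(s, True)] - rates[(s, False)]
        for s in segments
        if (s, True) in rates and (s, False) in rates
    }

# Hypothetical campaign history: (segment, was contacted, was retained).
RECORDS = (
    [("new", True, True)] * 3 + [("new", True, False)] * 1
    + [("new", False, True)] * 1 + [("new", False, False)] * 3
    + [("long_term", True, True)] * 2 + [("long_term", True, False)] * 2
    + [("long_term", False, True)] * 3 + [("long_term", False, False)] * 1
)

uplift = uplift_by_segment(RECORDS)
# Target only segments where contact is estimated to help.
targets = sorted(s for s, change in uplift.items() if change > 0)
```

In this made-up data, contacting "new" customers raises retention (uplift 0.5), while contacting "long_term" customers lowers it (uplift -0.25), so only the first segment would be targeted.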


Archaeology

Predictive modelling in archaeology gets its foundations from Gordon Willey's mid-fifties work in the Virú Valley of Peru. Complete, intensive surveys were performed, and then the covariability between cultural remains and natural features such as slope and vegetation was determined. Development of quantitative methods and a greater availability of applicable data led to growth of the discipline in the 1960s, and by the late 1980s substantial progress had been made by major land managers worldwide.

Generally, predictive modelling in archaeology means establishing statistically valid causal or covariable relationships between natural proxies such as soil types, elevation, slope, vegetation, proximity to water, geology, geomorphology, etc., and the presence of archaeological features. By analysing these quantifiable attributes on land that has undergone archaeological survey, the "archaeological sensitivity" of unsurveyed areas can sometimes be anticipated from the natural proxies in those areas. Large land managers in the United States, such as the Bureau of Land Management (BLM), the Department of Defense (DOD), and numerous highway and parks agencies, have successfully employed this strategy. By using predictive modelling in their cultural resource management plans, they can make more informed decisions when planning activities that may require ground disturbance and subsequently affect archaeological sites.


Customer relationship management

Predictive modelling is used extensively in analytical customer relationship management and data mining to produce customer-level models that describe the likelihood that a customer will take a particular action. The actions are usually sales, marketing and customer retention related. For example, a large consumer organization such as a mobile telecommunications operator will have a set of predictive models for product cross-sell, product deep-sell (or upselling) and churn. It is also now more common for such an organization to have a model of savability using an uplift model. This predicts the likelihood that a customer can be saved at the end of a contract period (the change in churn probability) as opposed to the standard churn prediction model.


Auto insurance

Predictive modelling is utilised in vehicle insurance to assign risk of incidents to policy holders from information obtained from policy holders. This is extensively employed in usage-based insurance solutions where predictive models utilise telemetry-based data to build a model of predictive risk for claim likelihood. Black-box auto insurance predictive models utilise GPS or accelerometer sensor input only. Some models include a wide range of predictive input beyond basic telemetry including advanced driving behaviour, independent crash records, road history, and user profiles to provide improved risk models.


Health care

In 2009 Parkland Health & Hospital System began analyzing electronic medical records in order to use predictive modeling to help identify patients at high risk of readmission. Initially, the hospital focused on patients with congestive heart failure, but the program has expanded to include patients with diabetes, acute myocardial infarction, and pneumonia.

In 2018, Banerjee et al. proposed a deep learning model for estimating the short-term life expectancy (>3 months) of patients by analyzing free-text clinical notes in the electronic medical record while maintaining the temporal visit sequence. The model was trained on a large dataset (10,293 patients) and validated on a separate dataset (1818 patients). It achieved an area under the ROC (Receiver Operating Characteristic) curve of 0.89. To provide explainability, the authors developed an interactive graphical tool that may improve physician understanding of the basis for the model's predictions. The high accuracy and explainability of the PPES-Met model may enable it to be used as a decision support tool to personalize metastatic cancer treatment and provide valuable assistance to physicians.

The first clinical prediction model reporting guidelines, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD), were published in 2015 and have since been updated. Predictive modelling has also been used to estimate surgery duration.
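For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the labels and scores are hypothetical toy values, not data from the study described above, and the function name `roc_auc` is an assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model outputs, where higher means "more likely positive".
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical toy example: three positive and three negative cases.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.5, 0.2]
auc = roc_auc(labels, scores)  # 7 of 9 pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so large datasets are normally handled with a rank-based formula or a library routine instead.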


Algorithmic trading

Predictive modeling in trading is a modeling process wherein the probability of an outcome is predicted using a set of predictor variables. Predictive models can be built for different assets such as stocks, futures, currencies, and commodities. Predictive modeling is still extensively used by trading firms to devise strategies and trade. It uses mathematically advanced software to evaluate indicators on price, volume, open interest and other historical data in order to discover repeatable patterns.


Lead tracking systems

Predictive modelling gives lead generators a head start by forecasting data-driven outcomes for each potential campaign. This method saves time and exposes potential blind spots to help clients make smarter decisions.


Notable failures of predictive modeling

Although not widely discussed by the mainstream predictive modeling community, predictive modeling has been widely used in the financial industry, and some of its major failures contributed to the 2008 financial crisis. These failures exemplify the danger of relying exclusively on models that are essentially backward looking in nature. The following examples are by no means a complete list:

1. Bond rating. S&P, Moody's and Fitch quantify the probability of default of bonds with a discrete variable called the rating. The rating can take on discrete values from AAA down to D. The rating is a predictor of the risk of default based on a variety of variables associated with the borrower and on historical macroeconomic data. The rating agencies failed with their ratings on the US$600 billion mortgage-backed Collateralized Debt Obligation (CDO) market. Almost the entire AAA sector (and the super-AAA sector, a new rating the agencies provided to represent super-safe investments) of the CDO market defaulted or was severely downgraded during 2008, and many of these securities had obtained their ratings less than a year earlier.

2. So far, no statistical models that attempt to predict equity market prices based on historical data are considered to consistently make correct predictions over the long term. One particularly memorable failure is that of Long Term Capital Management, a fund that hired highly qualified analysts, including a Nobel Memorial Prize in Economic Sciences winner, to develop a sophisticated statistical model that predicted the price spreads between different securities. The models produced impressive profits until a major debacle that caused the then Federal Reserve chairman Alan Greenspan to step in and broker a rescue plan by the Wall Street broker-dealers in order to prevent a meltdown of the bond market.


Possible fundamental limitations of predictive models based on data fitting

1. History cannot always accurately predict the future. Using relations derived from historical data to predict the future implicitly assumes there are certain lasting conditions or constants in a complex system. This almost always leads to some imprecision when the system involves people.

2. Unknown unknowns are an issue. In all data collection, the collector first defines the set of variables for which data is collected. However, no matter how extensive the collector considers their selection of variables, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome.

3. Algorithms can be defeated adversarially. After an algorithm becomes an accepted standard of measurement, it can be taken advantage of by people who understand the algorithm and have the incentive to fool or manipulate the outcome. This is what happened to the CDO rating described above. The CDO dealers actively tailored the inputs to the rating agencies' models to reach an AAA or super-AAA rating on the CDOs they were issuing, by cleverly manipulating variables that were "unknown" to the rating agencies' "sophisticated" models.


See also

* Calibration (statistics)
* Prediction interval
* Predictive analytics
* Predictive inference
* Statistical learning theory
* Statistical model

