In association football, expected goals (xG) is a performance metric used to evaluate team and player performance. It represents the probability that a scoring opportunity results in a goal. It is also used in ice hockey.
Metric
Association football
There is some debate about the origin of the term "expected goals". Vic Barnett and his colleague Sarah Hilditch referred to "expected goals" in their 1993 paper, which investigated the effects of artificial pitch (AP) surfaces on home team performance in association football in England. Their paper included this observation:
Quantitatively we find for the AP group about 0.15 more goals per home match than expected and, allowing for the lower than expected goals against in home matches, an excess goal difference (for home matches) of about 0.31 goals per home match. Over a season this yields about 3 more goals for, an improved goal difference of about 6 goals.
Jake Ensum, Richard Pollard and Samuel Taylor (2004) reported their study of data from 37 matches in the 2002 World Cup, in which 930 shots and 93 goals were recorded. Their research sought "to investigate and quantify 12 factors that might affect the success of a shot". Their logistic regression identified five factors that had a significant effect on the success of a kicked shot: distance from the goal; angle from the goal; whether or not the player taking the shot was at least 1 m away from the nearest defender; whether or not the shot was immediately preceded by a cross; and the number of outfield players between the shot-taker and the goal.
They concluded that "the calculation of shot probabilities allows a greater depth of analysis of shooting opportunities in comparison to recording only the number of shots".
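To make the idea of a shot probability concrete, the sketch below evaluates a logistic model on three of the factors named above: distance, angle, and whether the shooter has at least 1 m of space. The coefficient values, the reading of "angle" as degrees away from a central position, and the function name `shot_probability` are illustrative assumptions, not the estimates reported by Ensum, Pollard and Taylor.

```python
import math

# Hypothetical coefficients chosen only for illustration; they are NOT
# the fitted values from the 2004 study.
INTERCEPT = 0.5
COEF_DISTANCE_M = -0.15   # change in log-odds per metre from goal
COEF_ANGLE_DEG = -0.02    # change in log-odds per degree off-centre (assumed)
COEF_SPACE = 0.8          # shift when the nearest defender is >= 1 m away


def shot_probability(distance_m: float, angle_deg: float, has_space: bool) -> float:
    """Return the modelled probability that a kicked shot is scored."""
    linear = (
        INTERCEPT
        + COEF_DISTANCE_M * distance_m
        + COEF_ANGLE_DEG * angle_deg
        + COEF_SPACE * (1.0 if has_space else 0.0)
    )
    # The logistic function maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


close_shot = shot_probability(distance_m=8, angle_deg=10, has_space=True)
long_shot = shot_probability(distance_m=30, angle_deg=10, has_space=False)
print(f"close: {close_shot:.3f}, long: {long_shot:.3f}")
```

With negative distance and angle coefficients, the modelled probability falls as the shot moves further from goal or wider of it, which is the kind of distinction a raw shot count cannot express.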
In a subsequent paper (2004), Ensum, Pollard and Taylor combined data from the 1986 and 2002 World Cup competitions to identify three significant factors that determined the success of a kicked shot: distance from the goal; angle from the goal; and whether or not the player taking the shot was at least 1 m away from the nearest defender.
Howard Hamilton (2009) proposed "a useful statistic in soccer" that "will ultimately contribute to what I call an 'expected goal value' — for any action on the field in the course of a game, the probability that said action will create a goal".
Sander Itjsma (2011) discussed "a method to assign different value to different chances created during a football match" and in doing so concluded:
we now have a system in place in order to estimate the overall value of the chances created by either team during the match. Knowing how many goals a team is expected to score from its chances is of much more value than just knowing how many attempts to score a goal were made. Other applications of this method of evaluation would be to distinguish a lack of quality attempts created from a finishing problem or to evaluate defensive and goalkeeping performances. And a third option would be to plot the balance of play during the match in terms of the quality of chances created in order to graphically represent how the balance of play evolved during the match.
Sarah Rudd (2011) discussed probable goal scoring patterns (P(Goal)) in her use of Markov chains for tactical analysis (including the proximity of defenders) from 123 games in the 2010-2011 English Premier League season. In a video presentation of her paper at the 2011 New England Symposium on Statistics in Sports, Rudd reported her use of analysis methods to compare "expected goals" with actual goals and her process of applying weightings to incremental actions for P(Goal) outcomes.
In April 2012, Sam Green wrote about "expected goals" in his assessment of Premier League goalscorers. He asked: "So how do we quantify which areas of the pitch are the most likely to result in a goal and therefore, which shots have the highest probability of resulting in a goal?" He added:
If we can establish this metric, we can then accurately and effectively increase our chances of scoring and therefore winning matches. Similarly, we can use this data from a defensive perspective to limit the better chances by defending key areas of the pitch.
Green proposed a model to determine "a shot's probability of being on target and/or scored". With this model "we can look at each player's shots and tally up the probability of each of them being a goal to give an expected goal (xG) value".
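The tallying step Green describes can be sketched in a few lines. The example assumes that each shot already carries a goal probability from some model; the player names, the probabilities, and the function name `expected_goals_by_player` are hypothetical.

```python
from collections import defaultdict

# Hypothetical shot log: (player, probability that the shot is scored).
# The names and probabilities are made up for illustration.
shots = [
    ("Player A", 0.30),
    ("Player A", 0.10),
    ("Player B", 0.05),
    ("Player A", 0.60),
    ("Player B", 0.45),
]


def expected_goals_by_player(shot_log):
    """Sum the per-shot goal probabilities for each player."""
    totals = defaultdict(float)
    for player, probability in shot_log:
        totals[player] += probability
    return dict(totals)


xg = expected_goals_by_player(shots)
for player, value in sorted(xg.items()):
    print(f"{player}: {value:.2f} xG")
```

Comparing each total with the goals a player actually scored is one way to separate the quality of chances from the quality of finishing.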
Ice hockey
In 2004, Alan Ryder shared a methodology for the study of the quality of an ice hockey shot on goal. His discussion started with the sentence “Not all shots on goal are created equal”.
Ryder's model for the measurement of shot quality was:
* Collect the data and analyze goal probabilities for each shooting circumstance
* Build a model of goal probabilities that relies on the measured circumstance
* For each shot, determine its goal probability
* Expected Goals: EG = the sum of the goal probabilities for each shot
* Neutralize the variation in shots on goal by calculating Normalized Expected Goals
* Shot Quality Against
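The first four steps above can be sketched as follows. This is a minimal illustration, assuming that a "shooting circumstance" is reduced to a single distance band and that goal probabilities are estimated as observed scoring rates. The data, the band boundaries, and the function names are hypothetical, and the normalization and Shot Quality Against steps are omitted because their formulas are not given here.

```python
from collections import defaultdict

# Hypothetical historical shots: (distance from goal in feet, scored?).
history = [
    (8, True), (9, False), (12, True), (14, False),
    (25, False), (28, True), (31, False), (33, False),
    (45, False), (50, False), (55, False), (58, True),
]


def distance_band(distance_ft):
    """Assign a shot to a coarse distance band (assumed boundaries)."""
    if distance_ft < 20:
        return "close"
    if distance_ft < 40:
        return "mid"
    return "long"


def fit_goal_probabilities(shots):
    """Collect the data and estimate the scoring rate per circumstance."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for distance_ft, scored in shots:
        band = distance_band(distance_ft)
        attempts[band] += 1
        goals[band] += int(scored)
    return {band: goals[band] / attempts[band] for band in attempts}


def expected_goals(shot_distances, model):
    """Look up each shot's goal probability and sum them (EG)."""
    return sum(model[distance_band(d)] for d in shot_distances)


model = fit_goal_probabilities(history)
team_shots = [10, 15, 30, 52]  # distances of one team's shots in a game
print(f"EG = {expected_goals(team_shots, model):.2f}")
```

A real shot quality model would condition on more than distance, but the structure stays the same: fit probabilities per circumstance, then sum them over the shots of interest.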
Ryder concluded:
The model to get to expected goals given the shot quality factors is simply based on the data. There are no meaningful assumptions made. The analytic methods are the classics from statistics and actuarial science. The results are therefore very credible.
In 2007, Ryder issued a product recall notice for his shot quality model. He presented “a cautionary note on the calculation of shot quality” and pointed to “data quality problems with the measurement of the quality of a hockey team’s shots taken and allowed”.
He reported:
I have been worried that there is a systemic bias in the data. Random errors don’t concern me. They even out over large volumes of data. But I do think that ... the scoring in certain rinks has a bias towards longer or shorter shots, the most dominant factor in a shot quality model. And I set out to investigate that possibility.
The term "expected goals" appeared in a paper about ice hockey performance presented by Brian Macdonald at the MIT Sloan Sports Analytics Conference in 2012. Macdonald's method for calculating expected goals was reported in the paper:
We used data from the last four full NHL seasons. For each team, the season was split into two halves. Since midseason trades and injuries can have an impact on a team’s performance, we did not use statistics from the first half of the season to predict goals in the second half. Instead, we split the season into odd and even games, and used statistics from odd games to predict goals in even games. Data from 2007-08, 2008-09, and 2009-10 was used as the training data to estimate the parameters in the model, and data from the entire 2010-11 was set aside for validating the model. The model was also validated using 10-fold cross-validation. Mean squared error (MSE) of actual goals and predicted goals was our choice for measuring the performance of our models.
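The odd/even split and the MSE criterion described in the quotation can be illustrated with a short sketch. It uses a deliberately naive predictor (a team's mean goals in odd-numbered games) as a stand-in for Macdonald's model, whose predictors are not described in this section, and the goal counts are invented.

```python
from statistics import mean

# Hypothetical goals scored by one team in games 1..10 of a season.
goals_per_game = [3, 2, 4, 1, 2, 3, 5, 2, 1, 2]

# Game numbers start at 1, so even list indices hold odd-numbered games.
odd_games = goals_per_game[0::2]
even_games = goals_per_game[1::2]


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return mean([(a - p) ** 2 for a, p in zip(actual, predicted)])


# Naive stand-in model: predict every even game with the odd-game average.
prediction = mean(odd_games)
predicted_even = [prediction] * len(even_games)

mse = mean_squared_error(even_games, predicted_even)
print(f"prediction = {prediction}, MSE = {mse}")
```

Interleaving odd and even games keeps both samples spread across the whole season, so a midseason trade or injury affects the fitting and evaluation sets alike.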
External links
A WikiEducator page presents a chronology of the discussions of expected goals in association football literature from 2013 to 2018.