HOME

TheInfoList



OR:

Curve fitting is the process of constructing a
curve In mathematics, a curve (also called a curved line in older texts) is an object similar to a line, but that does not have to be straight. Intuitively, a curve may be thought of as the trace left by a moving point. This is the definition that ...
, or
mathematical function In mathematics, a function from a set (mathematics), set to a set assigns to each element of exactly one element of .; the words ''map'', ''mapping'', ''transformation'', ''correspondence'', and ''operator'' are sometimes used synonymously. ...
, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either
interpolation In the mathematics, mathematical field of numerical analysis, interpolation is a type of estimation, a method of constructing (finding) new data points based on the range of a discrete set of known data points. In engineering and science, one ...
, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of
statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of ...
such as how much uncertainty is present in a curve that is fitted to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables. Extrapolation refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data. For linear-algebraic analysis of data, "fitting" usually means trying to find the curve that minimizes the vertical (''y''-axis) displacement of a point from the curve (e.g., ordinary least squares). However, for graphical and image applications, geometric fitting seeks to provide the best visual fit; which usually means trying to minimize the orthogonal distance to the curve (e.g., total least squares), or to otherwise include both axes of displacement of a point from the curve. Geometric fits are not popular because they usually require non-linear and/or iterative calculations, although they have the advantage of a more aesthetic and geometrically accurate result.


Algebraic fitting of functions to data points

Most commonly, one fits a function of the form .


Fitting lines and polynomial functions to data points

The first degree
polynomial In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
equation :y = ax + b\; is a line with
slope In mathematics, the slope or gradient of a Line (mathematics), line is a number that describes the direction (geometry), direction of the line on a plane (geometry), plane. Often denoted by the letter ''m'', slope is calculated as the ratio of t ...
''a''. A line will connect any two points, so a first degree polynomial equation is an exact fit through any two points with distinct x coordinates. If the order of the equation is increased to a second degree polynomial, the following results: :y = ax^2 + bx + c\;. This will exactly fit a simple curve to three points. If the order of the equation is increased to a third degree polynomial, the following is obtained: :y = ax^3 + bx^2 + cx + d\;. This will exactly fit four points. A more general statement would be to say it will exactly fit four constraints. Each constraint can be a point,
angle In Euclidean geometry, an angle can refer to a number of concepts relating to the intersection of two straight Line (geometry), lines at a Point (geometry), point. Formally, an angle is a figure lying in a Euclidean plane, plane formed by two R ...
, or
curvature In mathematics, curvature is any of several strongly related concepts in geometry that intuitively measure the amount by which a curve deviates from being a straight line or by which a surface deviates from being a plane. If a curve or su ...
(which is the reciprocal of the radius of an osculating circle). Angle and curvature constraints are most often added to the ends of a curve, and in such cases are called end conditions. Identical end conditions are frequently used to ensure a smooth transition between polynomial curves contained within a single spline. Higher-order constraints, such as "the change in the rate of curvature", could also be added. This, for example, would be useful in highway cloverleaf design to understand the rate of change of the forces applied to a car (see jerk), as it follows the cloverleaf, and to set reasonable speed limits, accordingly. The first degree polynomial equation could also be an exact fit for a single point and an angle while the third degree polynomial equation could also be an exact fit for two points, an angle constraint, and a curvature constraint. Many other combinations of constraints are possible for these and for higher order polynomial equations. If there are more than ''n'' + 1 constraints (''n'' being the degree of the polynomial), the polynomial curve can still be run through those constraints. An exact fit to all constraints is not certain (but might happen, for example, in the case of a first degree polynomial exactly fitting three collinear points). In general, however, some method is then needed to evaluate each approximation. The least squares method is one way to compare the deviations. There are several reasons given to get an approximate fit when it is possible to simply increase the degree of the polynomial equation and get an exact match.: * Even if an exact match exists, it does not necessarily follow that it can be readily discovered. Depending on the algorithm used there may be a divergent case, where the exact fit cannot be calculated, or it might take too much computer time to find the solution. This situation might require an approximate solution. * The effect of averaging out questionable data points in a sample, rather than distorting the curve to fit them exactly, may be desirable. * Runge's phenomenon: high order polynomials can be highly oscillatory. If a curve runs through two points ''A'' and ''B'', it would be expected that the curve would run somewhat near the midpoint of ''A'' and ''B'', as well. This may not happen with high-order polynomial curves; they may even have values that are very large in positive or negative magnitude. With low-order polynomials, the curve is more likely to fall near the midpoint (it's even guaranteed to exactly run through the midpoint on a first degree polynomial). * Low-order polynomials tend to be smooth and high order polynomial curves tend to be "lumpy". To define this more precisely, the maximum number of
inflection point In differential calculus and differential geometry, an inflection point, point of inflection, flex, or inflection (rarely inflexion) is a point on a smooth plane curve at which the curvature changes sign. In particular, in the case of the graph ...
s possible in a polynomial curve is ''n-2'', where ''n'' is the order of the polynomial equation. An inflection point is a location on the curve where it switches from a positive radius to negative. We can also say this is where it transitions from "holding water" to "shedding water". Note that it is only "possible" that high order polynomials will be lumpy; they could also be smooth, but there is no guarantee of this, unlike with low order polynomial curves. A fifteenth degree polynomial could have, at most, thirteen inflection points, but could also have eleven, or nine or any odd number down to one. (Polynomials with even numbered degree could have any even number of inflection points from ''n'' - 2 down to zero.) The degree of the polynomial curve being higher than needed for an exact fit is undesirable for all the reasons listed previously for high order polynomials, but also leads to a case where there are an infinite number of solutions. For example, a first degree polynomial (a line) constrained by only a single point, instead of the usual two, would give an infinite number of solutions. This brings up the problem of how to compare and choose just one solution, which can be a problem for both software and humans. Because of this, it is usually best to choose as low a degree as possible for an exact match on all constraints, and perhaps an even lower degree, if an approximate fit is acceptable.


Fitting other functions to data points

Other types of curves, such as
trigonometric functions In mathematics, the trigonometric functions (also called circular functions, angle functions or goniometric functions) are real functions which relate an angle of a right-angled triangle to ratios of two side lengths. They are widely used in all ...
(such as sine and cosine), may also be used, in certain cases. In spectroscopy, data may be fitted with Gaussian, Lorentzian, Voigt and related functions. In biology, ecology, demography, epidemiology, and many other disciplines, the growth of a population, the spread of infectious disease, etc. can be fitted using the
logistic function A logistic function or logistic curve is a common S-shaped curve ( sigmoid curve) with the equation f(x) = \frac where The logistic function has domain the real numbers, the limit as x \to -\infty is 0, and the limit as x \to +\infty is L. ...
. In
agriculture Agriculture encompasses crop and livestock production, aquaculture, and forestry for food and non-food products. Agriculture was a key factor in the rise of sedentary human civilization, whereby farming of domesticated species created ...
the inverted logistic
sigmoid function A sigmoid function is any mathematical function whose graph of a function, graph has a characteristic S-shaped or sigmoid curve. A common example of a sigmoid function is the logistic function, which is defined by the formula :\sigma(x ...
(S-curve) is used to describe the relation between crop yield and growth factors. The blue figure was made by a sigmoid regression of data measured in farm lands. It can be seen that initially, i.e. at low soil salinity, the crop yield reduces slowly at increasing soil salinity, while thereafter the decrease progresses faster.


Geometric fitting of plane curves to data points

If a function of the form y=f(x) cannot be postulated, one can still try to fit a plane curve. Other types of curves, such as
conic sections A conic section, conic or a quadratic curve is a curve obtained from a Conical surface, cone's surface intersecting a plane (mathematics), plane. The three types of conic section are the hyperbola, the parabola, and the ellipse; the circle is ...
(circular, elliptical, parabolic, and hyperbolic arcs) or
trigonometric functions In mathematics, the trigonometric functions (also called circular functions, angle functions or goniometric functions) are real functions which relate an angle of a right-angled triangle to ratios of two side lengths. They are widely used in all ...
(such as sine and cosine), may also be used, in certain cases. For example, trajectories of objects under the influence of gravity follow a parabolic path, when air resistance is ignored. Hence, matching trajectory data points to a parabolic curve would make sense. Tides follow sinusoidal patterns, hence tidal data points should be matched to a sine wave, or the sum of two sine waves of different periods, if the effects of the Moon and Sun are both considered. For a parametric curve, it is effective to fit each of its coordinates as a separate function of
arc length Arc length is the distance between two points along a section of a curve. Development of a formulation of arc length suitable for applications to mathematics and the sciences is a problem in vector calculus and in differential geometry. In the ...
; assuming that data points can be ordered, the chord distance may be used.


Fitting a circle by geometric fit

Coope approaches the problem of trying to find the best visual fit of circle to a set of 2D data points. The method elegantly transforms the ordinarily non-linear problem into a linear problem that can be solved without using iterative numerical methods, and is hence much faster than previous techniques.


Fitting an ellipse by geometric fit

The above technique is extended to general ellipsesPaul Sheer
A software assistant for manual stereo photometrology
M.Sc. thesis, 1997
by adding a non-linear step, resulting in a method that is fast, yet finds visually pleasing ellipses of arbitrary orientation and displacement.


Fitting surfaces

Note that while this discussion was in terms of 2D curves, much of this logic also extends to 3D surfaces, each patch of which is defined by a net of curves in two parametric directions, typically called u and v. A surface may be composed of one or more surface patches in each direction.


Software

Many statistical packages such as R and numerical software such as the gnuplot,
GNU Scientific Library The GNU Scientific Library (or GSL) is a software library for numerical computations in applied mathematics and science. The GSL is written in C (programming language), C; wrappers are available for other programming languages. The GSL is part of ...
, Igor Pro, MLAB,
Maple ''Acer'' is a genus of trees and shrubs commonly known as maples. The genus is placed in the soapberry family Sapindaceae.Stevens, P. F. (2001 onwards). Angiosperm Phylogeny Website. Version 9, June 2008 nd more or less continuously updated si ...
,
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...
, TK Solver 6.0,
Scilab Scilab is a free and open-source, cross-platform numerical computational package and a high-level, numerically oriented programming language. It can be used for signal processing, statistical analysis, image enhancement, fluid dynamics simul ...
, Mathematica,
GNU Octave GNU Octave is a scientific programming language for scientific computing and numerical computation. Octave helps in solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly ...
, and SciPy include commands for doing curve fitting in a variety of scenarios. There are also programs specifically written to do curve fitting; they can be found in the lists of statistical and numerical-analysis programs as well as in :Regression and curve fitting software.


See also

*
Calibration curve In analytical chemistry, a calibration curve, also known as a standard curve, is a general method for determining the concentration of a substance in an unknown sample by comparing the unknown to a set of standard samples of known concentration. ...
* Curve-fitting compaction *
Discretization In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numeri ...
*
Estimation theory Estimation theory is a branch of statistics that deals with estimating the values of Statistical parameter, parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such ...
*
Function approximation In general, a function approximation problem asks us to select a function (mathematics), function among a that closely matches ("approximates") a in a task-specific way. The need for function approximations arises in many branches of applied ...
*
Genetic programming Genetic programming (GP) is an evolutionary algorithm, an artificial intelligence technique mimicking natural evolution, which operates on a population of programs. It applies the genetic operators selection (evolutionary algorithm), selection a ...
*
Goodness of fit The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measur ...
* Least-squares adjustment * Levenberg–Marquardt algorithm * Line fitting *
Linear interpolation In mathematics, linear interpolation is a method of curve fitting using linear polynomials to construct new data points within the range of a discrete set of known data points. Linear interpolation between two known points If the two known po ...
* Linear trend estimation *
Mathematical model A mathematical model is an abstract and concrete, abstract description of a concrete system using mathematics, mathematical concepts and language of mathematics, language. The process of developing a mathematical model is termed ''mathematical m ...
* Multi expression programming *
Multi-curve framework In finance, an interest rate swap (finance), swap (IRS) is an interest rate derivative, interest rate derivative (IRD). It involves exchange of interest rates between two parties. In particular it is a Interest rate derivative#Linear and non-linear ...
and
Bootstrapping (finance) In finance, bootstrapping is a method for constructing a ( zero-coupon) fixed-income yield curve from the prices of a set of coupon-bearing products, e.g. bonds and swaps. A ''bootstrapped curve'', correspondingly, is one where the prices of the ...
* Nonlinear regression *
Overfitting In mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably". An overfi ...
* Plane curve * Probability distribution fitting * Progressive-iterative approximation method * Sinusoidal model * Smoothing * Splines ( interpolating, smoothing) *
Time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. ...
* Total least squares


References


Further reading

*N. Chernov (2010), ''Circular and linear regression: Fitting circles and lines by least squares'', Chapman & Hall/CRC, Monographs on Statistics and Applied Probability, Volume 117 (256 pp.)

{{Authority control Curve fitting,