In the mathematical field of analysis, the Nash–Moser theorem, discovered by mathematician John Forbes Nash and named for him and Jürgen Moser, is a generalization of the inverse function theorem on Banach spaces to settings in which the required solution mapping for the linearized problem is not bounded. In contrast to the Banach space case, in which the invertibility of the derivative at a point is sufficient for a map to be locally invertible, the Nash–Moser theorem requires the derivative to be invertible in a neighborhood. The theorem is widely used to prove local existence for non-linear partial differential equations in spaces of smooth functions. It is particularly useful when the inverse to the derivative "loses" derivatives, and therefore the Banach space implicit function theorem cannot be used.

History

The Nash–Moser theorem traces back to Nash (1956), who proved the theorem in the special case of the isometric embedding problem. It is clear from his paper that his method can be generalized. Moser (1966), for instance, showed that Nash's methods could be successfully applied to solve problems on periodic orbits in celestial mechanics in the KAM theory. However, it has proven quite difficult to find a suitable general formulation; there is, to date, no all-encompassing version. Various versions are due to Gromov, Hamilton, Hörmander, Saint-Raymond, Schwartz, and Sergeraert. Hamilton's version, quoted below, is particularly widely cited.

The problem of loss of derivatives

This will be introduced in the original setting of the Nash–Moser theorem, that of the isometric embedding problem. Let

\Omega

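be an open subset of \mathbb{R}^n. Consider the map

P:C^1(\Omega;\mathbb{R}^N)\to C^0\big(\Omega;\text{Sym}_{n\times n}(\mathbb{R})\big)

given by

P(f)_{ij}=\sum_{\alpha=1}^N\frac{\partial f^\alpha}{\partial u^i}\frac{\partial f^\alpha}{\partial u^j}.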

In Nash's solution of the isometric embedding problem (as would be expected in the solutions of nonlinear partial differential equations), a major step is a statement of the schematic form "If f is such that

P(f)

is positive-definite, then for any matrix-valued function

g

which is close to

P(f)

, there exists

f_g

with

P(f_g) = g

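." Following standard practice, one would expect to apply the Banach space inverse function theorem. So, for instance, one might expect to restrict P to

C^5(\Omega;\mathbb{R}^N)

and, for an immersion f in this domain, to study the linearization C^5(Ω; ℝ^N) → C^4(Ω; Sym_{n×n}(ℝ)) given by

\widetilde{f}\mapsto \sum_{\alpha=1}^N \frac{\partial f^\alpha}{\partial u^i}\frac{\partial \widetilde{f}^\alpha}{\partial u^j} + \sum_{\alpha=1}^N \frac{\partial \widetilde{f}^\alpha}{\partial u^i}\frac{\partial f^\alpha}{\partial u^j}.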

If one could show that this were invertible, with bounded inverse, then the Banach space inverse function theorem would apply directly. However, there is a deep reason that such a formulation cannot work. The issue is that there is a second-order differential operator of

P(f)

which coincides with a second-order differential operator applied to f. To be precise: if f is an immersion then

R^{P(f)}=|H(f)|^2-|h(f)|_{P(f)}^2,

where

R^{P(f)}

is the scalar curvature of the Riemannian metric P(f), H(f) denotes the mean curvature of the immersion f, and h(f) denotes its second fundamental form; the above equation is the Gauss equation from surface theory. So, if P(f) is C^4, then R^{P(f)} is generally only C^2. Then, according to the above equation, f can generally be only C^4; if it were C^5 then |H|^2 - |h|^2 would have to be at least C^3. The source of the problem can be quite succinctly phrased in the following way: the Gauss equation shows that there is a differential operator Q such that the order of the composition of Q with P is less than the sum of the orders of P and Q.

In context, the upshot is that the inverse to the linearization of P, even if it exists as a map C^∞(Ω; Sym_{n×n}(ℝ)) → C^∞(Ω; ℝ^N), cannot be bounded between appropriate Banach spaces, and hence the Banach space implicit function theorem cannot be applied. By exactly the same reasoning, one cannot directly apply the Banach space implicit function theorem even if one uses the Hölder spaces, the Sobolev spaces, or any of the C^k spaces. In any of these settings, an inverse to the linearization of P will fail to be bounded.

This is the problem of loss of derivatives. A very naive expectation is that, generally, if P is an order k differential operator, then if P(f) is in C^m then f must be in C^{m+k}. However, this is somewhat rare. In the case of uniformly elliptic differential operators, the famous Schauder estimates show that this naive expectation is borne out, with the caveat that one must replace the

C^k

spaces with the Hölder spaces

C^{k,\alpha}

; this causes no extra difficulty whatsoever for the application of the Banach space implicit function theorem. However, the above analysis shows that this naive expectation is ''not'' borne out for the map which sends an immersion to its induced Riemannian metric; given that this map is of order 1, one does not gain the "expected" one derivative upon inverting the operator. The same failure is common in geometric problems, where the action of the diffeomorphism group is the root cause, and in problems of hyperbolic differential equations, where even in the very simplest problems one does not have the naively expected smoothness of a solution. All of these difficulties provide common contexts for applications of the Nash–Moser theorem.
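As a concrete illustration of the map P that sends an immersion to its induced metric, the following minimal sketch approximates P(f)_{ij} at a single point by central finite differences. The function names `induced_metric` and `cylinder`, the step size `h`, and the choice of test immersion are assumptions made for this example only; they do not come from any of the cited works.

```python
import math

def induced_metric(f, u, h=1e-6):
    """Approximate P(f)_{ij} = sum_a (df^a/du^i)(df^a/du^j) at the point u.

    f maps a list of n coordinates to a list of N coordinates.
    Derivatives are taken by central finite differences (an assumption
    of this sketch; any other differentiation scheme would do).
    """
    n = len(u)
    jac = []  # jac[i][a] approximates df^a/du^i
    for i in range(n):
        up = list(u)
        dn = list(u)
        up[i] += h
        dn[i] -= h
        fp, fm = f(up), f(dn)
        jac.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    # Contract over the target index a to get a symmetric n-by-n matrix.
    return [[sum(a * b for a, b in zip(jac[i], jac[j])) for j in range(n)]
            for i in range(n)]

def cylinder(u):
    """Hypothetical test immersion of the plane into R^3."""
    return [math.cos(u[0]), math.sin(u[0]), u[1]]

metric = induced_metric(cylinder, [0.3, 1.2])
for row in metric:
    print([round(entry, 6) for entry in row])
```

For this cylinder the induced metric is the flat Euclidean metric, so the printed matrix should be close to the identity. The computation uses only first derivatives of f, which is the sense in which P is an operator of order 1.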

The schematic form of Nash's solution

This section only aims to describe an idea, and as such it is intentionally imprecise. For concreteness, suppose that

P

is an order-one differential operator on some function spaces, so that it defines a map C^{k+1} → C^k for each

k

. Suppose that, at some C^{k+1} function f, the linearization DP_f : C^{k+1} → C^k has a right inverse S : C^k → C^k; in the above language this reflects a "loss of one derivative". One can concretely see the failure of trying to use

Newton's method

to prove the Banach space implicit function theorem in this context: if

g_\infty

is close to

P(f)

in

C^k

and one defines the iteration

f_{n+1} = f_n+S\big(g_\infty-P(f_n)\big),

then

f_1 \in C^{k+1}

implies that g_\infty - P(f_1) is in

C^k

, and then

f_2

is in

C^k

. By the same reasoning,

f_3

is in

C^{k-1}

, and

f_4

is in

C^{k-2}

, and so on. In finitely many steps the iteration must end, since it will lose all regularity and the next step will not even be defined.

Nash's solution is quite striking in its simplicity. Suppose that for each

n > 0

one has a smoothing operator

\theta_n

which takes a

C^k

function, returns a smooth function, and approximates the identity when

n

is large. Then the "smoothed" Newton iteration

f_{n+1} = f_n + S\big(\theta_n(g_\infty-P(f_n))\big)

transparently does not encounter the same difficulty as the previous "unsmoothed" version, since it is an iteration in the space of smooth functions which never loses regularity. So one has a well-defined sequence of functions; the major surprise of Nash's approach is that this sequence actually converges to a function

f_\infty

with P(f_\infty) = g_\infty. For many mathematicians, this is rather surprising, since the "fix" of throwing in a smoothing operator seems too superficial to overcome the deep problem in the standard Newton method. Mikhael Gromov, for instance, has commented on this point.

Remark. The true "smoothed Newton iteration" is a little more complicated than the above form, although there are a few inequivalent forms, depending on where one chooses to insert the smoothing operators. The primary difference is that one requires invertibility of

DP_f

for an entire open neighborhood of choices of f, and then one uses the "true" Newton iteration, corresponding to (using single-variable notation)

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

as opposed to

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_0)},

the latter of which reflects the forms given above. This is rather important, since the improved quadratic convergence of the "true" Newton iteration is significantly used to combat the error of "smoothing", in order to obtain convergence. Certain approaches, in particular Nash's and Hamilton's, follow the solution of an ordinary differential equation in function space rather than an iteration in function space; the relation of the latter to the former is essentially that of the solution of

Euler's method

to that of a differential equation.
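The difference between the two single-variable iterations in the remark can be seen numerically. The sketch below is only an illustration in one real variable: the test function x^2 - 2, the starting point 1.5, and the names `true_newton` and `frozen_newton` are assumptions chosen for this example, and no smoothing operators are involved.

```python
import math

def true_newton(f, df, x0, steps):
    """Newton iteration with the derivative re-evaluated at every step."""
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - f(x) / df(x)
        history.append(x)
    return history

def frozen_newton(f, df, x0, steps):
    """Iteration that reuses the derivative at the starting point x0."""
    slope = df(x0)
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - f(x) / slope
        history.append(x)
    return history

def f(x):
    return x * x - 2.0

def df(x):
    return 2.0 * x

root = math.sqrt(2.0)
for name, method in (("true", true_newton), ("frozen", frozen_newton)):
    errors = [abs(x - root) for x in method(f, df, 1.5, 4)]
    print(name, ["%.1e" % e for e in errors])
```

In this example the errors of the "true" iteration roughly square at each step, while those of the frozen-derivative iteration shrink only by a roughly constant factor. That extra speed is what the remark refers to as being used to combat the error introduced by smoothing.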

Hamilton's formulation of the theorem

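The following statement appears in Hamilton (1982): Let F and G be tame Fréchet spaces, let U ⊆ F be an open subset, and let P : U → G be a smooth tame map. Suppose that for each f ∈ U the linearization dP_f : F → G is invertible, and that the family of inverses, as a map U × G → F, is smooth tame. Then P is locally invertible, and each local inverse P^{-1} is a smooth tame map.

Similarly, if each linearization is only injective, and a family of left inverses is smooth tame, then P is locally injective. And if each linearization is only surjective, and a family of right inverses is smooth tame, then P is locally surjective with a smooth tame right inverse.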

Tame Fréchet spaces

A graded Fréchet space consists of the following data:

* a vector space

F

* a countable collection of seminorms

\|\,\cdot\,\|_n : F \to \R

such that

\|f\|_0 \leq \|f\|_1 \leq \|f\|_2 \leq \cdots

for all

f\in F.

One requires these to satisfy the following conditions:

** if

f \in F

is such that

\|f\|_n = 0

for all

n = 0, 1, 2, \ldots

then

f = 0

** if

f_j \in F

is a sequence such that, for each

n = 0,1,2,\ldots

and every

\varepsilon > 0

there exists

N_{n,\varepsilon}

such that

j, k > N_{n,\varepsilon}

implies

\|f_j - f_k\|_n < \varepsilon,

then there exists

f\in F

such that, for each

n,

one has

\lim_{j\to\infty} \|f_j - f\|_n = 0.

Such a graded Fréchet space is called a tame Fréchet space if it satisfies the following condition:

* there exists a Banach space

B

and linear maps

L : F \to \Sigma(B)

and

M : \Sigma(B) \to F

such that

M \circ L: F \to F

is the identity map and such that:

** there exist

r

and

b

such that for each

n > b

there is a number

C_n

such that

\sup_{k\in\N} e^{nk}\|L(f)_k\|_B \leq C_n\|f\|_{r+n}

for every

f \in F,

and

\|M(\{x_i\})\|_n \leq C_n\sup_{k\in\N} e^{(r+n)k} \|x_k\|_B

for every

\left\{x_i\right\} \in \Sigma(B).

Here

\Sigma(B)

denotes the vector space of exponentially decreasing sequences in

B,

that is,

\Sigma(B) = \Big\{\text{maps } x : \N \to B \text{ such that } \sup_{k\in\N} e^{nk}\|x_k\|_B < \infty \text{ for all } n \in \N\Big\}.

The laboriousness of the definition is justified by the primary examples of tamely graded Fréchet spaces:

* If

M

is a compact smooth manifold (with or without boundary) then

C^\infty(M)

is a tamely graded Fréchet space, when given any of the following graded structures:

** take

\|f\|_n

to be the

C^n

-norm of f

** take

\|f\|_n

to be the

C^{n,\alpha}

-norm of f for fixed

\alpha

** take

\|f\|_n

to be the

W^{n,p}

-norm of f for fixed

p

* If

M

is a compact smooth manifold-with-boundary then

C_0^\infty(M),

the space of smooth functions whose derivatives all vanish on the boundary, is a tamely graded Fréchet space, with any of the above graded structures.

* If

M

is a compact smooth manifold and

V \to M

is a smooth vector bundle, then the space of smooth sections is tame, with any of the above graded structures.

To recognize the tame structure of these examples, one topologically embeds

M

in a Euclidean space,

B

is taken to be the space of

L^1

functions on this Euclidean space, and the map

L

is defined by dyadic restriction of the Fourier transform. The details are on pages 133–140 of Hamilton (1982).

Presented directly as above, the meaning and naturality of the "tame" condition are rather obscure. The situation is clarified if one reconsiders the basic examples given above, in which the relevant "exponentially decreasing" sequences in Banach spaces arise from restriction of a Fourier transform. Recall that smoothness of a function on Euclidean space is directly related to the rate of decay of its Fourier transform. "Tameness" is thus seen as a condition which allows an abstraction of the idea of a "smoothing operator" on a function space. Given a Banach space

B

and the corresponding space

\Sigma(B)

of exponentially decreasing sequences in

B,

the precise analogue of a smoothing operator can be defined in the following way. Let

s : \R \to \R

be a smooth function which vanishes on

(-\infty, 0),

is identically equal to one on

(1, \infty),

and takes values only in the interval

[0, 1].

Then for each real number

t

define

\theta_t : \Sigma(B) \to \Sigma(B)

by

\left(\theta_tx\right)_i = s(t-i) x_i.

If one accepts the schematic idea of the proof devised by Nash, and in particular his use of smoothing operators, the "tame" condition then becomes rather reasonable.
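The smoothing operator defined above can be sketched on finite lists of real numbers, which stand in for sequences in Σ(B) with B taken to be the real line. The particular cutoff built from exp(-1/u), and the names `cutoff`, `smooth` and `weighted_sup`, are assumptions of this illustration; any smooth s with the stated properties would serve.

```python
import math

def _bump(u):
    """Smooth on the real line, zero for u <= 0 and positive for u > 0."""
    return math.exp(-1.0 / u) if u > 0 else 0.0

def cutoff(u):
    """A smooth s with s = 0 on (-inf, 0], s = 1 on [1, inf), values in [0, 1]."""
    return _bump(u) / (_bump(u) + _bump(1.0 - u))

def smooth(t, x):
    """(theta_t x)_i = s(t - i) * x_i for a finite list x."""
    return [cutoff(t - i) * x_i for i, x_i in enumerate(x)]

def weighted_sup(x, n):
    """sup_k e^{nk} |x_k|, the quantity whose finiteness for all n defines Sigma(B)."""
    return max(math.exp(n * k) * abs(x_k) for k, x_k in enumerate(x))

x = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(smooth(3.5, x))
print(weighted_sup(smooth(3.5, x), 2))
```

Entries with index well below t are left unchanged and entries with index above t are set to zero, so θ_t x has only finitely many nonzero terms and every weighted supremum of it is finite. As t grows, θ_t leaves more and more of the sequence untouched, which is the sense in which it approximates the identity.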

Smooth tame maps

Let F and

G

be graded Fréchet spaces. Let

U

be an open subset of F, meaning that for each

f \in U

there are

n\in\N

and

\varepsilon>0

such that

\|f - f_1\|_n < \varepsilon

implies that

f_1

is also contained in

U

. A smooth map

P : U \to G

is called a smooth tame map if for all

k \in \N

the derivative

D^kP:U \times F \times \cdots \times F \to G

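satisfies the following: there exist r and b such that n > b implies

\|D^kP(f,h_1,\ldots,h_k)\|_n \leq C_n\big(\|f\|_{n+r}+\|h_1\|_{n+r}+\cdots+\|h_k\|_{n+r}+1\big)

for all (f, h_1, …, h_k) in U × F × ⋯ × F.

The fundamental example says that, on a compact smooth manifold, a nonlinear partial differential operator (possibly between sections of vector bundles over the manifold) is a smooth tame map; in this case,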

r

can be taken to be the order of the operator.

Proof of the theorem

Let

S

denote the family of inverse mappings

U \times G \to F.

Consider the special case that

F

and

G

are spaces of exponentially decreasing sequences in Banach spaces, i.e.

F=\Sigma(B)

and

G=\Sigma(C)

. (It is not too difficult to see that this is sufficient to prove the general case.) For a positive number c, consider the ordinary differential equation in

\Sigma(B)

given by

f'=c S\Big(\theta_t(f),\theta_t\big(g_\infty-P(f)\big)\Big).

Hamilton shows that if

P(0) = 0

and

g_\infty

is sufficiently small in

\Sigma(C)

, then the solution of this differential equation with initial condition

f(0) = 0

exists as a mapping [0, ∞) → Σ(B), and that f(t) converges as

t \to \infty

to a solution of

P(f) = g_\infty.
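A one-dimensional toy version of this differential equation may help to show why following it leads to a solution. The sketch below drops the smoothing operators entirely, replaces Σ(B) and Σ(C) by the real line, and integrates x' = c (g - P(x)) / P'(x) with the forward Euler method. The map P(x) = x + x^3, the target value g = 0.5, the step size, and the name `newton_flow` are all assumptions of this illustration; it is not Hamilton's argument.

```python
def newton_flow(P, dP, g, c=1.0, dt=0.01, t_end=20.0):
    """Forward Euler integration of x' = c * (g - P(x)) / P'(x) from x(0) = 0."""
    x = 0.0
    for _ in range(int(round(t_end / dt))):
        x += dt * c * (g - P(x)) / dP(x)
    return x

def P(x):
    """Toy nonlinear map with P(0) = 0 and nowhere-vanishing derivative."""
    return x + x ** 3

def dP(x):
    return 1.0 + 3.0 * x ** 2

g = 0.5
x_final = newton_flow(P, dP, g)
print(x_final, P(x_final) - g)
```

Along an exact solution of this toy equation the residual g - P(x(t)) satisfies a linear equation and decays like e^{-ct}, so x(t) tends to a solution of P(x) = g as t grows. Each Euler step with step size dt is a damped Newton step, which mirrors the relation between iteration and differential equation described earlier.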
