In neural networks, a pooling layer is a kind of network layer that
downsamples and aggregates information that is dispersed among many vectors into fewer vectors. Pooling has several uses: it removes redundant information, reducing the amount of computation and memory required; it makes the model more robust to small variations in the input; and it increases the receptive field of neurons in later layers of the network.
Convolutional neural network pooling
Pooling is most commonly used in convolutional neural networks (CNN). Below is a description of pooling in 2-dimensional CNNs. The generalization to n-dimensions is immediate.
As notation, we consider a tensor $x \in \mathbb{R}^{H \times W \times C}$, where $H$ is height, $W$ is width, and $C$ is the number of channels. A pooling layer outputs a tensor $y \in \mathbb{R}^{H' \times W' \times C}$.
We define two variables $f, s$ called "filter size" (aka "kernel size") and "stride". Sometimes, it is necessary to use a different filter size and stride for horizontal and vertical directions. In such cases, we define 4 variables $f_H, f_W, s_H, s_W$.
The receptive field of an entry in the output tensor $y$ is the set of all entries in $x$ that can affect that entry.
Max pooling
Max Pooling (MaxPool) is commonly used in CNNs to reduce the spatial dimensions of feature maps.
Define
$$y_{0,0,c} = \max(x_{0:f,\; 0:f,\; c}),$$
where $0{:}f$ means the range $0, 1, \ldots, f-1$. Note that we need to avoid the off-by-one error. The next input is
$$y_{0,1,c} = \max(x_{0:f,\; s:s+f,\; c}),$$
and so on. The receptive field of $y_{0,1,c}$ is $x_{0:f,\; s:s+f,\; c}$, so in general,
$$y_{i,j,c} = \max(x_{si:si+f,\; sj:sj+f,\; c}).$$
If the horizontal and vertical filter sizes and strides differ, then in general,
$$y_{i,j,c} = \max(x_{s_H i : s_H i + f_H,\; s_W j : s_W j + f_W,\; c}).$$
More succinctly, we can write $y = \operatorname{MaxPool}(x; f, s)$.
If $H$ is not expressible as $sh + f$, where $h$ is an integer, then computing the entries of the output tensor on the boundary would require max pooling to take as inputs variables that lie off the tensor. In this case, how those non-existent variables are handled depends on the padding conditions.
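To make the indexing concrete, here is a minimal NumPy sketch of 2-dimensional max pooling. The function name max_pool2d and the boundary handling (windows that would run off the tensor are simply skipped, i.e. "valid"-style behaviour with no padding) are choices made for this illustration, not prescribed by the text above:

```python
import numpy as np

def max_pool2d(x, f, s):
    """Max pooling over a (H, W, C) tensor with filter size f and stride s.

    Windows that would extend past the tensor boundary are skipped
    ("valid" behaviour, i.e. no padding).
    """
    H, W, C = x.shape
    H_out = (H - f) // s + 1
    W_out = (W - f) // s + 1
    y = np.empty((H_out, W_out, C), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            # y[i, j, c] = max of x[s*i : s*i+f, s*j : s*j+f, c] for each channel c
            y[i, j] = x[s * i : s * i + f, s * j : s * j + f].max(axis=(0, 1))
    return y

x = np.random.rand(6, 6, 3)     # H = W = 6, C = 3
y = max_pool2d(x, f=2, s=2)     # y.shape == (3, 3, 3)
```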
Global Max Pooling (GMP) is a specific kind of max pooling where the output tensor has shape $1 \times 1 \times C$ and the receptive field of $y_{0,0,c}$ is all of $x_{:,:,c}$. That is, it takes the maximum over each entire channel. It is often used just before the final fully connected layers in a CNN classification head.
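As a sketch (again in NumPy, with variable names chosen here for illustration), global max pooling is simply a per-channel maximum over the full spatial extent:

```python
import numpy as np

x = np.random.rand(6, 6, 3)                 # (H, W, C) feature map
gmp = x.max(axis=(0, 1), keepdims=True)     # shape (1, 1, 3): one maximum per channel
```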
Average pooling
Average pooling (AvgPool) is similarly defined, with the window maximum replaced by the window mean:
$$y_{i,j,c} = \operatorname{mean}(x_{si:si+f,\; sj:sj+f,\; c}) = \frac{1}{f^2} \sum_{i'=si}^{si+f-1} \sum_{j'=sj}^{sj+f-1} x_{i',j',c}.$$
Global Average Pooling (GAP) is defined similarly to GMP. It was first proposed in Network-in-Network. Similarly to GMP, it is often used just before the final fully connected layers in a CNN classification head.
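A corresponding NumPy sketch of average pooling and global average pooling, under the same illustrative assumptions as the max pooling sketch above (no padding, names chosen here):

```python
import numpy as np

def avg_pool2d(x, f, s):
    """Average pooling over a (H, W, C) tensor with filter size f and stride s."""
    H, W, C = x.shape
    H_out = (H - f) // s + 1
    W_out = (W - f) // s + 1
    y = np.empty((H_out, W_out, C), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            # mean over each f-by-f window, separately per channel
            y[i, j] = x[s * i : s * i + f, s * j : s * j + f].mean(axis=(0, 1))
    return y

x = np.random.rand(6, 6, 3)
y = avg_pool2d(x, f=2, s=2)                  # (3, 3, 3)
gap = x.mean(axis=(0, 1), keepdims=True)     # global average pooling: (1, 1, 3)
```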
Interpolations
There are several pooling functions that interpolate between max pooling and average pooling.
Mixed Pooling is a linear sum of max pooling and average pooling. That is,
$$y_{i,j,c} = \lambda \operatorname{MaxPool}(x)_{i,j,c} + (1 - \lambda) \operatorname{AvgPool}(x)_{i,j,c},$$
where $\lambda \in [0, 1]$.
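A minimal NumPy sketch of mixed pooling under the same assumptions as the sketches above; the parameter name lam for $\lambda$ is a choice made here, and whether $\lambda$ is fixed, learned, or sampled is left outside the sketch:

```python
import numpy as np

def mixed_pool2d(x, f, s, lam):
    """Mixed pooling: lam * max pooling + (1 - lam) * average pooling.

    lam = 1 recovers max pooling; lam = 0 recovers average pooling.
    """
    H, W, C = x.shape
    H_out = (H - f) // s + 1
    W_out = (W - f) // s + 1
    y = np.empty((H_out, W_out, C), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            window = x[s * i : s * i + f, s * j : s * j + f]
            y[i, j] = lam * window.max(axis=(0, 1)) + (1 - lam) * window.mean(axis=(0, 1))
    return y

x = np.random.rand(6, 6, 3)
y = mixed_pool2d(x, f=2, s=2, lam=0.5)   # halfway between MaxPool and AvgPool
```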