In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information dispersed among many vectors into fewer vectors. It serves several purposes: it removes redundant information, reducing the amount of computation and memory required; it makes the model more robust to small variations in the input; and it increases the receptive field of neurons in later layers of the network.
Convolutional neural network pooling
Pooling is most commonly used in convolutional neural networks (CNNs). Below is a description of pooling in 2-dimensional CNNs; the generalization to n dimensions is immediate.
As notation, we consider a tensor $x \in \mathbb{R}^{H \times W \times C}$, where $H$ is height, $W$ is width, and $C$ is the number of channels. A pooling layer outputs a tensor $y \in \mathbb{R}^{H' \times W' \times C}$.
We define two variables $f$ and $s$, called "filter size" (aka "kernel size") and "stride". Sometimes it is necessary to use a different filter size and stride for the horizontal and vertical directions. In such cases, we define 4 variables $f_H, f_W, s_H, s_W$.
The receptive field of an entry in the output tensor $y$ is the set of all entries in $x$ that can affect that entry.
Max pooling
Max Pooling (MaxPool) is commonly used in CNNs to reduce the spatial dimensions of feature maps.
Define $y_{0,0} = \max(x_{0:f,\,0:f})$, where $0:f$ denotes the index range $0, 1, \dots, f-1$. Note that we need to avoid the off-by-one error. The next input is $y_{0,1} = \max(x_{0:f,\,s:s+f})$, and so on. The receptive field of $y_{0,1}$ is $x_{0:f,\,s:s+f}$, so in general,
$$y_{i,j} = \max(x_{si:si+f,\;sj:sj+f}).$$
If the horizontal and vertical filter sizes and strides differ, then in general,
$$y_{i,j} = \max(x_{s_H i : s_H i + f_H,\; s_W j : s_W j + f_W}).$$
More succinctly, we can write $y_{i,j} = \max_{0 \le a < f_H,\, 0 \le b < f_W} x_{s_H i + a,\; s_W j + b}$.
If $H$ is not expressible as $s\,n + f$ with $n$ an integer, then computing the entries of the output tensor on the boundary would require max pooling to take as inputs entries off the tensor. How those non-existent entries are handled depends on the padding conditions, illustrated on the right.
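As a concrete illustration, the general formula above can be sketched in NumPy (a minimal version assuming a single channel, valid padding, and equal horizontal and vertical filter size and stride; the function name `maxpool2d` is our own):

```python
import numpy as np

def maxpool2d(x, f, s):
    """Max-pool a 2-D array x with filter size f and stride s (valid padding).

    Implements y[i, j] = max(x[s*i : s*i + f, s*j : s*j + f]).
    """
    H, W = x.shape
    H_out = (H - f) // s + 1  # only windows that lie fully inside x
    W_out = (W - f) // s + 1
    y = np.empty((H_out, W_out), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            y[i, j] = x[s*i : s*i + f, s*j : s*j + f].max()
    return y

x = np.arange(16).reshape(4, 4)
print(maxpool2d(x, f=2, s=2))
# [[ 5  7]
#  [13 15]]
```

With $f = s = 2$ each output entry is the maximum of a disjoint $2 \times 2$ window, halving both spatial dimensions.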
Global Max Pooling (GMP) is a specific kind of max pooling where the output tensor has shape $1 \times 1 \times C$, and the receptive field of $y_{0,0,c}$ is all of $x_{:,:,c}$. That is, it takes the maximum over each entire channel. It is often used just before the final fully connected layers in a CNN classification head.
Average pooling
Average pooling (AvgPool) is defined similarly, with the maximum replaced by the mean over the receptive field: $y_{i,j} = \frac{1}{f^2} \sum_{0 \le a,\, b < f} x_{si+a,\; sj+b}$.
Global Average Pooling (GAP) is defined similarly to GMP. It was first proposed in Network-in-Network. Similarly to GMP, it is often used just before the final fully connected layers in a CNN classification head.
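Both variants can be sketched by swapping the reduction in the earlier max-pooling code from `max` to `mean` (again assuming valid padding, a single channel for the windowed version, and channels-last layout for the global version; function names are our own):

```python
import numpy as np

def avgpool2d(x, f, s):
    """Average-pool a 2-D array with filter size f and stride s (valid padding)."""
    H, W = x.shape
    H_out = (H - f) // s + 1
    W_out = (W - f) // s + 1
    y = np.empty((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            y[i, j] = x[s*i : s*i + f, s*j : s*j + f].mean()
    return y

def global_avg_pool(x):
    """Global average pooling over an (H, W, C) tensor -> shape (1, 1, C)."""
    return x.mean(axis=(0, 1), keepdims=True)

x = np.arange(16, dtype=float).reshape(4, 4)
print(avgpool2d(x, f=2, s=2))
# [[ 2.5  4.5]
#  [10.5 12.5]]
```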
Interpolations
There are several interpolations between max pooling and average pooling.
Mixed Pooling is a linear sum of max pooling and average pooling. That is, $y = \lambda \cdot \mathrm{MaxPool}(x) + (1 - \lambda) \cdot \mathrm{AvgPool}(x)$, where $\lambda \in [0, 1]$ is either a fixed hyperparameter, a learnable parameter, or randomly sampled for each pooling operation.
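The linear combination above can be sketched by pooling twice and blending the results (a minimal NumPy version with a fixed $\lambda$, valid padding, and a single channel; the function names are our own):

```python
import numpy as np

def pool2d(x, f, s, reduce_fn):
    """Pool a 2-D array with filter size f and stride s, applying reduce_fn
    (e.g. np.max or np.mean) to each window."""
    H, W = x.shape
    y = np.empty(((H - f) // s + 1, (W - f) // s + 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            y[i, j] = reduce_fn(x[s*i : s*i + f, s*j : s*j + f])
    return y

def mixed_pool2d(x, f, s, lam):
    """Mixed pooling: lam * MaxPool(x) + (1 - lam) * AvgPool(x)."""
    return lam * pool2d(x, f, s, np.max) + (1 - lam) * pool2d(x, f, s, np.mean)

x = np.arange(16, dtype=float).reshape(4, 4)
print(mixed_pool2d(x, f=2, s=2, lam=0.5))
# [[ 3.75  5.75]
#  [11.75 13.75]]
```

Setting $\lambda = 1$ recovers max pooling exactly, and $\lambda = 0$ recovers average pooling.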