Non-linear Activation Functions for Neural Networks Simplified


An activation function is what forms the output of a neuron. It is what adds non-linearity to your prediction and makes a Neural Network based predictor so much better than linear models.

The question we usually ask ourselves is: which activation function should I use?

The answer is that there is no one-size-fits-all choice. It depends.

Let me walk you through the most commonly used activation functions and their pros and cons to help you make a better decision.

We can define our own activation functions to best fit our needs, but the most commonly used ones are:
1. Sigmoid Activation
2. Tan Hyperbolic (tanh) Activation
3. ReLU (Rectified Linear Unit)
4. Leaky ReLU
This is how each of them looks:

Photo Source: DeepLearning.ai Specialization
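
Since the figure itself isn't reproduced here, below is a quick NumPy/Matplotlib sketch of my own (not from the course material) that plots all four curves; the 0.01 slope used for Leaky ReLU is just an example value.

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5, 5, 200)

sigmoid    = 1 / (1 + np.exp(-z))           # squashes z into (0, 1)
tanh       = np.tanh(z)                     # squashes z into (-1, 1)
relu       = np.maximum(0, z)               # 0 for z < 0, identity for z >= 0
leaky_relu = np.where(z > 0, z, 0.01 * z)   # small slope (0.01 here) for z < 0

for name, y in [("Sigmoid", sigmoid), ("Tanh", tanh),
                ("ReLU", relu), ("Leaky ReLU", leaky_relu)]:
    plt.plot(z, y, label=name)
plt.legend()
plt.title("Commonly used activation functions")
plt.show()
```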

1. Sigmoid Activation

The sigmoid activation, sigmoid(z) = 1 / (1 + e^-z), ranges between 0 and 1. It looks like the common "S-shaped" curve we see in many fields of study.

Pros:
Simple, both in logic and in arithmetic.
Offers good non-linearity.
Natural probability output between 0 and 1 for classification problems.
Cons:
The network stops learning when values are pushed towards the extremes of the sigmoid, where the curve flattens out. This is called the problem of vanishing gradients (see the short sketch below).
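
To make the vanishing-gradient point concrete, here is a small sketch of my own (illustrative inputs only) that evaluates the sigmoid's derivative, sigmoid(z) * (1 - sigmoid(z)), near zero and far out in the tail:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)   # derivative of the sigmoid

for z in [0.0, 2.0, 10.0]:
    print(f"z = {z:5.1f}  sigmoid = {sigmoid(z):.5f}  gradient = {sigmoid_grad(z):.5f}")
# z =   0.0  sigmoid = 0.50000  gradient = 0.25000  (largest possible gradient)
# z =  10.0  sigmoid = 0.99995  gradient = 0.00005  (almost no learning signal)
```

The gradient is tiny at the extremes, so weight updates that pass through a saturated sigmoid barely move.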

2. Tan Hyperbolic

This is pretty much a shifted and rescaled sigmoid with an extended output range (-1 to 1).

Pros:
It widens the steep non-linear range in the middle of the curve before the slope/gradient flattens out, which helps the network learn faster (the sketch after this list illustrates this).
Cons:
It only reduces the vanishing-gradient problem to a degree; the tails still saturate, and there are better options when we want to learn faster.
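
As a quick check on the "rescaled sigmoid" description, tanh satisfies tanh(z) = 2 * sigmoid(2z) - 1, and its slope at the origin (1) is four times the sigmoid's maximum slope (0.25). A small sketch of my own, with illustrative values:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-3, 3, 7)   # [-3, -2, -1, 0, 1, 2, 3]

# tanh is a shifted and rescaled sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))   # True

# Steeper in the middle, but still flat at the tails:
tanh_grad = 1 - np.tanh(z) ** 2   # derivative of tanh
print(tanh_grad.round(4))         # ~0.0099 at |z| = 3, 1.0 at z = 0
```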

3. ReLU

Rectified - MAX(0, value)
Linear - for z > 0 (positive values)

ReLU is a fancy name for a positive-only linear function. For negative inputs the unit has a slope of 0, while for positive activations the network can learn a lot faster thanks to the constant linear slope.

Pros:
Learns faster.
The slope is 1 as long as z is positive.
Cons:
Hard to find many :) The one caveat is that the slope is 0 for negative inputs, so a unit that keeps receiving negative values stops updating; this is what Leaky ReLU addresses next (a minimal sketch follows).
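
Here is a minimal sketch of my own of the ReLU forward pass and its gradient (it uses the common convention of treating the gradient at exactly z = 0 as 0):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)          # MAX(0, value)

def relu_grad(z):
    return (z > 0).astype(float)     # slope 1 for z > 0, else 0

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))        # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))   # [0. 0. 0. 1. 1.]
```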

4. Leaky ReLU

This provides a slight slope for negative inputs instead of a flat 0, so the unit never stops learning entirely. It's an improvement over ReLU.
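
Here is a minimal sketch of my own; the 0.01 negative-side slope is just an example hyperparameter, not a fixed part of the definition:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # identity for z > 0, a small slope alpha for z <= 0
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    # the gradient never goes fully to zero, so the unit keeps learning
    return np.where(z > 0, 1.0, alpha)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(z))       # [-0.03  -0.005  0.5    3.   ]
print(leaky_relu_grad(z))  # [0.01 0.01 1.   1.  ]
```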


These non-linear activations are what lay the foundation for Neural Networks and let them outperform linear models.

Until next time!




