A complete guide to Activation Functions used in Neural Networks
Artificial Intelligence (AI) was one of the hottest fields of 2018, and it is changing our world forever. If you know about AI, then you must have heard about Neural Networks, one of the most popular and widely used families of algorithms in AI. In this article, I will talk about Activation Functions, which play a very important role in Neural Networks.

So, how do Neural Networks work?
Artificial Neural Networks (ANNs) are loosely based on the neural networks in our brains. In an ANN, many nodes are interconnected so that signals can pass between them. Because of these interconnections, ANNs give us amazing results. To understand how NNs work, first assume a 2-layer NN, that is, an NN with one input layer, one hidden layer, and one output layer.
Note — We don’t count the input layer.
First, we have some input as a vector, and we feed that vector to the network. The network then performs matrix operations on the input vector: it calculates the “weighted sum” of its inputs, adds a bias, applies an Activation Function, and passes the result to the next layer. We keep repeating this process until we reach the last layer. This process is known as Forward Propagation.

In the simplest form, the “weighted sum” is z = w · x + b, where x is the input vector, w is the weight vector, and b is the bias. The final output value is the prediction. We compare this prediction with the label to calculate the error, use the error to compute the partial derivatives w.r.t. the weights, and then update the weights with those values. We keep repeating this process until the error becomes very small. This process is known as Backward Propagation.
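The forward-propagation steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: the layer sizes (3 inputs, 4 hidden units, 1 output) and the random weights are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)        # input vector
W1 = rng.normal(size=(4, 3))  # hidden-layer weights
b1 = np.zeros(4)              # hidden-layer bias
W2 = rng.normal(size=(1, 4))  # output-layer weights
b2 = np.zeros(1)              # output-layer bias

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z1 = W1 @ x + b1        # weighted sum plus bias
a1 = sigmoid(z1)        # activation of the hidden layer
z2 = W2 @ a1 + b2       # weighted sum at the output layer
prediction = sigmoid(z2)  # final output, squashed into (0, 1)
```

Each layer repeats the same pattern: multiply by the weights, add the bias, and apply the activation.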

This is how a Neural Network works. Now that we understand Neural Networks, we can jump into Activation Functions.

Note — I’m not going to go deep into Forward Propagation and Back Propagation. If you don’t
have any idea about these topics, then please learn them
first and then follow this post.

So, why do we use Activation Functions?
Why do we use Activation Functions? What is their importance in a Neural Network?
Activation Functions are actually very important for Neural Networks. They add a crucial
property to Neural Networks: Non-Linearity. Without Activation Functions, Neural
Networks are linear.

But why is Non-Linearity so important?
A linear function is just a polynomial of degree one, such as y = x. A graph of this function is given below. As you can see in the graph, the function forms a straight line. This is a property of Linear Functions: they always form straight lines. If we add more dimensions to the function, it forms planes or hyperplanes. Linear Functions are incapable of forming any curves.

On the other hand, Non-Linear Functions can form anything, including curves. Non-Linear Functions include polynomials of degree greater than one, such as
y = x² or y = 2x³, as well as functions like y = sin(x). In the graph, y = x² and y = sin(x) form curves, not straight lines. Linear Functions are easy to solve, but they cannot learn complex patterns. Non-Linear Functions, on the other hand, can learn very complex patterns because they can form complex shapes. And this is very important in Neural Networks.

Linear Functions also cause another problem. In Neural Networks, adding more layers or more nodes lets us learn more complex functions. But if we don’t use Activation Functions, then adding more layers or more nodes cannot help in learning complex functions. This is because the composition of two Linear Functions is still Linear.
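We can check this claim numerically. In the sketch below (with made-up weight matrices), applying two linear layers in sequence gives exactly the same result as a single linear layer whose weight matrix is the product of the two:

```python
import numpy as np

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])   # first "layer"
W2 = np.array([[0.5, -1.0], [2.0, 0.0]])  # second "layer"
x = np.array([1.0, -2.0])                 # some input

# Two linear layers applied in sequence...
two_layers = W2 @ (W1 @ x)

# ...collapse into one linear layer with the combined weight matrix.
W_combined = W2 @ W1
one_layer = W_combined @ x

print(np.allclose(two_layers, one_layer))  # True: the extra layer added nothing
```

No matter how many linear layers we stack, the network stays equivalent to a single linear layer; only a non-linear activation between layers breaks this collapse.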

At this point, you should understand the importance of Activation Functions and Non-Linear Functions in Neural Networks.

Now, let’s talk about the types of Activation Functions.

There are many different Activation Functions out there. We will not discuss all of them in this post, only the ones that are generally used in Neural Networks.
In this post, we’ll discuss 4 major Activation Functions:
1. Sigmoid Activation Function
2. TanH Activation Function
3. ReLU and
4. Leaky ReLU.
There are more Activation Functions out there, but these 4 are the major and most used ones.

Sigmoid or Logistic Activation Function
Sigmoid is one of the most popular and heavily used Activation Functions. It is a very simple mathematical function, σ(x) = 1 / (1 + e⁻ˣ), that squashes its input into the range (0, 1). That means we can use the Sigmoid Function to decide whether to fire the neuron or not. The graph of the function is given below. Although Sigmoid is a very popular Activation Function, these days, in the era of Deep Learning, we don’t use it that much because it suffers from several problems. Here is a list of common problems with the Sigmoid Activation Function:

• Sigmoids saturate and kill gradients.
• Its output isn’t zero-centered: since 0 < output < 1, gradient updates tend to go too far
in different directions, which makes optimization harder.
• Sigmoids have slow convergence.

Here is the Python 3.x code for the Sigmoid Function:

import numpy as np

def sigmoid(x, deriv=False):
    # When deriv=True, x is assumed to already be a sigmoid output,
    # so the derivative is x * (1 - x)
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

TanH Activation Function
TanH is another very popular Activation Function, also widely used in Machine Learning. Unlike Sigmoid, the tanH Activation Function’s output is always zero-centered because its range is (-1, 1): tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ). Hence optimization is easier, and in practice tanH is usually preferred over the Sigmoid function. But it still suffers from the Vanishing Gradient problem.
The graph of the equation is given below. Python 3 code for the tanH Activation Function:

import numpy as np

def tanh(x, deriv=False):
    # When deriv=True, x is assumed to already be a tanh output,
    # so the derivative is 1 - x**2
    if deriv:
        return 1 - x**2
    return np.tanh(x)

ReLU Activation Function
ReLU is a very popular Activation Function, especially with Deep Neural Networks. Many researchers have shown that ReLU works better than other Activation Functions most of the time. The biggest advantage of ReLU is that it learns faster and avoids the Vanishing Gradient problem. ReLU is an incredibly simple function: f(x) = max(0, x). The graph of ReLU is given below. But ReLU has one limitation: it should only be used in the hidden layers of a Neural Network. Another problem with ReLU is that some gradients can be fragile during training and can die; a weight update can cause a neuron to never activate on any data point again. Simply put, ReLU can result in Dead Neurons.

Python 3 code for ReLU:

def ReLU(x):
    # Pass positive values through; zero out negatives
    return x * (x > 0)

def dReLU(x):
    # Derivative: 1 for positive inputs, 0 otherwise
    return 1. * (x > 0)

Leaky ReLU Activation Function
Leaky ReLU is another very popular Activation Function, generally used with Deep Neural Networks. It is a modified version of the ReLU Activation Function, and it solves the
problem of dying neurons.

Leaky ReLU solves the problem of dead neurons by introducing a slight slope for negative inputs, which keeps the
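Like the other functions above, Leaky ReLU is easy to write in Python 3. The slope alpha = 0.01 below is a common default, not a fixed rule:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Pass positive values through; scale negatives by the small slope alpha
    return np.where(x > 0, x, alpha * x)

def dleaky_relu(x, alpha=0.01):
    # Derivative: 1 for positive inputs, alpha otherwise,
    # so the gradient is never exactly zero
    return np.where(x > 0, 1.0, alpha)
```

Because the gradient for negative inputs is alpha instead of 0, weight updates keep flowing through neurons that would be dead under plain ReLU.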