
Today I am going to discuss the different activation functions used in neural networks and when each of them should be used. Before that, let's look at the definition of activation functions.

Activation functions : 

The activation function in a neural network takes the weighted sum of inputs plus bias from a node and transforms it into the node's output, i.e. it introduces linearity or non-linearity into the calculation. Depending on the input, an activation function can switch a neuron "ON" or "OFF".
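As a rough illustration, here is a minimal NumPy sketch (with made-up weights, inputs, and bias) of an activation function being applied to a node's weighted sum; ReLU, one of the functions discussed below, is used as the example activation:

```python
import numpy as np

def relu(z):
    # ReLU activation: passes positive values through, zeroes out negatives
    return np.maximum(0, z)

# Made-up example values for a single neuron
x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
w = np.array([0.4, 0.3, -0.6])   # weights
b = 0.1                          # bias

z = np.dot(w, x) + b             # linear transformation (weighted sum + bias)
output = relu(z)                 # activation decides the neuron's output ("ON"/"OFF")
print(z, output)
```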

 


 

Non-Linear activation functions : 

A. ReLU - When to consider it?

The main purpose of the activation function is to introduce the property of nonlinearity into the model. The rectified linear unit (ReLU) activation function introduces nonlinearity into a deep learning model, helps to solve the vanishing gradient issue, and allows backpropagation while remaining computationally efficient. A neuron is deactivated only when the output of the linear transformation is less than 0. ReLU is used in the hidden layers only.

The equation for the ReLU function is given below -

f(x) = max(0, x), i.e. f(x) = x for x ≥ 0 and f(x) = 0 for x < 0
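As a quick sketch (plain NumPy, illustrative values only), the behaviour described above - output and gradient are both zero whenever the linear transformation is negative - can be seen directly:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    # Gradient of ReLU: 1 for positive inputs, 0 for non-positive inputs
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.] -> neurons with negative input get no gradient
```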

The advantages  of using ReLU :

  • ReLU activates only a subset of neurons at a time, which makes it far more computationally efficient than the sigmoid and tanh functions.
  • ReLU accelerates the convergence of gradient descent towards the global minimum of the loss function due to its linear, non-saturating behaviour for positive inputs.

The limitations :

  • The Dying ReLU problem - a neuron whose linear output is always negative produces zero output and zero gradient, so it stops learning.

To avoid the problems related with ReLU, there are different updated versions of ReLU, described below -

1. Leaky ReLU Function: 

Solves the dying ReLU problem that standard ReLU neurons can run into in a neural network.

f(x) = x for x > 0, and f(x) = a·x for x ≤ 0, where a is a small fixed constant (commonly 0.01)

It is based on a ReLU, but it has a small slope for negative values instead of a flat slope. The slope coefficient is determined before training.
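A minimal sketch of Leaky ReLU in NumPy, assuming the commonly used default slope of 0.01 for negative inputs:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Leaky ReLU: small fixed slope (alpha) for negative inputs instead of zero
    return np.where(z > 0, z, alpha * z)

z = np.array([-3.0, -0.5, 0.0, 2.0])
print(leaky_relu(z))  # negative inputs keep a small, non-zero output and gradient
```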

2. Parametric ReLU Function:  A Parametric Rectified Linear Unit (PReLU) generalizes the traditional rectified unit with a slope for negative values that is learned during training. Because negative inputs still produce an output and a gradient, it helps tackle the dying ReLU / vanishing gradient issue of standard ReLU, which returns zero for any negative input.

f(x) = x for x > 0, and f(x) = a·x for x ≤ 0, where a is a learnable parameter
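A minimal, illustrative PReLU sketch in NumPy; the class name, the initial value a = 0.25, and the hand-rolled gradient helper are assumptions for demonstration, not a reference implementation:

```python
import numpy as np

class PReLU:
    """Minimal PReLU sketch: the negative-side slope `a` is a learnable parameter."""
    def __init__(self, a=0.25):          # 0.25 is a commonly used initial value (assumption)
        self.a = a

    def forward(self, z):
        self.z = z
        return np.where(z > 0, z, self.a * z)

    def grad_a(self, upstream):
        # Gradient of the output w.r.t. `a` is z where z <= 0, and 0 otherwise
        return np.sum(upstream * np.where(self.z > 0, 0.0, self.z))

act = PReLU()
z = np.array([-2.0, 1.0, -0.5])
out = act.forward(z)
print(out, act.grad_a(np.ones_like(z)))  # `a` would be updated from this gradient during training
```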

3.  Exponential Linear Units (ELUs) Function :

The exponential linear unit (ELU) function is used to accelerate the training of neural networks (just like the ReLU function). The biggest advantage of the ELU function is that it can avoid the vanishing gradient problem by using the identity for positive values, while improving the learning characteristics of the model.

f(x) = x for x > 0, and f(x) = α(e^x - 1) for x ≤ 0, where α is a positive constant

In this activation function, the negative values push the mean unit activation closer to zero, which speeds up learning.
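A minimal ELU sketch in NumPy, assuming α = 1.0 (a common choice); it also shows how the negative outputs pull the mean activation towards zero:

```python
import numpy as np

def elu(z, alpha=1.0):
    # ELU: identity for positive inputs, smooth exponential curve for negative inputs
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-3.0, -1.0, 0.0, 2.0])
out = elu(z)
print(out)
print(out.mean())  # negative outputs pull the mean activation closer to zero
```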

Tanh (covered below) also works well with RNNs, which are prone to vanishing gradients.

B. SoftMax Function and when to use :  Also known as the normalized exponential function, SoftMax is a generalization of logistic regression mostly used in multiclass classification problems. It calculates relative probabilities - that is, it uses the values of all the outputs (A1, A2, A3, ...) to determine each final probability value.

It is mostly used in image recognition and Natural Language Processing (NLP). For instance, in a neural network model predicting types of flowers, SoftMax would help determine the probability of an image being a Marigold, Rose, Jasmine, or Dahlia. It also ensures the sum of these probabilities equals one.

softmax(z_i) = e^(z_i) / Σ_j e^(z_j), for each class i

Disadvantages : SoftMax can suffer from numerical instability, because exponentiating large inputs can overflow and the logarithm in the cross-entropy loss can blow up for near-zero probabilities. A common fix is to subtract the maximum input before exponentiating, as in the sketch below.
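A minimal, numerically stable SoftMax sketch in NumPy, using the flower example above with made-up scores:

```python
import numpy as np

def softmax(z):
    # Subtracting the max before exponentiating avoids overflow (numerical stability)
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Made-up logits for four flower classes: Marigold, Rose, Jasmine, Dahlia
logits = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(logits)
print(probs, probs.sum())  # relative probabilities that sum to 1
```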

C: Swish Activation function and When to use :  Not used as much; you can get all the details in different references.

D.  Gaussian Error Linear Unit (GELU) : Not used as much; you can get all the details in different references.

E. SELU (Scaled Exponential Linear Unit) : Not used as much; you can get all the details in different references.

F. Tanh Function (Hyperbolic Tangent) : 

The tanh function outputs values in the range of -1 to +1. It can deal with negative values more effectively than the sigmoid function, which has a range of 0 to 1. Unlike the sigmoid function, tanh is zero-centered, which means that its output is symmetric around the origin of the coordinate system.

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Advantage : The tanh function is more extensively used than the sigmoid function since it delivers better training performance for multilayer neural networks. The biggest advantage of the tanh function is that it produces a zero-centered output, which supports the backpropagation process.

Commonly Used: The tanh function has been mostly used  for natural language processing and speech recognition tasks.

However, the tanh function, too, has a limitation – just like the sigmoid function, it cannot solve the vanishing gradient problem. Also, the tanh function can only attain a gradient of 1 when the input value is 0 (x is zero). As a result, the function can produce some dead neurons during the computation process.
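A minimal NumPy sketch illustrating the points above: tanh outputs are zero-centered, and its gradient peaks at 1 for x = 0 and shrinks towards zero for large inputs:

```python
import numpy as np

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, which is at most 1 (at x = 0)
    return 1.0 - np.tanh(x) ** 2

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.tanh(x))    # zero-centered outputs in (-1, 1)
print(tanh_grad(x))  # gradient shrinks toward 0 for large |x| -> vanishing gradients
```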

G : Sigmoid / Logistic Activation Function : 

The sigmoid function takes a real value as input and outputs values in the range of 0 to 1.

f(x) = 1 / (1 + e^(-x))

Sigmoid functions normally work better in the case of classifiers. The sigmoid function is mostly used as the activation for the output layer of binary classification models.


The main problem with this function is the vanishing gradient problem - the gradients that are used to update the network become extremely small, or "vanish", when they are backpropagated from the output layers to the earlier layers. This happens when the input falls outside the narrow range in which the sigmoid is not saturated.
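A minimal NumPy sketch of the sigmoid and its gradient, illustrating how the gradient becomes close to zero outside a narrow input range:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    # Derivative s * (1 - s) peaks at 0.25 (x = 0) and approaches 0 for large |x|
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # outputs squashed into (0, 1)
print(sigmoid_grad(x))  # near-zero gradients outside a narrow input range
```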

Hope this will help you identify the proper activation function to apply in your scenario.