So, let’s talk about our brain first
In our daily lives, we are constantly analyzing the environment around us, making predictions about the world, and recognizing objects, faces, and more. To understand this, look at this picture.
We do this all day long, and with very high accuracy.
Ever wondered how our brain does that? What “algorithms” are used by our brain to do these analyses?
I don’t want to go into the workings of the brain in great depth. But in simple terms, our brain achieves this using neurons. A typical human brain contains around 86 billion of them, a figure often rounded up to 100 billion.
But what are these neurons?
A neuron (or nerve cell) is an electrically excitable cell that receives, processes, and transmits information through electrical and chemical signals. In simple terms, neurons are like switches that turn on or off depending on certain conditions. Here is an illustration of a neuron.
These billions of neurons are connected together, forming networks known as neural circuits.
These neural circuits look for patterns in what we see, and based on those patterns, our brain makes decisions and predictions.
For example, when we look at that picture, we first pick out different patterns, and based on those patterns, we conclude that it shows a happy girl.
Now let’s talk about Convolutional Neural Networks
To understand Convolutional Neural Networks, we first need to understand how computers process images.
So, how do computers ‘see’ an image?
Computers ‘see’ an image differently. To a computer, an image is just a matrix of numbers, where each number represents the intensity of one particular pixel.
So, to a computer, even an ordinary photo is a massive matrix of numbers, and working out what it depicts from those raw numbers is very hard.
But just because computers perceive images differently doesn’t mean we can’t train them to recognize patterns as we do. We just have to think of an image in a different way.
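To make this concrete, here is a minimal Python sketch of how a tiny grayscale image might be stored. The pixel values are made up for illustration, and the 0 (black) to 255 (white) range is the common 8-bit convention rather than something every image format uses.

```python
# A tiny 4x4 grayscale "image" as a matrix of pixel intensities.
# Assumption: 8-bit values, where 0 is black and 255 is white.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
height, width = len(image), len(image[0])
print(f"{height}x{width} image, pixel at row 0, column 2 = {image[0][2]}")

# A color image is usually stored as three such matrices (red, green, blue),
# so a 640x480 color photo already holds 640 * 480 * 3 numbers.
print(640 * 480 * 3)  # 921600
```

The left half of this image is black and the right half is white; nothing in the numbers themselves says "edge", which is exactly the kind of pattern a CNN has to learn to find.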
Convolutional Neural Networks, a type of Artificial Neural Network, are great at understanding images.
Convolutional Neural Networks are among the most popular models in Machine Learning and Deep Learning. They allow computers to ‘see’; in other words, they are widely used in Computer Vision.
Convolutional Neural Networks work in a way loosely inspired by our brain. First, we train these models on thousands of labeled images; they then learn to generalize from those examples and make predictions for images they have never seen before.
Architecture of CNN
Architecture is one of the most important topics when studying CNNs. A complete CNN consists of multiple layers, each doing something specific. Without understanding the purpose of each layer, we cannot build an effective CNN.
So, let’s learn about these layers.
Convolution Layer
The Convolution Layer is the first layer in a CNN. Its main job is to extract features from the input image. It preserves the spatial relationship between pixels by learning image features from small squares of input data. Convolution is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel). The filter slides across the image, and at each position we multiply every filter value by the pixel beneath it and add up the products, producing one number of the output, called the feature map. Consider the GIF image for better understanding.
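The multiply-and-sum operation can be sketched in plain Python. This is a minimal illustration, assuming a single-channel image, a stride of 1, and no padding; `convolve2d` and the example matrices are hypothetical names and values chosen for the demo, not part of any library. Real frameworks implement the same idea far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    At each position, the overlapping values are multiplied element-wise and summed.
    Like most deep-learning libraries, this does not flip the kernel, so it is
    technically cross-correlation, which CNNs conventionally call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(convolve2d(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5x5 input and a 3x3 filter give a 3x3 feature map, because the filter fits in only three positions along each axis.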
Convolving an image with different filters can perform different operations, such as edge detection and blurring.
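As a rough illustration, the sketch below applies two classic hand-made filters to a tiny image containing a vertical edge: a vertical-edge detector and a 3x3 box blur. The image, the kernels, and the `convolve2d` helper are illustrative assumptions; in a CNN the filter values are learned during training rather than chosen by hand.

```python
def convolve2d(image, kernel):
    # Valid convolution, stride 1: element-wise multiply and sum at each position.
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

# A 4x6 image with a vertical edge: dark (0) on the left, bright (9) on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

# Vertical-edge kernel: responds where brightness changes from left to right.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

# 3x3 box blur: each output value is the mean of its 3x3 neighbourhood.
blur_kernel = [[1 / 9] * 3 for _ in range(3)]

edges = convolve2d(image, edge_kernel)
blurred = convolve2d(image, blur_kernel)
print(edges)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
print(blurred)
```

The edge filter produces large values only where the dark and bright regions meet, while the blur spreads the sharp jump from 0 to 9 over several pixels (each row comes out as roughly 0, 3, 6, 9).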
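The stride is the number of pixels by which the filter shifts over the input at each step. With a stride of 1, we move the filter one pixel at a time; with a stride of 2, we move it two pixels at a time, which produces a smaller output.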
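A small sketch of how the stride changes the output size, assuming no padding. For an n x n input and a k x k filter with stride s, the output side length is floor((n - k) / s) + 1. `convolve2d_strided` and the test image are hypothetical, for illustration only.

```python
def convolve2d_strided(image, kernel, stride=1):
    # Valid convolution that moves the kernel `stride` pixels per step.
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [[r * 7 + c for c in range(7)] for r in range(7)]  # 7x7 test image
kernel = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # picks the top-left value of each window

for stride in (1, 2):
    out = convolve2d_strided(image, kernel, stride)
    print(f"stride={stride}: {len(out)}x{len(out[0])} output")
# stride=1: 5x5 output
# stride=2: 3x3 output
```

Doubling the stride here skips every other window position, so the 7x7 input shrinks to a 3x3 feature map instead of 5x5.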
Sometimes the filter does not fit perfectly over the input image. In such cases, we have two options:
1. Drop the part of the image where the filter does not fit (known as valid padding).
2. Pad the image with zeros so that the filter fits (zero padding).
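Both options can be sketched as follows. `zero_pad` adds a border of p zeros, and `output_size` computes the feature-map side length as floor((n + 2p - k) / s) + 1 for an n x n input, a k x k filter, stride s, and padding p. Both helpers are hypothetical names for this illustration. In the example, a 3x3 filter with stride 2 on a 6x6 image cannot reach the last row and column, so option 1 drops them, while one pixel of zero padding lets the filter cover them.

```python
def zero_pad(image, p):
    # Surround the image with a border of p zeros on every side.
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]              # top border rows
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)      # left/right borders
    padded.extend([0] * width for _ in range(p))          # bottom border rows
    return padded

def output_size(n, k, stride=1, pad=0):
    # Side length of the feature map for an n x n input and a k x k filter.
    return (n + 2 * pad - k) // stride + 1

for row in zero_pad([[1, 2], [3, 4]], 1):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]

# 6x6 image, 3x3 filter, stride 2: without padding the last row/column is dropped...
print(output_size(6, 3, stride=2))          # 2
# ...while one pixel of zero padding lets the filter cover it.
print(output_size(6, 3, stride=2, pad=1))   # 3
```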
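Non-Linearity (ReLU)
Non-linearity is another very important step in a CNN. ReLU stands for Rectified Linear Unit. It is applied to every value of the feature map and introduces non-linearity into the network; without it, a stack of convolution layers would collapse into a single linear operation and could only model linear relationships.
Mathematically, the output of ReLU is f(x) = max(0, x): negative values become 0, and positive values pass through unchanged.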
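In code, ReLU is a one-liner applied to every value of a feature map. A minimal sketch; `relu` and `relu_map` are illustrative names, not a library API.

```python
def relu(x):
    # f(x) = max(0, x): negatives become 0, positives pass through unchanged.
    return max(0, x)

def relu_map(feature_map):
    # Apply ReLU to every value of a 2-D feature map (element-wise).
    return [[relu(v) for v in row] for row in feature_map]

feature_map = [[-3, 5, 0], [2, -1, -7]]
print(relu_map(feature_map))  # [[0, 5, 0], [2, 0, 0]]
```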
Pooling Layer
The Pooling Layer is another part of what makes CNNs powerful and efficient. Pooling turns each local region of its input into a single number. Like convolution, it works on one local region at a time; unlike convolution, however, it has no filter and computes no dot product. Instead, it either computes the average of the values in the region (Average Pooling) or simply picks the highest value and discards the rest (Max Pooling).
The pooling layer takes the output of the ReLU layer and applies the pooling operation to it. Its main purpose is to reduce the size of the feature maps, which cuts the computation needed in later layers and makes the network less sensitive to small shifts in the input.
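Both kinds of pooling can be sketched in a few lines, assuming the common 2x2 window with stride 2 and no padding (a typical choice, not one the text prescribes). `pool2d` and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool `feature_map` over size x size windows moved `stride` pixels at a time."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            # Max pooling keeps the largest value; average pooling takes the mean.
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]
print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
```

Each 2x2 region collapses to one number, so the 4x4 feature map shrinks to 2x2, a quarter of the original number of values.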
Fully Connected Layer
The Fully Connected (or Dense) Layer is the last layer of a CNN. In this layer, we first flatten the feature maps into a single vector and then feed it into a fully connected neural network, which produces the final result, such as a score for each class.
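A minimal sketch of this last step: flatten the pooled feature maps into one vector, then compute a fully connected layer, where every output is a weighted sum of all inputs plus a bias. The weights, biases, and two-class setup are made-up assumptions; in a real CNN they are learned during training, and the scores are often passed through a softmax to turn them into probabilities.

```python
def flatten(feature_maps):
    # Unroll a list of 2-D feature maps into a single 1-D vector, row by row.
    return [value for fmap in feature_maps for row in fmap for value in row]

def dense(vector, weights, biases):
    # Fully connected layer: output k = sum_i weights[k][i] * vector[i] + biases[k].
    return [sum(w * x for w, x in zip(w_row, vector)) + b
            for w_row, b in zip(weights, biases)]

# Two pooled 2x2 feature maps (e.g. from two filters) -> a vector of 8 numbers.
pooled = [[[6, 4], [7, 9]], [[0, 1], [2, 3]]]
x = flatten(pooled)
print(x)  # [6, 4, 7, 9, 0, 1, 2, 3]

# Hypothetical parameters for a 2-class problem; a trained CNN learns these.
weights = [[0.1] * 8, [-0.1] * 8]
biases = [0.0, 0.5]
scores = dense(x, weights, biases)

# The predicted class is the one with the highest score.
predicted = max(range(len(scores)), key=lambda k: scores[k])
print(scores, "-> predicted class", predicted)
```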