Technology is advancing quickly, and artificial intelligence (AI) and machine learning (ML) are two of the advances you hear about most often. These technologies are now used in many fields, from marketing, eCommerce, and software development to banking, finance, and medicine. AI and ML are vast fields, and efforts are being made to widen their applications to solve many real-world problems. That is why these technologies have so many branches; ML is itself a subset of AI. Convolutional neural networks (CNNs) are one branch of AI that is becoming popular. In this article, I’ll discuss what CNNs are, how they work, and their usefulness in the modern world. Let’s dive right in!
What Is a Convolutional Neural Network?
A convolutional neural network (ConvNet or CNN) is an artificial neural network (ANN) that uses deep learning algorithms to analyze images, classify visuals, and perform computer vision tasks. A CNN leverages principles of linear algebra, such as matrix multiplication, to detect patterns in an image. Because these computations are heavy, the models are usually trained on graphics processing units (GPUs). In simple words, a CNN takes input data such as images and assigns importance, in the form of learnable weights and biases, to different aspects of that image. This way, the CNN can differentiate between images or classify them.
CNNs: A Brief History
Since a convolutional neural network is an artificial neural network, it’s worth revisiting neural networks first. In computing, a neural network is a machine learning (ML) model, and networks with many layers are the basis of deep learning. It’s analogous to the connectivity patterns followed by neurons in the human brain. Artificial neural networks also take inspiration from how the visual cortex is arranged. Different types of artificial neural networks (ANNs) are used for different purposes. One of them is the CNN, which is used for image detection, classification, and more. It was introduced in the 1980s by Yann LeCun, then a postdoctoral researcher. An early version of the CNN, LeNet, named after LeCun, could recognize handwritten digits. It was then used in banking and postal services to read digits on cheques and zip codes written on envelopes. However, this early version did not scale well: it needed significant computing resources and data to work efficiently on larger images, so CNNs were not used much in artificial intelligence and computer vision. In 2012, AlexNet renewed interest in deep learning, which uses neural networks consisting of multiple layers. Around this time, technology improved, and large data sets and heavy computing resources became available, enabling the creation of complex CNNs capable of performing computer vision activities efficiently.
Layers in a CNN
Let’s understand the different layers in a CNN. Adding layers to a CNN increases its complexity and enables it to detect more aspects or areas of an image. The early layers detect simple features, and later layers detect more complex ones, such as an object’s shape and larger elements, until the network can finally recognize the object in the image.
Convolutional Layer
The first layer of a CNN is the convolutional layer. It is the CNN’s main building block, where most of the computations happen. It needs only a few components: input data, a filter, and a feature map. A CNN can also have additional convolutional layers. This makes the CNN’s structure hierarchical, since the subsequent layers can see the pixels within the prior layers’ receptive fields. The convolutional layers transform the given image into numerical values, which allows the network to understand and extract valuable patterns.
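To make the idea of a filter producing a feature map more concrete, here is a minimal pure-Python sketch. The 4×4 image and the 2×2 edge filter are made-up values for illustration, and the `conv2d` helper is a hypothetical name, not a library function. It uses a stride of 1 and no padding, and, as in most deep learning libraries, the filter is not flipped (strictly speaking this is cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter element by element with the patch under it, then sum.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the filter fires only where the edge is
```

The feature map is smaller than the input (3×3 instead of 4×4) and is non-zero only where the pattern the filter looks for actually appears.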
Pooling Layers
Pooling layers reduce the dimensions of the data, an operation also called downsampling. This reduces the number of parameters in the input. Like the convolutional layer, the pooling operation moves a filter over the complete input, but this filter has no weights. Instead, the filter applies an aggregation function to the numerical values in the receptive field to populate the output array. Pooling has two types:
Average pooling: As the filter sweeps over the input, it calculates the average value within the receptive field and sends it to the output array.
Max pooling: As the filter sweeps over the input, it chooses the pixel with the maximum value and sends it to the output array. Max pooling is used more than average pooling.
Although significant data is lost in pooling, it still offers many benefits to CNN. It helps reduce overfitting risks and complexity while improving efficiency. It also enhances CNN’s stability.
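The two pooling types can be sketched in a few lines of plain Python. This is only an illustration: the `pool2d` helper and the 4×4 feature map are hypothetical, and the sketch assumes non-overlapping 2×2 windows (stride equal to the window size), which is a common but not the only choice.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Apply non-overlapping size x size pooling (stride equal to size)."""
    if mode == "max":
        aggregate = max
    else:
        aggregate = lambda values: sum(values) / len(values)

    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current window (the receptive field).
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(aggregate(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 6],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[7, 2], [2, 8]]
print(avg_pooled)  # [[4.0, 1.0], [1.0, 6.0]]
```

Both versions shrink the 4×4 input to 2×2, and neither has any weights to learn; they differ only in which summary of each window they keep.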
Fully Connected (FC) Layer
As the name suggests, in a fully connected layer every node in the output layer is directly connected to every node in the previous layer. It classifies an image based on the features extracted by the previous layers and their filters. FC layers generally use a softmax activation function to classify inputs, whereas convolutional layers typically use ReLU. Softmax produces a probability between 0 and 1 for each class.
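Here is a small sketch of the two activation functions mentioned above, written with Python's standard library only. The three raw scores and their class labels are invented for illustration; in a real network they would come from the last fully connected layer.

```python
import math


def relu(values):
    """Keep positive values and map negative values to 0."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow and does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes: cat, dog, bird.
scores = [2.0, 1.0, -1.0]

probs = softmax(scores)
print([round(p, 3) for p in probs])
print(relu(scores))  # [2.0, 1.0, 0.0]
```

Every softmax output lies strictly between 0 and 1 and the outputs sum to 1, so they can be read as class probabilities, while ReLU simply zeroes out the negative score.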
How Do CNNs Work?
A convolutional neural network can consist of many layers, even hundreds of them. These layers learn to identify various features of a given image. Although CNNs are neural networks, their architecture differs from a regular ANN. The latter puts an input through many hidden layers to transform it, where each layer is made up of a set of artificial neurons, and each neuron is fully connected to every neuron in the previous layer. At the end, there’s a fully connected layer, the output layer, that gives the result. A CNN, on the other hand, organizes its layers in three dimensions: width, height, and depth. Here, a neuron in one layer connects only to a small region of the next layer instead of to every neuron in it. The final result is represented by a single vector of probability scores and has only the depth dimension. Now, you may ask what “convolution” is in a CNN. Convolution is a mathematical operation that combines two sets of information. In a CNN, convolution is applied to the input data with a filter to produce a feature map. This brings us to some of the important concepts and terminologies used in CNNs.
Filter: Also known as a feature detector or kernel, a filter has a fixed dimension, such as 3×3. It slides over the input image and, at each position, multiplies its values element by element with the pixels underneath and sums the results. Filters are applied to every training image at different resolutions, and the output of each convolution serves as the input for the subsequent layer.
Padding: It expands the input matrix by inserting fake pixels around its borders. It’s done to counter the fact that convolution reduces matrix size. For example, without padding, a 3×3 filter turns a 9×9 matrix into a 7×7 matrix.
Striding: If you want to get an output smaller than your input, you can perform striding. It lets the filter skip positions as it slides over the image. By moving two or three pixels at a time, you reduce the spatial resolution and get a more efficient network.
Weights and Biases: CNNs have weights and biases in their neurons. A model learns these values during training, and within a given layer the values are the same for all neurons. This means that each hidden neuron detects the same feature in different areas of an image. As a result, the network becomes more tolerant to the translation of objects within an image.
ReLU: It stands for rectified linear unit and is used for faster and more effective training. It maps negative values to 0 and keeps positive values. It’s also called activation, as the network carries only the activated features into the subsequent layer.
Receptive field: In a neural network, every neuron receives input from some locations in the previous layer. In a convolutional layer, every neuron receives input only from a restricted area of the prior layer, called the neuron’s receptive field. In an FC layer, the whole previous layer is the receptive field.
In real-world tasks, convolution is usually performed on a 3D input, such as a color image with height, width, and color channels, which requires a 3D filter. Coming back to the CNN, it comprises different parts or node layers. Each node has weights and a threshold and is connected to nodes in other layers. When a node’s output exceeds the threshold, its data is sent to the next layer in the network. These layers perform operations that change the data in order to learn relevant features. These operations can repeat across many layers, even hundreds, each of which learns to detect different features of an image. The parts of a CNN are:
An input layer: This is where the input, such as an image, is taken in. It will be a 3D object with a defined height, width, and depth.
One or more hidden layers (the feature extraction phase): These layers can be convolutional layers, pooling layers, and fully connected layers.
An output layer: Here, the result is displayed.
When the image passes through a convolutional layer, it is transformed into a feature map, also called an activation map. Each layer convolves its input and passes the result to the subsequent layer. During the feature extraction phase, the CNN performs many convolution and pooling operations to detect features. For example, if you input a cat’s image, the CNN will recognize its four legs, color, two eyes, etc. Next, the fully connected layers act as a classifier over the extracted features and yield the final prediction for the image.
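The effect of padding and striding on the output size can be checked with a little arithmetic. For a square input of width n, a filter of width k, padding p on each side, and stride s, the output width is floor((n + 2p − k) / s) + 1. The sketch below applies that standard formula and also shows what zero padding looks like; the helper names `conv_output_size` and `zero_pad` are my own, not from any library.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output width for input width n, filter width k, given padding and stride."""
    return (n + 2 * padding - k) // stride + 1


def zero_pad(matrix, padding):
    """Surround `matrix` with `padding` rings of zeros (the fake pixels)."""
    width = len(matrix[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in matrix:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded.extend([0] * width for _ in range(padding))
    return padded


# A 9x9 input with a 3x3 filter under different settings.
print(conv_output_size(9, 3))             # 7: no padding, stride 1
print(conv_output_size(9, 3, padding=1))  # 9: padding keeps the size
print(conv_output_size(9, 3, stride=2))   # 4: striding shrinks the output
print(conv_output_size(9, 3, stride=3))   # 3

for row in zero_pad([[1, 2], [3, 4]], 1):
    print(row)
```

With a 3×3 filter, one ring of padding is exactly enough to keep the output the same size as the input, while a larger stride shrinks it quickly.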
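The whole flow, from image to prediction, can be sketched as a toy forward pass in plain Python. Everything here is an assumption made for illustration: the 5×5 image, the single 2×2 filter, the two class labels, and especially the fully connected weights, which are picked by hand rather than learned. A real CNN would learn the filter and the weights during training and would have many filters and layers.

```python
import math


def conv2d(image, kernel):
    # Square image and kernel assumed; stride 1, no padding.
    k = len(kernel)
    out = len(image) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]


def relu(fmap):
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def dense(features, weights, biases):
    # One weighted sum per output class.
    return [sum(f * w for f, w in zip(features, row)) + b for row, b in zip(weights, biases)]


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_filter = [[-1, 1], [-1, 1]]

# Feature extraction: convolution -> ReLU -> max pooling.
pooled = max_pool(relu(conv2d(image, edge_filter)))

# Classification: flatten -> fully connected layer -> softmax.
features = [v for row in pooled for v in row]
weights = [[1, 0, 1, 0], [0, 1, 0, 1]]  # hand-picked, not learned
biases = [0, 0]
probs = softmax(dense(features, weights, biases))

print(pooled)                        # [[2, 0], [2, 0]]
print([round(p, 3) for p in probs])  # [0.982, 0.018]
```

The first class (say, "edge on the left") gets almost all of the probability because the pooled feature map is active only on the left side, which is the kind of evidence the classifier stage works from.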
Advantages of CNNs
Higher Accuracy
CNNs offer higher accuracy on image tasks than regular neural networks that don’t use convolution. They are especially helpful when the task involves lots of data, such as video and image recognition. They produce highly precise results and predictions; therefore, their usage is increasing in different sectors.
Computational Efficiency
CNNs are more computationally efficient than regular neural networks because of the convolution process. They also use dimensionality reduction and parameter sharing, which make the models quicker and easier to deploy. These techniques can also be optimized to work on different devices, be it your smartphone or laptop.
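A quick back-of-the-envelope count shows why parameter sharing matters. The sketch below compares one fully connected layer with one convolutional layer on the same input; the 224×224 RGB image size and the choice of 64 outputs or filters are assumptions picked only for the comparison.

```python
def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


# Hypothetical input: a 224x224 RGB image.
n_inputs = 224 * 224 * 3

fc = dense_params(n_inputs, 64)   # fully connected layer with 64 neurons
conv = conv_params(3, 3, 3, 64)   # convolutional layer with 64 filters of 3x3

print(fc)    # 9633856
print(conv)  # 1792
```

Because the same small filter is reused at every position of the image, the convolutional layer needs a few thousand parameters where the fully connected layer needs millions.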
Feature Extraction
A CNN can learn an image’s features without requiring manual feature engineering. When working on a new task, you can take a pre-trained CNN and fine-tune its weights by feeding it new data, and the CNN will adapt to the task.
Applications of CNN
CNNs are used in different industries for many use cases. Some of the real-life applications of CNNs include:
Image Classification
CNNs are used widely in image classification. They can recognize valuable features and identify objects in a given image. Hence, they’re used in sectors like healthcare, particularly for analyzing MRI scans. In addition, this technology is used in handwritten digit recognition, which is among the earliest use cases of CNNs in computer vision.
Object Detection
A CNN can detect objects in images in real time and also label and classify them. Therefore, this technique is used widely in automated vehicles, for example to detect pedestrians. It also enables smart homes and vehicles to recognize their owner’s face. It’s also used in AI-powered surveillance systems to detect and mark objects.
Audiovisual Matching
CNNs help with audiovisual matching, which improves video streaming platforms such as Netflix and YouTube. It also helps meet user requests such as “love songs by Elton John”.
Speech Recognition
Besides images, CNNs are helpful in natural language processing (NLP) and speech recognition. A real-world example of this could be Google using CNNs in its speech recognition system.
Object Reconstruction
CNNs can be used for 3D modeling of a real object in a digital environment. It’s also possible for CNN models to create a 3D face model from an image. In addition, CNNs are useful in constructing digital twins in biotech, manufacturing, and architecture.
CNN usage in different sectors includes:
Healthcare: Computer vision can be used in radiology to help doctors detect cancerous tumors more efficiently.
Agriculture: The networks can use images from artificial satellites such as Landsat to classify fertile lands. This also helps predict land fertility levels and develop an effective strategy to maximize the yield.
Marketing: Social media applications can suggest who a person is in a picture posted on someone’s profile. This helps you tag people in your photo albums.
Retail: Ecommerce platforms can use visual search to help brands recommend relevant items that the target customers want to buy.
Automotive: CNNs are used in automobiles to improve passenger and driver safety, with the help of features such as lane line detection, object detection, and image classification. This also helps self-driving cars evolve further.
Resources to Learn CNNs
Coursera:
Coursera has a course on CNNs that you can consider taking. This course will teach you how computer vision has evolved over the years and some applications of CNNs in the modern world.
You can read these books and lectures to learn more about CNNs:
Neural Networks and Deep Learning: It covers models, algorithms, and the theory of deep learning and neural networks.
A Guide to Convolutional Neural Networks for Computer Vision: This book will teach you the applications of CNNs and their concepts.
Hands-on Convolutional Neural Networks with Tensorflow: You can solve various problems in computer vision using Python and TensorFlow with the help of this book.
Advanced Applied Deep Learning: This book will help you understand CNNs, deep learning, and their advanced applications, including object detection.
Convolutional Neural Networks and Recurrent Neural Networks: This book will teach you about CNNs and RNNs and how to build these networks.
Conclusion
Convolutional neural networks are one of the emerging fields of artificial intelligence, machine learning, and deep learning. They have various applications in the present-day world in almost every sector. Given their increasing usage, they are expected to expand further and become more useful in tackling real-world problems.