Anyone familiar with the basics of Artificial Intelligence (AI) knows how fundamental data is to training Machine Learning and Deep Learning models. Yet most of our work still revolves around Model-centric approaches. It is high time to break that paradigm and start concentrating on Data-centric approaches. In this article we show the other side of the coin: the Data-centric approach.

Data plays a key role in Data-centric AI, but that does not make it unimportant in the Model-centric approach. In fact, the success of an Artificial Intelligence project lies in balancing both.

What are the differences between the Model-centric and the Data-centric approach?

 

Model-centric approach

This approach relies on experimental research aimed at improving the Machine Learning model. The data remains unchanged, since you only work on the code or on the model architecture. Finding the right solution is not easy, because the wide range of possible options can only be narrowed down by trial and error.

 

Data-centric approach

The Data-centric approach starts from the assumption that, to increase the accuracy of Machine Learning applications, it is necessary to start from the dataset. Quantity is not everything! This approach focuses on systematically improving the quality of the data.
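One possible way to make "systematic improvement" concrete is a small audit that flags samples appearing more than once with different labels. The sketch below is only an illustration under stated assumptions: the function name find_label_conflicts and the idea of identifying each sample by a key (for instance a file hash) are hypothetical and not part of any specific library.

```python
from collections import defaultdict


def find_label_conflicts(samples):
    """Return the samples that appear with more than one distinct label.

    `samples` is an iterable of (sample_key, label) pairs. The key is
    assumed to identify the content of a sample, e.g. a hash of the
    image file (hypothetical convention for this sketch).
    """
    labels_by_key = defaultdict(set)
    for key, label in samples:
        labels_by_key[key].add(label)
    # Keep only the keys that were given conflicting labels.
    return {
        key: sorted(labels)
        for key, labels in labels_by_key.items()
        if len(labels) > 1
    }


# Illustrative data: "img_a" was labeled twice, with different labels.
samples = [
    ("img_a", "fir"),
    ("img_b", "pine"),
    ("img_a", "pine"),
    ("img_b", "pine"),
]
conflicts = find_label_conflicts(samples)
print(conflicts)
```

Samples flagged this way can be sent back for review rather than silently kept, which improves the dataset without touching the model.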

 

The characteristics of the Data-centric approach

First, the Data-centric approach is about gathering a quality dataset to train the model. To do that, the data must be oriented towards a goal. Let’s take a simple example: a model that must recognize fir trees in all their forms, including trees with lights and decorations, paintings of fir trees, and so on. We therefore need to collect only images of fir trees, in all their variety, and train the model on them. The important point is that not every decorated tree is a fir tree. We need to eliminate the model’s false positives, and we can do this by training it on quality data.

Second, the data must be labeled correctly. Labeling is the process of assigning one or more labels to the data. For example, in an object detection problem, detecting dogs or cats requires correct annotations of their bounding boxes as well as the class each box belongs to. If we train a model on a dataset with a significant number of incorrectly labeled images, the result will be a less effective model that fails to perform well on unseen data. Conversely, if we label the images correctly, we will have a more accurate model that needs less data to learn. In the Data-centric approach, therefore, the focus is on the quality and not on the quantity of data.
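As a rough illustration of what checking annotations can look like for the dog and cat example, the sketch below validates one bounding-box annotation against a few basic rules. Everything here is an assumption made for the example: the Annotation class, the ALLOWED_LABELS set, the pixel-coordinate convention and the annotation_problems function are hypothetical names, not the format of any particular labeling tool.

```python
from dataclasses import dataclass

# Classes the detector is supposed to know (assumed for this example).
ALLOWED_LABELS = {"dog", "cat"}


@dataclass(frozen=True)
class Annotation:
    """One labeled bounding box, in pixel coordinates (hypothetical format)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def annotation_problems(ann, image_width, image_height):
    """Return a list of human-readable problems found in one annotation."""
    problems = []
    if ann.label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {ann.label!r}")
    if ann.x_min >= ann.x_max or ann.y_min >= ann.y_max:
        problems.append("box has zero or negative area")
    if (ann.x_min < 0 or ann.y_min < 0
            or ann.x_max > image_width or ann.y_max > image_height):
        problems.append("box extends outside the image")
    return problems


# A misspelled label on an otherwise valid box, for a 100x100 image.
print(annotation_problems(Annotation("dgo", 10, 10, 50, 50), 100, 100))
```

Checks like these only catch mechanical errors; a box drawn around the wrong animal still needs a human reviewer.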

Machine Learning and Deep Learning, then, go beyond the algorithm. Data is not only the starting point, but the enabling factor of Artificial Intelligence.
