Today, the amount of data available to organizations is enormous. With the Covid-19 crisis, the digitalization of processes has become commonplace, contributing greatly to this massive expansion. Yet while collecting data is easy, managing its quality is often much more complex. Wrong formats, missing information, textual errors, lost, duplicated, or outdated data: there are many ways for data to be of poor quality.
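To make these defect types concrete, here is a minimal sketch, over entirely hypothetical records, of how missing values, wrong formats, and duplicates might be flagged programmatically:

```python
import re

# Hypothetical customer records illustrating common quality defects.
records = [
    {"id": 1, "name": "Alice Martin", "email": "alice@example.com", "date": "2021-03-15"},
    {"id": 2, "name": "Bob Durand", "email": None, "date": "15/03/2021"},
    {"id": 1, "name": "Alice Martin", "email": "alice@example.com", "date": "2021-03-15"},
]

def audit(rows):
    """Flag missing values, malformed dates, and duplicate ids."""
    issues, seen = [], set()
    for i, row in enumerate(rows):
        if any(v in (None, "") for v in row.values()):
            issues.append((i, "missing value"))
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", row["date"]):
            issues.append((i, "wrong date format"))
        if row["id"] in seen:
            issues.append((i, "duplicate id"))
        seen.add(row["id"])
    return issues
```

Running `audit(records)` flags the second and third rows, but deciding the correct fix still requires business context.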
Currently, this quality management is most often entrusted to IT teams, who develop in-house tools or adopt very horizontal data management solutions and try to correct errors that are often introduced by human operators (for more information, see our article on the subject here). Similarly, highly qualified profiles, such as data scientists, spend a lot of time correcting the training data for their models, but sometimes find themselves at a loss because they do not understand the context around this data.
That is why a very promising approach is increasingly being adopted: entrusting data quality management directly to business experts.
We explain this change of perspective, which can be very profitable.
Artificial intelligence (AI) models, and especially machine learning (ML) models, are increasingly at the heart of companies' business activities. Their ability to make predictions, to find matches between very heterogeneous data sets, or to classify very large volumes of data in increasingly diverse formats (text, images, videos, etc.) makes them suited to a wide variety of applications: automating the drafting and verification of legal contracts, analyzing the causes of customer churn on certain business lines, making sales forecasts (see our article on the subject here), and many more. Moreover, this transformation is happening very quickly: according to Gartner, 60% of companies will use artificial intelligence tools to automate processes in multiple organizational branches by 2022.
As a result, the efficiency of the algorithms developed has greatly increased in recent years. The appropriate selection of models, the choice of relevant parameters, and the entire training phase are now well established in the practices of data scientists, to such an extent that the enthusiasm generated by artificial intelligence has never been stronger, including within executive committees, who see it as a promise to dramatically improve their results.
Nevertheless, many companies are still struggling to take advantage of the full potential of these models. Gartner estimates that more than half of the AI projects in development never reach maturity and are never operationalized.
One of the main reasons for this is actually data itself. Until now, companies have focused on methods to better develop efficient algorithms, but few have really tackled the problems of the data that feeds them. Yet data preparation is an integral part of the job of data scientists. According to Forbes magazine, they spend up to 80% of their time on this task.
However, since these models are deployed at the heart of a company's business activity, processing this data requires a good understanding of the context in which it is used. For example, in the pharmaceutical industry, efficiently preparing a drug data catalog with names such as Rhinofluimucil or Piascledine and detecting possible errors requires real domain knowledge.
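As an illustration only, a naive fuzzy-matching sketch (the reference catalog and similarity threshold here are hypothetical) shows both what automation can offer and where it stops: string similarity can surface likely typos in drug names, but only a domain expert can confirm the correction:

```python
import difflib

# Hypothetical reference catalog; a real one would hold thousands of entries.
reference = ["Rhinofluimucil", "Piascledine", "Doliprane"]

def suggest_correction(name, catalog, cutoff=0.8):
    """Return the closest known name above the similarity cutoff, else None."""
    matches = difflib.get_close_matches(name, catalog, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Here `suggest_correction("Rhinofluimucile", reference)` would propose "Rhinofluimucil", while an unrelated string yields no candidate; whether the proposal is actually right remains a business call.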
The problem is that IT teams are not necessarily trained in these business domains, which can lead at best to misinterpretation of the results and, at worst, to errors in the training data that degrade model performance.
In addition, more and more organizations operate across multiple industries, internationally, and with different business lines. The multiplication of use cases for increasingly precise and complex processes makes it very difficult for IT teams to manage data preparation centrally.
A new approach to data quality issues is emerging within companies: the citizen approach, which consists of entrusting business experts with tasks usually reserved for technical profiles. Among them are:
- Citizen developers, who use online (as-a-service) tools to assemble already implemented software blocks.
- Citizen data scientists who analyze the results of AI models using automatic data visualization tools.
- Citizen data engineers who build data pipelines with out-of-the-box integration solutions.
- And other profiles, such as citizen data stewards (see our article on the subject here).
The advantages of this approach are multiple:
- These business experts have a real operational view and can therefore understand in depth the data they handle. A product manager will, for example, be in the best position to process data on the sales of the marketplace he manages. The complexity of the models means that interpreting their results in a relevant way is fundamental: one must be able to understand the indicators and propose relevant solutions based on them.
- It generates savings: instead of systematically recruiting highly qualified technical profiles, the approach invests in tools that assist operational staff who are already on the teams and have real knowledge of the business.
- It multiplies use cases. For example, in the banking sector, conversational assistants could be deployed to advise customers, alongside credit risk assessment algorithms and capital optimization systems. But the results produced must be systematically verified and interpreted by humans, who therefore necessarily need expertise. Moreover, it would be catastrophic if these models were trained on poor quality data; here too, business knowledge is essential to prepare that data in the best possible way.
Thus, by entrusting data preparation tasks to business experts, each organizational branch (HR, logistics, sales, the executive committee) could appropriate artificial intelligence models for its own concrete applications. This is a fundamental step for companies seeking to become data-driven.
Gartner estimates that by 2025, the lack of data scientists will no longer hinder the adoption of machine learning tools. With data better prepared by business experts and automation tools, they will be able to focus more on highly technical tasks.
This transition from IT to business is not easy. Beyond cultural barriers, it is essential to establish real confidence in these citizen profiles. To do this, they must be supported so that they are not blocked by the slightest technical problem.
It is therefore necessary to set up a whole ecosystem around these new profiles, which is organized into three pillars:
- The tools. Citizens' lack of technical expertise must be compensated for by the adoption of so-called no-code solutions, i.e. solutions that do not require the ability to write code. For example, a marketing manager could very quickly visualize the results of different promotional campaign scenarios by playing with parameters such as prices, targeted products, or duration. Here, the no-code tool would consist of a very easy-to-use, intuitive interface coupled with one or more machine learning algorithms that automatically calculate sales forecasts in real time. However, this assumes that quality data is available upstream and verified by the marketing manager, who is the only one capable of detecting any errors.
- The people. A citizen profile cannot work alone. To be fully effective, citizens need to be accompanied by other cross-functional roles: for example, BI developers to implement visualization tools, ML architects to design and deploy analysis models, and of course data scientists to develop the underlying AI algorithms used to produce analyses.
- The processes. These different roles, IT and business, genuinely need to get along and understand each other. This requires rules and practices that promote collaboration: clearly defining each person's function, implementing communication tools, and establishing a shared semantics that clarifies the vocabulary used (technical terms such as completeness or NLP will have to be defined, as well as terms specific to the company's activity). A business translator role, a real bridge between IT and business teams, is also emerging; its main function is to promote the use of analytical models within organizations and to advise on the most relevant ones to deploy according to the most important business issues.
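The marketing manager's what-if scenario above can be pictured as a thin interface over a model. As a toy sketch, with entirely made-up coefficients standing in for a trained ML model behind the no-code screen:

```python
# Toy what-if sales forecast: hypothetical coefficients standing in for a
# trained ML model that a no-code interface would expose as sliders.
def forecast_sales(price, discount_pct, duration_days):
    base = 1000.0                                      # baseline units (made up)
    price_effect = -20.0 * price                       # demand drops as price rises
    promo_effect = 15.0 * discount_pct * duration_days / 7.0
    return max(0.0, base + price_effect + promo_effect)
```

The business expert never touches the formula; they only vary the parameters and judge whether the resulting forecasts are plausible.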
The adoption of this organizational form, which favors greater cooperation between technical and business profiles, has a real impact on the operationalization of AI projects. Data preparation and data analysis tasks are entrusted to citizens, accompanied by automation tools, leaving time for IT experts to focus on the implementation and deployment of their models. Each person makes the most of his or her capabilities and brings his or her expertise to the table.
Thus, a real cultural change is underway within organizations regarding the management of their data quality. Until now, data quality has been handled mainly by the IT department, but it is increasingly becoming a major issue for each company's management. The lack of technical profiles and the need to fully understand what the data means are leading to the emergence of new, so-called citizen profiles. Their success, however, depends on working in conjunction with IT teams and being supported by easy-to-use artificial intelligence solutions. The result: greater efficiency, better agility, and therefore greatly improved business performance.
YZR is a no-code artificial intelligence platform 100% dedicated to textual data normalization, the most important phase in the preparation of your data. A plug&play tool, it is aimed at operational business people (product managers, buyers, etc.) and all those who fully understand the business context in which the data is used. We are convinced that their skills are much better spent exploiting data than wasting time preparing it manually.
Our SaaS tool is specially designed to solve your problems related to:
- The multiplicity of your data sources
- The absence of naming conventions
- Manual data correction
- Data governance and sharing
It also integrates perfectly with your various tools (Product Information Management, Master Data Management, Data Science - Machine Learning, Business Intelligence), enabling you to achieve, among other things:
- Better customer knowledge
- Optimized sales forecasts
- Accelerated digitization of your offer.
In other words, with YZR, you exploit the full potential of your data.
Want to know more? Would you like a demonstration of our product? Do not hesitate to contact us directly on our website or at email@example.com
If you want to know more about this topic and understand why data quality is a major growth driver for companies, please download our white paper available here!
- Forbes; Gil Press; Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says; March 23, 2016
- Gartner; Whit Andrews, Duy Nguyen, Arun Batchu; Unlock AI Functions in Business Applications; April 3, 2020
- Gartner; Peter Krensky; Best Practices to Avoid Citizen Data Science Failure; July 13, 2020
- Gartner; Anirudh Ganeshan, Carlie Idoine; Build a Comprehensive Ecosystem for Citizen Data Scientists to Drive Impactful Analytics; April 6, 2021
- Gartner; Melissa Davis, Bern Elliot; Applying AI in Business Domains; July 26, 2021