Today, data is everywhere. This is especially true in companies, where each organizational branch - executive committee, sales, marketing, finance, human resources, etc. - has to manage ever-increasing amounts of data to solve strategic issues such as sales forecasting or churn reduction. However, they face a major obstacle that prevents them from doing so: data heterogeneity. It concerns the sources (ERP, CRM, applications, open data, ...), the formats (structured, semi-structured, unstructured) and the data itself, which is often poorly formatted and describes the objects it represents inaccurately. According to the American consulting firm Gartner, poor data quality costs companies nearly 13 million dollars (11 million euros) every year.
Solutions have been put in place to address this, such as master data management tools, which seek to create a single source of truth for corporate master data. Despite this, the number of data silos has only increased within organizations, and the teams in charge of these issues are having a hard time getting rid of them.
But all is not lost and a solution could emerge in the coming years to definitively solve this data sharing problem: data fabrics. According to Gartner, within the next three years, artificial intelligence solutions developed within data fabrics will reduce the operational costs of data management by more than 65%! But there is still confusion about this architecture, which could revolutionize the way data is connected and used within organizations.
In this article, we propose 6 TRUE or FALSE questions to better understand the data fabric design and the huge potential it represents.
A data fabric is an architecture composed of multiple technology bricks that aims to connect data sources and users as efficiently as possible. Solutions delivering microservices combine to progressively form this architecture. For example, knowledge graph technologies, which connect data across the organization in an explicit way, would be fed by AI-powered metadata management solutions that automatically collect and update the enterprise's metadata. It should be noted that these solutions can be deployed on-premises, in the cloud, or in hybrid mode.
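To make the knowledge-graph idea concrete, here is a minimal sketch of a graph as a set of (subject, predicate, object) triples, fed by simulated metadata collection. All dataset and predicate names are illustrative, not a real product API.

```python
# A knowledge graph reduced to its simplest form: a set of triples.
graph = set()

def add_triple(subject, predicate, obj):
    """Register one metadata fact in the graph."""
    graph.add((subject, predicate, obj))

# Metadata that an automated collection step might discover (simulated).
add_triple("crm.customers", "stored_in", "CRM")
add_triple("crm.customers", "has_column", "customer_id")
add_triple("erp.orders", "stored_in", "ERP")
add_triple("erp.orders", "has_column", "customer_id")

def related_datasets(column):
    """List every dataset exposing a given column - a tiny example of how
    a graph connects data that lives in separate silos."""
    return sorted(s for (s, p, o) in graph if p == "has_column" and o == column)

print(related_datasets("customer_id"))  # ['crm.customers', 'erp.orders']
```

Even this toy version shows the value of the approach: once metadata from the CRM and the ERP lands in one graph, a single query reveals that both systems share a join key.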
The solutions that make up a data fabric (integration, data catalogs, metadata management, data preparation, orchestration, etc.) are adopted incrementally, depending on the company's use cases. It is therefore entirely possible to start with one technology and add others as they become necessary. This is the principle of "composability", which facilitates the adoption of tools and accelerates their operationalization, with directly measurable business results. This flexibility is an integral part of the data fabric design.
Data warehouses and data lakes are data management systems used primarily for data analysis. Data hubs, for their part, are solutions that aim to break down data silos as much as possible (to better understand the distinctions between these architectures, read this article). Data fabrics, on the other hand, are much broader. They facilitate data sharing, can handle any type of data, whether transactional or operational, allow the integration of application data from customers, suppliers and business partners, and so on. In fact, data warehouses, data lakes and data hubs are components of the data fabric design, combining to form an efficient network of data that the enterprise can exploit in real time.
One of the key points of data fabrics is that they consist of tools that automate time-consuming and resource-intensive manual data management tasks. Automation solutions based on artificial intelligence, and in particular machine learning and natural language processing (NLP), are therefore essential in data fabrics. Several of them can be combined, such as augmented data catalogs, active metadata management tools, automatic data preparation platforms, etc. The end result of this automation is a strong acceleration of the operationalization of all enterprise projects. Since many of the integration, harmonization and data connection tasks will be completed very quickly, users will be able to focus on the added value of the projects: the understanding of business issues and the resulting decision making.
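As an illustration of the kind of manual task such platforms automate, here is a simplified stand-in for automated data preparation: normalizing free-text values so that different spellings of the same entity converge. This is a hand-written sketch, not the method of any particular product.

```python
import re
import unicodedata

def normalize(value: str) -> str:
    """Normalize a free-text field: strip accents, collapse whitespace,
    lowercase. A simplified stand-in for automated data preparation."""
    value = unicodedata.normalize("NFKD", value)               # decompose accents
    value = "".join(c for c in value if not unicodedata.combining(c))
    value = re.sub(r"\s+", " ", value).strip()                 # collapse spaces
    return value.lower()

# Three spellings of the same company collapse to a single normalized form.
raw = ["  Société Générale ", "societe generale", "SOCIETE  GENERALE"]
print({normalize(v) for v in raw})  # {'societe generale'}
```

Done by hand across millions of rows, this kind of harmonization consumes the integration time the paragraph above describes; automating it is what frees users to focus on business questions.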
On the contrary, the data fabric design aims to empower everyone who works with the company's data. This is made possible by controlled, granular data governance, i.e. where access to a file, or even a single cell in a file, is controlled for each user (thanks to data masking techniques, for example). Moreover, implementing a data fabric requires working according to DataOps principles. This is a collaborative approach (inspired by DevOps, which has proved its worth in software development) that enables a permanent dialogue between the IT teams that set up data infrastructures (data engineers, data stewards, data architects), the operational teams that use them (business analysts, product managers, etc.) and the customers or other stakeholders to whom the results are delivered. Thus, to select the right technologies for the data fabric, it is essential to understand the business use cases it will serve, and therefore to involve the business experts who will use it on a daily basis. For the latter, no-code tools, which require no programming skills, are particularly well suited.
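The granular, role-based governance mentioned above can be sketched in a few lines. The roles and masking rules below are hypothetical, chosen only to show column-level masking applied per user.

```python
# Illustrative role-based masking rules: which columns each role may NOT see.
MASKING_RULES = {
    "analyst": {"email", "salary"},   # analysts see neither email nor salary
    "hr":      {"email"},             # HR sees salary but not email
    "admin":   set(),                 # admins see everything
}

def mask_row(row: dict, role: str) -> dict:
    """Return a copy of the row with restricted columns masked.
    Unknown roles are denied everything by default."""
    hidden = MASKING_RULES.get(role, set(row))
    return {k: ("***" if k in hidden else v) for k, v in row.items()}

row = {"name": "Alice", "email": "alice@acme.test", "salary": 52000}
print(mask_row(row, "analyst"))
# {'name': 'Alice', 'email': '***', 'salary': '***'}
```

The deny-by-default fallback for unknown roles reflects the governance principle at stake: access is granted explicitly, never assumed.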
While it is true that adopting the data fabric design requires a cultural change, it is also designed to build on the tools already present within companies. Often, the main problem companies face is that they collect and exploit their data in silos, using only traditional ETL (Extract - Transform - Load) methods. Data fabrics counter this approach, aiming instead to connect all data within the organization and deliver it efficiently to the right people. It is therefore advisable to start by working with the CDO (Chief Data Officer) or the CIO (Chief Information Officer) to fully understand the concept of the data fabric design, then to identify the technological building blocks necessary to achieve it, and finally to put in place DataOps practices of collaboration between business and IT teams to operationalize them (see our article on the subject here).
In the data fabric design, the data preparation layer is essential. YZR brings a solution designed for data fabrics with its no-code platform for normalizing and labeling textual data.
To learn more about the normalization process, see our article here.
To get a demonstration of our product, please contact us at website or at email@example.com
- Gartner; Sharat Menon, Ehtisham Zaidi, Mark Beyer; Emerging Technologies: Data Fabric Is the Future of Data Management; December 4, 2020.
- Gartner; Mark Beyer, Ehtisham Zaidi, Donald Feinberg, Henry Cook, Jacob Orup Lund, Rita Sallam, Robert Thanaraj; Top Trends in Data and Analytics for 2021: Data Fabric Is the Foundation; February 16, 2021.
- Gartner; Ehtisham Zaidi; Data and Analytics Essentials: Data Fabric; July 13, 2021.