Hadrien Diesbecq, 16 March 2022

Improve sales forecasting with better data granularity

Artificial intelligence models have never been more powerful at producing reliable sales forecasts. Yet companies are struggling to fully exploit them. One solution is to address the problem at the source by improving data granularity.

Sales forecasting is a topic of paramount importance for companies. Whether it is to make the right investments or to manage stocks efficiently, it has become a real component of any organization’s strategy. In the distribution sector, its reliability is a determining factor in the negotiation of supply contracts with suppliers, which must be anticipated over several months, while customer orders remain daily.

Nevertheless, this is a particularly complex exercise since it requires accurate modeling of customer behavior and expectations. The most common approach to solving this problem is to feed statistical models with a very large volume of historical sales data. Although companies have invested heavily in the development of machine learning algorithms capable of processing this data, they are now facing a major problem that limits their efficiency: poor data quality.

1) A crucial but difficult problem to solve

For a consumer goods company, reliable sales forecasting is a real competitive advantage today. In addition to being a sign of good control of the entire supply chain, it has a strong impact on sales performance on several levels. It enables:

Optimal inventory management, which we know can have a considerable financial impact.

On the one hand, a product that is out of stock leads to a direct loss of turnover because it is not available for sale at the desired time. On the other hand, it induces additional marketing costs since customers wishing to buy the product following an advertisement will not be able to do so and may choose a substitute or turn to products sold by competitors.
Excess inventory means additional storage costs and poor management of purchasing expenses. In addition, the surplus can be taken over by resellers who may undercut the price and thus compete with the company itself.

Efficient management of the supply chain, which is made up of multiple tools and processes for planning production, calculating supply requirements and forecasting the transport of goods. Here again, good sales forecasting allows costs to be allocated well and, above all, strengthens negotiating power on contracts with suppliers. An early and precise evaluation of demand makes it possible to set fair and suitable prices.

Better assessment of growth opportunities. Before embarking on developing a new product line or targeting new customers, companies need to assess market dynamics, consumer expectations, reasons for success abroad or with a competitor, etc. To determine these parameters, it is necessary to have a very accurate view of current business activity: what types of customers are buying what brand? Which promotional campaigns are and will be the most effective? etc. Sales forecasting can therefore be a real pivot for the development of a company.

In practice, reliable sales forecasting remains difficult to achieve. It is a real mathematical problem that requires the development of sophisticated statistical models. Moreover, by nature, a sales forecast is not an exact prediction of the future: there will always be uncertainty that needs to be managed.

2) A glaring paradox: sophisticated but largely under-exploited models

Yet there is a paradox. Machine learning algorithms are well suited to this type of optimization problem. Based on a large number of variables (location of sales sites, customer profiles, periods of the year, promotional actions, etc.), the goal is to minimize the gap between the anticipated reality (what we think the customer will buy) and the true reality (what the customer actually buys). Using historical sales data as training data, the models can then estimate how much of a new product will sell. Connected to business intelligence and reporting tools, they make it possible to generate steering indicators (forecast deviations, turnover projections, inventory evaluation) that help to anticipate unexpected events (demand peaks, delivery delays, etc.) and to ensure good governance within the organization.
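As a minimal sketch of what "the gap between the anticipated reality and the true reality" can look like in practice, the Python snippet below computes two common error measures on a few weeks of sales. The article does not name a specific metric: the mean absolute error (MAE) and the weighted absolute percentage error (WAPE) are assumptions chosen for illustration, and the sales figures are invented.

```python
from statistics import mean


def mean_absolute_error(actual, forecast):
    """Average absolute gap, in units, between actual and forecast sales."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def weighted_ape(actual, forecast):
    """Total absolute error divided by total actual sales (WAPE)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("total actual sales must be non-zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total_actual


# Hypothetical weekly unit sales for one product (illustrative numbers only).
actual_units = [120, 135, 90, 160]
forecast_units = [110, 140, 100, 150]

mae = mean_absolute_error(actual_units, forecast_units)
wape = weighted_ape(actual_units, forecast_units)
print(f"MAE: {mae:.2f} units, WAPE: {wape:.1%}")
```

A forecasting model is typically trained to drive such a measure down on historical data; the same measure can then feed the "forecast deviation" indicators mentioned above.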

Nevertheless, according to the American consulting firm Gartner, 90% of B2B sales companies rely on intuitive, highly unreliable criteria instead of advanced analysis models to produce their sales forecasts.

Why the disconnect? The problem lies in this training data, which is often unreliable and at too low a level of granularity.

Thus, a central point that companies are still struggling to solve today concerns the quality of this product data (see also our article on the subject here). The reason is simple: this data is extremely heterogeneous. Why is that? Because in reality, many human operators are involved throughout the value chain. Not always well equipped or trained in the use of IT tools, and not communicating with each other, they end up entering data manually into spreadsheets according to their own conventions. In concrete terms, how does this work?

– Distributors (medium and large stores) buy from central purchasing offices that group together products from different manufacturers.

– Customers selected by panelists (Ifop, Kantar, Novatest, etc.) buy products in supermarkets and declare their purchases in databases.

– Manufacturers buy the consolidated data from panelists, or in the case of e-commerce, directly from marketplaces (Amazon, CDiscount, La Redoute, etc.).


Sales data therefore goes through many different intermediaries. In the end, the company wishing to perform analyses on its sales finds itself having to handle spreadsheets, often in Excel, which are of poor quality and difficult to process.

– The way they are built can vary greatly depending on the supplier. Products can be classified in different categories, with attributes such as brand, price, packaging presented in a different order.

– They contain textual errors: typos, spelling mistakes, translation errors. For example, “makeup remover” will be written “mkup remover”, “rmover” or “demaquillant”.

– The product taxonomy used is often far from uniform (a Mars or Snickers bar may be designated as a “chocolate bar”, “ice cream bar” or “snack”).
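The kinds of textual variation listed above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a description of any particular product: the canonical vocabulary, the alias table for translations and the similarity cutoff are all hypothetical, and the fuzzy comparison relies on `difflib` from the Python standard library.

```python
from difflib import get_close_matches

# Hypothetical reference vocabulary agreed on by the business teams.
CANONICAL_LABELS = ["makeup remover", "lipstick", "chocolate bar"]

# Hypothetical alias table: translations cannot be recovered by
# character-level similarity, so they are mapped explicitly.
KNOWN_ALIASES = {"demaquillant": "makeup remover"}


def normalize_label(raw, cutoff=0.5):
    """Map a raw product label to a canonical one, or None if no match.

    The cutoff is an illustrative threshold: lower values accept more
    typos but also increase the risk of wrong matches.
    """
    cleaned = " ".join(raw.lower().split())  # lowercase, collapse spaces
    if cleaned in CANONICAL_LABELS:
        return cleaned
    if cleaned in KNOWN_ALIASES:
        return KNOWN_ALIASES[cleaned]
    matches = get_close_matches(cleaned, CANONICAL_LABELS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw_label in ["Mkup  Remover", "rmover", "Demaquillant", "xyz 123"]:
    print(repr(raw_label), "->", normalize_label(raw_label))
```

Labels that match nothing are returned as `None` rather than forced into a category, so that a person who knows the business context can review them.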

These confusions and this heterogeneity of the product data degrade the quality of the sales forecasting models, and the manufacturers end up with results that only concern vast categories of products that are not always relevant. How then can they properly size the marketing campaign for a new orange matte lipstick if the sales data only concern the “lipstick” category without distinguishing between the different ranges or colors?


3) One solution: process data at a very granular level

For companies, the challenge is to improve the entire preparation phase of the data that will be used to feed the sales forecasting models. This requires better data granularity, i.e. having very precise information on the purchase of each product according to its characteristics (color, packaging, label, etc.), its price, the place where it is sold, the period, the type of clientele and other demographic, geographic or economic factors.
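To make the idea of granularity concrete, here is a small Python sketch that aggregates the same sales records at two levels. The records, field names and figures are invented for illustration; the point is only that a category-level total hides the variant-level signal a forecasting model would need.

```python
from collections import defaultdict

# Hypothetical sales records (illustrative values only).
sales = [
    {"category": "lipstick", "color": "orange", "finish": "matte", "store": "Paris", "units": 40},
    {"category": "lipstick", "color": "red", "finish": "matte", "store": "Paris", "units": 310},
    {"category": "lipstick", "color": "orange", "finish": "gloss", "store": "Lyon", "units": 25},
    {"category": "lipstick", "color": "orange", "finish": "matte", "store": "Lyon", "units": 15},
]


def total_units(records, keys):
    """Sum units sold, grouped by the given record fields."""
    totals = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group] += record["units"]
    return dict(totals)


# Coarse view: a single number for the whole category.
by_category = total_units(sales, ["category"])

# Granular view: one number per color and finish.
by_variant = total_units(sales, ["category", "color", "finish"])

print(by_category)
print(by_variant)
```

In this toy data set the category total is 390 units, of which only 55 are orange matte lipsticks: sizing a campaign for that variant from the category figure alone would be badly off.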

With such data, the results delivered by these models become more accurate. They are able to take into account the impact on sales, product by product, of both global events (national promotions, stock-outs, etc.) and local events (specific promotional campaigns, delivery delays at a distributor, etc.). The result is the ability to adjust supply in relation to demand in real time, better visibility on commercial results, and above all, much greater confidence on the part of all stakeholders (suppliers, investors, customers, etc.).

Furthermore, it would be a mistake to believe that barcodes (also called EAN codes) solve the problem by uniquely identifying each product. This is not the case, for the simple reason that there is no unique reference that assigns a permanent EAN code to a given product. As a result, certain codes are reused and changed regularly, leading to a great deal of confusion (for a better understanding, see our forthcoming article on the subject).

To achieve this granularity, Gartner instead recommends that companies invest in data intelligence solutions that allow them to manage data from various sales sources and along very specific product lines that can be customized by the user. The goal is to avoid, as far as possible, educated guessing by humans, which consists in trying to intuit predictive results without taking into account the full complexity of consumer behaviors and their sometimes counter-intuitive or even irrational nature.


What we offer at YZR is an answer to this problem by allowing business operators to automatically prepare their data at a very fine level of granularity.

Indeed, we develop an automated data preparation tool. Delivered in the form of an interface and an API, our tool is designed to improve the quality of product data. With its standardization, labeling and fuzzy matching features, our solution will provide you with very granular data to feed your sales forecasting models.

YZR is a no-code artificial intelligence platform 100% dedicated to textual data normalization, a crucial phase in data preparation. In the form of a plug-and-play tool, it is aimed at operational business people (product managers, buyers, etc.) and all those who perfectly understand the business context in which the data is used. We are convinced that their skills are much better used exploiting the data than spent preparing it manually.

Our SaaS tool is specially designed to solve your problems related to:

– The multiplicity of your data sources

– The absence of naming conventions

– Manual data correction

– Data governance and sharing

It also integrates perfectly with your various tools (Product Information Management, Master Data Management, Data Science – Machine Learning, Business Intelligence), to enable you to achieve, among other things:

– Better customer knowledge

– Optimized sales forecasting

– Accelerated digitization of your offer

In other words, with YZR, you exploit the full potential of your data.

Want to know more? Would you like a demonstration of our product? Do not hesitate to contact us directly on our website or at

To go further

If you would like to learn more about this topic and understand why data quality is a major growth driver for companies, feel free to download our white paper available here!


– Gartner; Adnan Zijadic, Alastair Woolcock, Tad Travis; Improve Revenue Forecast Accuracy With Emerging Forms of Sales Forecasting Technology; 8 April 2020.
– [in French] Jean-Michel Huet, Julien Dutreuil; La prévision des ventes : un art délicat; in L’Expansion Management Review 2010/3 (N° 138); pages 46 to 53.
