Julien and Anna eat cereal for breakfast every morning. One of them always buys chocolate cereal, while the other likes to vary and mix things up. Likewise, one always takes the time to look for the cheapest product, while the other looks for quality brands first. How can a retailer satisfy both Julien and Anna? This is the problem of assortment management, and it quickly turns out to be a real headache. According to the French Federation of Commerce and Distribution, a hypermarket in France can stock up to 40,000 different products, about a third of which are renewed each year. Yet getting the assortment right is essential for retailers: it increases volumes, improves stock management and, above all, ensures that the right product is in the right place and presented to the right people.
This problem depends on many socio-economic factors, such as the average income of customers, the geographical location or the size of the store. Statistical models are therefore used to determine the main factors that lead a customer to buy a certain type of product. The trouble is that current methods rely on very poorly described product data. Reference catalogs, usually Excel tables, contain hundreds of lines, often full of spelling mistakes and abbreviations: "Orenge" instead of "Orange", "frts" instead of "fruits". Such examples are numerous because the descriptions are typed in manually by human operators. These textual errors degrade the predictions and prevent the statistical models from working properly.
As we can see, the problem stems from manual data entry. One solution is to adopt a tool that corrects these errors automatically, which in turn improves assortment management.
An assortment is a collection of products; more precisely, it refers to the number of different items offered in a product category.
Since the space available in the store is finite, the right combinations of items should be placed in the right place to:
- Stimulate sales
- Encourage customer loyalty
- Ensure efficient stock turnover
- Define the brand's positioning and develop its image (in terms of price: entry-level or high-end; and in terms of choice: diversification or focus).
For a retailer, it is therefore a question of finding a compromise in the diversity of choices offered. A very varied assortment broadens the target customer base by allowing each consumer to find what they are looking for, but there is a risk that customers will get lost among too many products. Conversely, reducing the size of the assortment makes the shelf easier to read and speeds up the purchase decision, since the customer has fewer products to compare. Some supermarkets prefer to reduce the number of products but increase the number of categories (putting only one brand of ham on the shelves but offering diced ham and cold cuts next to it). Understanding what influences how customers perceive the assortments presented to them is therefore a real strategic issue for companies.
The main problem retailers face is that customer behavior is very heterogeneous. Even for the same individual, choices vary greatly depending on the purchase context and sociological parameters. A person may buy cheap cookies because they only eat them occasionally, and yet choose a high-end fruit juice for breakfast. Furthermore, while that person may want the same type of juice every day, they may need to buy a wide variety of juices for a party with friends. Finally, another problem linked to assortment management concerns substitutes, i.e. products that can replace others when an item is out of stock. Will the customer turn to another brand, another flavor, or a different drink altogether? This question directly influences the variety of products and the way they are combined within different assortments, always with the aim of meeting consumers' expectations as closely as possible.
In order to define the right assortments, retailers work by product categories: fruit and vegetables, fish, cosmetics, cleaning products, etc.
The challenge is to determine the buying behavior of consumers according to the layout of products within each category (number of references available, price, product associations, etc.).
To understand what triggers a purchase or, on the contrary, what prevents it, retailers mainly rely on statistical methods. In concrete terms, this means analyzing the composition of the baskets of a very large number of customers in very different contexts, which makes it possible to reduce cultural, geographic and even socio-economic biases. If it turns out that 90% of the customers in a store behave like Anna, it is in the retailer's best interest to favor a very wide range of products and to highlight brands with quality labels.
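As a rough illustration of what analyzing basket composition can mean, the sketch below counts how often pairs of products appear together. The baskets and the `pair_support` helper are made up for this example; real basket analysis relies on far larger datasets and richer models.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return the share of baskets that contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so that each pair has a single canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["chocolate cereal", "milk"],
    ["chocolate cereal", "milk", "orange juice"],
    ["muesli", "orange juice"],
    ["chocolate cereal", "orange juice"],
]

support = pair_support(baskets)
for pair, share in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, share)
```

Pairs with a high share are candidates for being placed or promoted together.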
However, this method has two limitations:
- On the one hand, the product descriptions available to analysts are generally of very poor quality (to understand why, read this article). They are often littered with spelling mistakes, typographical errors and even untranslated elements. It is not uncommon to find in a product catalog a line such as "Frts pure juice 1l orrange" next to "orange juice pure juice 1,0 liter". These textual problems make it very difficult to identify a product uniquely. The statistical model cannot recognize that the two descriptions represent the same product, so their frequency of appearance is split in the basket analysis. This heterogeneity also makes it impossible to identify precisely which part of the description corresponds to the brand, the flavor, the volume, etc. It is therefore extremely difficult to detect customer behavior like that of Julien or Anna.
- On the other hand, and more fundamentally, basket analysis does not work for new products, which by definition have no sales history. Yet these items are not rare, since the renewal rate in French stores is estimated at about one third each year. The solution here is to perform a semantic similarity analysis, i.e. to identify products with similar characteristics that have a longer sales history. For example, a new organic shampoo will be compared with other shampoos in the same price range, but also with other organic products such as cleansing creams. Here too, the approach is very difficult to implement when product descriptions are of poor quality.
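To make the similarity idea concrete, here is a minimal sketch that ranks existing products by how many attributes they share with a new one, using Jaccard similarity on attribute sets. The catalog, the attribute names and the choice of Jaccard similarity are assumptions made for this example, not a description of any particular production model.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of attributes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical catalog of products that already have a sales history.
catalog = {
    "classic shampoo 250ml": {"shampoo", "hair", "mid-price"},
    "organic cleansing cream 250ml": {"cleansing cream", "organic", "mid-price"},
    "organic orange juice 1l": {"juice", "organic", "orange"},
}

# A new organic shampoo with no sales history yet.
new_product = {"shampoo", "hair", "organic", "mid-price"}

# Rank existing products from most to least similar to the new one.
ranked = sorted(
    catalog,
    key=lambda name: jaccard(catalog[name], new_product),
    reverse=True,
)
print(ranked)
```

The sales history of the top-ranked products can then serve as a proxy for the new item. Note that this only works if attributes such as "organic" have been extracted reliably from the descriptions.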
What about product substitution? Substitution means looking for products that can be exchanged without any significant impact on sales, and the approach is similar. The difference is that we no longer reason in terms of product categories but in terms of need groups. A need group is a set of products that meet the same need and are therefore substitutable. For example, the items "banana" and "strawberry" belong to the same category, "fruit", but are not necessarily substitutable, unlike "butter" and "margarine", which belong to the same need group.
Product substitution is useful in several cases:
- If the main product is out of stock.
- If the substitute is more profitable or is on promotion.
- If the substitute, combined with another product, increases the probability that both will be purchased (e.g. shampoo and conditioner).
In all cases, it is a matter of finding the product or products with the highest probability of being purchased instead of the original item. This of course depends on many store-specific criteria, which justifies the statistical approach explained above. We then run into the same problems, in particular the poor quality of product descriptions. To choose the right substitute, it is essential to have clearly identified the attributes of each product. But since the sales history data is filled in manually by various human operators, it is very heterogeneous, which introduces a lot of noise into the textual similarity search.
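The selection logic can be sketched as follows: among the in-stock products of the same need group, pick the one with the highest estimated purchase probability. The need groups, the probabilities and the `best_substitute` function are hypothetical; in practice the probabilities would come from a statistical model fitted on sales data.

```python
def best_substitute(product, need_groups, purchase_prob, in_stock):
    """Return the in-stock product from the same need group with the highest
    estimated purchase probability, or None if there is no candidate."""
    group = need_groups.get(product)
    if group is None:
        return None
    candidates = [
        other
        for other, other_group in need_groups.items()
        if other_group == group and other != product and other in in_stock
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: purchase_prob.get(p, 0.0))


# Hypothetical need groups: butter and margarine meet the same need,
# whereas banana and strawberry share a category but not a need group.
need_groups = {
    "butter": "spread",
    "margarine": "spread",
    "olive oil spread": "spread",
    "banana": "fresh banana",
    "strawberry": "fresh strawberry",
}
# Hypothetical purchase probabilities when butter is unavailable.
purchase_prob = {"margarine": 0.6, "olive oil spread": 0.3}
in_stock = {"margarine", "olive oil spread", "banana", "strawberry"}

print(best_substitute("butter", need_groups, purchase_prob, in_stock))
print(best_substitute("banana", need_groups, purchase_prob, in_stock))
```

Such a ranking is only as reliable as the product attributes used to build the need groups, which is why noisy descriptions are a problem.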
To solve this problem, we at YZR offer a platform that aims to improve the quality of the text data contained in databases. Using a machine learning algorithm, our solution can standardize and label data according to standards and labels chosen by the user. Our tool thus integrates three features:
- Standardization: this involves replacing several groups of different words (variants) in the descriptions with a single standard form. For example, "Frts pure juice 1l orrange" and "orange juice pure juice 1,0 liter" become "Fruits Pure Juice 1L Orange" and "Orange Juice Pure Juice 1L" respectively.
- Labeling: this involves identifying attributes for each product that correspond to its characteristics. In the example above, the attributes would be "Type" (for juice), "Flavor" (for orange), "Volume" (for 1L) and "Concentration" (for pure juice).
- Fuzzy matching: the final step of the process, this consists of matching several different descriptions that represent the same product. Here, the algorithm is able to understand that "Fruits Pure Juice 1L Orange" and "Orange Juice Pure Juice 1L" both refer to a one-liter bottle of "pure juice" orange juice.
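To give an intuition of these three steps, here is a deliberately simplified sketch. It is not YZR's algorithm: it replaces the machine learning model with hand-written, hypothetical lookup tables (`VARIANTS`, `LABELS`), a regular expression for volumes and a token-overlap threshold chosen arbitrarily for the example.

```python
import re

# Hypothetical variant dictionary; a real one would be learned or curated.
VARIANTS = {"frts": "fruits", "orrange": "orange", "orenge": "orange"}
# Hypothetical token-to-attribute lookup used for labeling.
LABELS = {"juice": "Type", "orange": "Flavor", "1l": "Volume", "pure": "Concentration"}
# Rewrites volumes such as "1l", "1,0 liter" or "1.0 litre" as "1l".
VOLUME = re.compile(r"(\d+)(?:[.,]0+)?\s*(?:liter|litre|l)\b")


def standardise(description):
    """Lowercase, normalize volumes and replace known variants."""
    text = VOLUME.sub(r"\1l", description.lower())
    return [VARIANTS.get(token, token) for token in text.split()]


def label(tokens):
    """Map each recognized token to its attribute name."""
    return {LABELS[token]: token for token in tokens if token in LABELS}


def same_product(tokens_a, tokens_b, threshold=0.7):
    """Fuzzy match: share of common tokens (Jaccard) above an assumed threshold."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold


raw_a = "Frts pure juice 1l orrange"
raw_b = "orange juice pure juice 1,0 liter"
std_a, std_b = standardise(raw_a), standardise(raw_b)

print(std_a, std_b)
print(label(std_a))
print(same_product(raw_a.lower().split(), raw_b.lower().split()))  # raw text
print(same_product(std_a, std_b))  # standardized text
```

On the raw descriptions the two lines share too few tokens to be matched; once standardized, they overlap enough to be recognized as the same product and yield the same attributes.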
By standardizing product descriptions, our tool makes basket history analysis and semantic similarity searches robust and reliable. When this process was done by hand, our clients could only run their analyses on a very limited range of products. With our tool, they have been able to automate the process and analyze their entire category.
Would you like a demonstration of our tool? Do not hesitate to contact us on our website or directly at firstname.lastname@example.org.
Want to better understand why product data quality is a growth driver? Download our white paper available here.