Price Intelligence GmbH (priceintelligence), based in Stuttgart, provides market monitoring software focused on optimising e-commerce data for retailers and manufacturers. The solution collects and analyses international data from marketplaces and online shops. Its features include price and competition monitoring, dynamic pricing and product range optimisation.
A large amount of data is generated in modern e-commerce systems, especially when department stores stock their large warehouses and products from supplier lists have to be integrated into the in-house product information system. On the way from the supplier list to the product information management (PIM) system, data is processed semi-automatically. As part of a potential analysis, SDSC-BW investigated how the categorisation of products into hierarchical taxonomies within PIM systems could be fully automated, based on product titles and descriptions. priceintelligence provided a complete PIM structure and a large number of supplier lists as the data basis.
Multi-label product classification with current natural language processing (NLP) algorithms requires data sets that contain a significant amount of information. A balanced distribution across the available categories is also important so that the prediction models can be trained efficiently. The challenge for priceintelligence was to develop a fully automated product classification process that also works with less balanced distributions. In parallel with the priceintelligence development team, the SDSC-BW experts pursued various strategies for preparing and processing data and models in order to increase the rate of correct classifications.
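One common way to cope with unevenly populated categories is to weight each class inversely to its frequency during training. The following is only a minimal sketch of that general idea, not the strategy actually used in the project; the function name `balanced_class_weights` and the example category names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so that errors on them count
    more during training. Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Invented, deliberately unbalanced category labels.
labels = ["shoes"] * 6 + ["lamps"] * 3 + ["tents"] * 1
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls}: {weight:.3f}")
```

Weights of this kind can typically be passed to a classifier or a loss function so that under-represented categories are not ignored in favour of the dominant ones.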
As part of the free-of-charge potential analysis, the SDSC-BW experts analysed the PIM system and the supplier data in detail. The analysis focused on identifying textual information with significant information content, together with its frequency of occurrence and its weighting. This enabled the team to draw conclusions about the achievable modelling accuracy and to select a suitable model.
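The source does not state how the information content of the text was measured. As an illustration only, TF-IDF is a widely used way to weight terms by how often they occur in one text relative to how many texts contain them. The sketch below assumes whitespace tokenisation and uses invented product titles; the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF scores per document (illustrative sketch).

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


# Invented product titles, not taken from the project data.
titles = ["led desk lamp", "led floor lamp", "trekking tent"]
scores = tfidf(titles)

# Terms unique to one title score higher than terms shared across titles.
print(scores[0])
```

In this toy example, "desk" distinguishes the first title from the others and therefore receives a higher weight than "led" or "lamp", which appear in two titles.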
Based on the algorithms examined in the potential analysis, the team of experts prepared an ideal data basis with the maximum possible information content and trained three models for use in so-called flat product classification. Thanks to the close collaboration between the experts and the development team, and the pooling of expertise that came with it, priceintelligence was able to implement a successful and robust process for automatic product classification in a timely manner.
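In flat classification, the hierarchy of the taxonomy is ignored at prediction time and every leaf category is treated as one independent class. The sketch below shows how a nested taxonomy could be flattened into such a label set. The taxonomy, the separator " > " and the function name `flatten_taxonomy` are assumptions made for illustration and do not come from the priceintelligence PIM structure.

```python
def flatten_taxonomy(tree, prefix=()):
    """Return every leaf category as a full path string.

    `tree` is a nested dict in which an empty dict marks a leaf.
    Hypothetical helper for illustration only.
    """
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Inner node: descend into the sub-categories.
            paths.extend(flatten_taxonomy(children, path))
        else:
            # Leaf node: one flat class per complete path.
            paths.append(" > ".join(path))
    return paths


# Invented example taxonomy.
taxonomy = {
    "Home": {"Lighting": {"Desk lamps": {}, "Floor lamps": {}}},
    "Outdoor": {"Tents": {}},
}

flat_labels = flatten_taxonomy(taxonomy)
label_to_id = {label: index for index, label in enumerate(flat_labels)}

for label, index in label_to_id.items():
    print(index, label)
```

A classifier trained on these integer ids predicts a leaf category directly, and the full path through the taxonomy can be recovered from the label string.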