How clean, reliable, de-duplicated data enabled rebate automation for a leader in the global hospitality industry

Challenge

Handling 75-85 million transactions annually from over 3,500 suppliers dealing in more than 550,000 products, the client orchestrates the end-to-end procurement experience for 19,000 hotels worldwide. The massive scale of their operations puts them in a strong position to negotiate vendor contracts for better pricing and secure 15-40% savings for customers. However, it also presents a huge data challenge: data arrives through multiple systems, each organized differently, resulting in millions of rows of unstructured data that require processing.

Our client engaged the rebate specialist Enable to step up their automation and bring in a streamlined incentive management program.

Enable quickly recognized that clean, reliable data was the cornerstone of automation success, and that the data on hand was unstructured and unsuitable for processing in its current state.

This is where dataX.ai stepped in.

Solution

dataX.ai delivered an automated, scalable, and intelligent solution powered by our pre-trained ML models and LLM techniques, specifically designed for unstructured product data processing. These models handle inputs such as supplier feeds, transactions, and invoices by extracting attributes, validating them, and normalizing inconsistencies.

Specifically, these were the tasks that we undertook:

1. De-duplication and Product Matching with Precision:

We implemented a robust matching and merging pipeline powered by our pre-built normalization and embedding-based models, achieving over 90% accuracy in identifying identical and similar products and eliminating catalog redundancy.
Highlights of the De-duplication & Matching Process:
- Validated & cleaned inconsistent product data to ensure quality and readiness for processing.
- Normalized packaging and sizing variations (e.g., "Box of 12" vs. "Pack of 6", "500ml bottle" vs. "1L bottle") to accurately identify and group SKUs representing the same product sold in different formats or quantities.
- Standardized units of measure (UOMs) to identify products sold in different quantities as the same item.
- Merged similar entries into a single “Parent” product, creating a unified and de-duplicated catalog.
- Cross-referenced products across multiple suppliers to identify and group identical items offered under different supplier names or formats.
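
The normalization and grouping steps above can be illustrated with a minimal sketch. It is not the production pipeline: the case study describes embedding-based matching, whereas this example swaps in a simple token key so that it stays self-contained. The conversion table, the regular expressions, and the function names `parse_description` and `group_by_parent` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical conversion table: unit -> (base unit, multiplier).
UNIT_TO_BASE = {
    "ml": ("ml", 1.0),
    "l": ("ml", 1000.0),
    "g": ("g", 1.0),
    "kg": ("g", 1000.0),
}

# Illustrative patterns only; real supplier feeds need far broader coverage.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(ml|l|kg|g)\b")
PACK_RE = re.compile(r"\b(?:box|pack|case)\s+of\s+(\d+)\b")
FILLER_WORDS = {"bottle", "of"}


def parse_description(description):
    """Split a raw description into (product key, pack count, size, base unit)."""
    text = description.lower()
    pack_count, size, unit = 1, None, None

    pack_match = PACK_RE.search(text)
    if pack_match:
        pack_count = int(pack_match.group(1))
        text = PACK_RE.sub(" ", text)

    size_match = SIZE_RE.search(text)
    if size_match:
        unit, factor = UNIT_TO_BASE[size_match.group(2)]
        size = float(size_match.group(1)) * factor
        text = SIZE_RE.sub(" ", text)

    # Whatever remains, minus filler words, identifies the product itself.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in FILLER_WORDS]
    return " ".join(sorted(tokens)), pack_count, size, unit


def group_by_parent(descriptions):
    """Group raw descriptions under a shared parent key."""
    parents = defaultdict(list)
    for raw in descriptions:
        key, pack_count, size, unit = parse_description(raw)
        parents[key].append(
            {"raw": raw, "pack_count": pack_count, "size": size, "unit": unit}
        )
    return dict(parents)


catalog = group_by_parent([
    "Sparkling Water 500ml Bottle, Box of 12",
    "water sparkling 1L bottle pack of 6",
    "Hand Soap 1 kg",
])
```

Here the two sparkling water entries land under one parent key even though their pack counts and bottle sizes differ, while the soap entry forms its own parent. An exact token key like this misses synonyms and misspellings, which is the gap that similarity-based matching is meant to close.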

At the time of this writing, we have processed over 30 million records, with ongoing efforts to scale further—empowering the client to continuously reduce redundancy, improve data reliability, and drive faster, smarter operations as their catalog evolves.

2. Product Classification at Scale:

With large data volumes, classification is always a challenge. Manual classification of products was simply no match for the real-time flow of invoice data. To solve this, we deployed our pre-trained auto-classification models, which delivered both accuracy and efficiency.

We classified over 150K unique products using a combination of our prebuilt Machine Learning (ML) classifiers and Large Language Model (LLM)-based techniques.
Unstructured and inconsistent product descriptions were mapped to the client’s structured five-level taxonomy.
This automated classification significantly improved product discoverability, reporting accuracy, and sourcing efficiency.
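
As a rough illustration of mapping a free-text description onto a five-level taxonomy path, consider the sketch below. The token-overlap scorer is only a stand-in for the ML classifiers and LLM-based techniques described above, and the taxonomy entries, the `classify` function, and the `min_overlap` threshold are invented for this example rather than taken from the client's data.

```python
import re

# Hypothetical five-level taxonomy paths, invented for illustration.
TAXONOMY = [
    ("Food & Beverage", "Beverages", "Water", "Sparkling Water", "Bottled"),
    ("Food & Beverage", "Beverages", "Coffee", "Ground Coffee", "Bagged"),
    ("Housekeeping", "Guest Amenities", "Bath", "Soap", "Bar Soap"),
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(description, taxonomy=TAXONOMY, min_overlap=2):
    """Return the taxonomy path sharing the most tokens with the description.

    Returns None when no path reaches min_overlap, so that low-confidence
    items can be routed to review instead of being force-fitted.
    """
    tokens = tokenize(description)
    best_path, best_score = None, 0
    for path in taxonomy:
        score = len(tokens & tokenize(" ".join(path)))
        if score > best_score:
            best_path, best_score = path, score
    return best_path if best_score >= min_overlap else None


water_path = classify("sparkling water 500ml bottle")
soap_path = classify("bar soap 40g")
unknown_path = classify("office chair")
```

Returning `None` below the threshold is a deliberate choice in this sketch: an unclassified item is easier to catch and correct than one filed confidently under the wrong category.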

Result

The results were immediately tangible, and most impactful for upstream Enable processes: “Previously complex analyses that took days could now be completed almost instantaneously.”

Such is the impact of clean data that:

- Tasks that took 5 days are being accomplished in minutes.
- Products are accurately classified across 2,000 categories.
- Volumes are being handled with ease, regardless of the complexity of the data.

Conclusion

We didn’t just help our client boost operational efficiency through automation — we also forged a powerful partnership with Enable, proving that when good data meets great strategy, the results can set new industry benchmarks.
