The Client

A leading home improvement retailer in the USA.

The Challenge

Our client had a complex classification system, with 6,000 product types and 4 levels of hierarchy in their taxonomy. The process of categorizing SKUs was largely manual. This was a problem not just for them but for their suppliers as well, who had to perform this exercise for every single product they onboarded. It was tedious and confusing, and suppliers would often end up picking the wrong product type or just throwing everything into a “miscellaneous” bucket. For a while, our client managed by manually correcting the misclassified product data. Then their catalog grew to 2 million SKUs, and what was difficult became impossible.

They turned to dataX for an automated classification and onboarding system.

The Solution

We built an auto-classifier to correct and reclassify all their existing product data into the specified taxonomy. It was up and running within 8 to 12 weeks. What about new data? And data from suppliers? We created a pipeline through which all new data, including supplier data, would be processed and classified. Any supplier wishing to onboard their data would simply use this pipeline.

We didn’t stop there. We performed a deep enrichment of the catalog data by extracting attributes from PDF data sheets, and we provided a dashboard where our client’s content team could edit, validate and upload enriched data into their PIM.

“Good data is not about just well-classified data. Good data is all about plugging in the right attributes at the right places. About enriching the catalog in every way. dataX does just that.”


So the first part of our solution was to streamline the backend taxonomy and set it up for any kind of scaling and future processing.

The Solution Plus

The future was not far off! Seeing how well organized the backend taxonomy had become, our client asked us to map it to the customer-facing displays as well. This meant that an entirely new display taxonomy would have to be defined and created. Why entirely new? Because the way products are displayed to the customer would be slightly different from the way they are categorized in the database. For example, a customer looking for recyclable products may key in search words like “green” or “environment friendly”, and our system needs to be smart enough to figure out that they are not talking about colors or plants. dataX was more than up to the task of creating this one-to-many mapping.
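As a rough illustration only, and not a description of the actual dataX system, a one-to-many mapping of this kind could be sketched in Python as below. Every category name, search phrase and the resolve_query function are hypothetical, chosen just to show how “green” might resolve to eco-friendly products unless the query clearly refers to a color.

```python
# Hypothetical display taxonomy: each display category fans out to
# several backend product types (the one-to-many mapping).
DISPLAY_TO_BACKEND = {
    "Eco-Friendly Products": ["Recycled Paper Towels", "Compost Bins", "LED Bulbs"],
    "Green Paint": ["Interior Paint", "Exterior Paint"],
}

# Hypothetical search phrases, each pointing at one display category.
PHRASE_TO_DISPLAY = {
    "green": "Eco-Friendly Products",
    "environment friendly": "Eco-Friendly Products",
    "recyclable": "Eco-Friendly Products",
    "green paint": "Green Paint",
}


def resolve_query(query):
    """Return (display category, backend product types), or None if no phrase matches."""
    # Lowercase and collapse whitespace, then pad with spaces so that
    # phrases only match on whole-word boundaries.
    normalized = " " + " ".join(query.lower().split()) + " "
    # Try longer phrases first so "green paint" wins over plain "green".
    for phrase in sorted(PHRASE_TO_DISPLAY, key=len, reverse=True):
        if " " + phrase + " " in normalized:
            display = PHRASE_TO_DISPLAY[phrase]
            return display, DISPLAY_TO_BACKEND[display]
    return None


if __name__ == "__main__":
    for q in ["green", "green paint for walls", "evergreen shrubs"]:
        print(q, "->", resolve_query(q))
```

A production system would rely on real search logs and far richer matching than fixed phrases; the sketch only shows the shape of the lookup, from search phrase to display category to backend product types.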

“While backend taxonomies are all about business rules, display taxonomies are guided by customer behavior – a nuance that dataX understands well.”


Based on customer search terms and Google AdWords, and taking into account variations within product categories, we created a finely tuned display taxonomy. Using this, we optimized product display pages, including left-hand navigation panels, auto-generated product titles, descriptions and breadcrumbs, all of which leveraged robust backend product data to provide an enhanced e-commerce shopping experience to the end customer.

The Result

The impact was significant. Our client became the direct and only competitor to the world leader in e-commerce. The quality of data (particularly the well-populated attribute values) directly impacted sales, and great sales led to better market value, as reflected in a manifold jump in the stock price. We are not exaggerating when we say that such success can come from something as innocuous as a customer searching for a red striped shirt, and actually getting that!