- Hard to sustain a level of classification that drives ongoing business value
- Not using the knowledge within the organization
- Long delays in classification, leading to imprecise metrics for category managers
- Loss of trust in data and classification process
Let’s visit each of these in detail.
Hard to sustain classification that drives ongoing business value
For most large organizations, classifying the billions of dollars of spend they have into a relatively fine-grained taxonomy (3+ levels) is a hard problem. Let’s say you had one million spend line items per year. If you assigned one person this task, and they worked non-stop at the rate of one minute per transaction, it would take them nearly two years to classify the entire spend!
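The estimate above is easy to check. The short sketch below uses only the figures from the paragraph (one million line items, one minute each, working around the clock); the variable names are ours and purely illustrative.

```python
# Back-of-the-envelope check of the estimate above.
line_items = 1_000_000      # spend line items per year (figure from the text)
minutes_per_item = 1        # time to classify one item (figure from the text)

total_minutes = line_items * minutes_per_item
days_nonstop = total_minutes / (60 * 24)   # assumes work continues 24 hours a day
years_nonstop = days_nonstop / 365

print(f"{days_nonstop:.0f} days, or about {years_nonstop:.1f} years")
```

Note that this assumes literally non-stop work. At eight hours a day, the same job would take three times as long.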
The complexity is not just in the volume; the variety of the data is also immense. Part and item descriptions for the same category can vary significantly, words are abbreviated, and supplier names are misspelled.
But this granularity is required to achieve the necessary business actions that generate value. For example, for category managers to understand if they are getting consistent component prices from a supplier, they need to see prices at a granular level over time.
Often organizations solve this problem one time with a brute-force method, applying thousands of rules and lots of human judgment to classify spend data into the taxonomy. The problem is that this is not sustainable: the same effort will not be repeated next quarter, and by then new purchases with different descriptions will have made the data out of date.
In addition, once the initial savings are achieved — typically by picking the low hanging fruit (e.g., supplier rationalization for the largest categories) — the effort required to maintain the data and achieve additional business value is not justified by the returns achieved. Thus organizations are not able to do activities such as:
- Drive cost savings from the long tail of suppliers
- Monitor prices that they get for a category over time
- Ensure consistent payment terms over time and across all transactions and categories for a supplier
At Tamr, we solve this problem by using machine learning in addition to human (expert) input. Experts are asked to classify some transactions, which helps bootstrap the system; they teach Tamr what patterns are associated with different categories. We then do a fuzzy match between these patterns and new transactions to determine the right category. This lowers the classification effort over time and makes the entire process sustainable. Thus companies are able to achieve and sustain a high level of accuracy without a lot of manual effort.
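To make the idea of fuzzy matching against expert-labeled examples concrete, here is a minimal sketch. It is not Tamr's actual algorithm: it uses plain string similarity from Python's standard `difflib` module as a stand-in, and the labeled examples, category names, and the `normalize` and `classify` helpers are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical expert-labeled examples: (description, category).
LABELED = [
    ("blue ballpoint pens box of 12", "Stationery"),
    ("a4 copy paper 500 sheets", "Stationery"),
    ("deep groove ball bearing 6204", "Bearings"),
    ("tapered roller bearing 30205", "Bearings"),
]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())

def classify(description: str) -> tuple[str, float]:
    """Return the category of the most similar labeled example and its score."""
    target = normalize(description)
    best_category, best_score = "", 0.0
    for example, category in LABELED:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, target, normalize(example)).ratio()
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score

# An abbreviated, reordered description still lands in the right category.
category, score = classify("Ball Brg deep groove 6204")
print(category, round(score, 2))
```

The returned score can double as a rough confidence signal: a low score suggests the new transaction does not resemble anything the experts have labeled so far.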
Not using the knowledge within the organization
Most spend classification projects put the entire onus of classification on the vendor and the project sponsor. This works well for several categories, especially on the indirect side. For example, blue pens are clearly part of “stationery.” But what about the direct side? The moment you get to specialized parts, where the knowledge is limited to a few people, you have to match the right expert to the right spend items. Very often the category teams are the ones with the right information. Shouldn’t they be involved in making this decision? Also, without involving category teams, there is no buy-in from them, and feedback is limited. This results in poor usage of the classified data.
Tamr was built on the principle that no one person knows everything about the data; data expertise is spread across the enterprise. Hence we reach out to the appropriate experts with the right questions at the right time, so that the expertise can be used. Further, we track this input: who gave it, when, and any reason they gave for it. This helps us do root cause analysis when mis-categorizations are detected.
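As an illustration of what tracking expert input might look like, the sketch below records who classified an item, when, and why, and then pulls up the history for one line item. The `ExpertInput` schema, the `history_for` helper, and the sample records are all hypothetical and are not Tamr's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ExpertInput:
    """One expert decision about one spend line item (hypothetical schema)."""
    line_item_id: str
    category: str
    expert: str
    given_at: datetime
    reason: str = ""

def history_for(log: list[ExpertInput], line_item_id: str) -> list[ExpertInput]:
    """Return every recorded decision for a line item, oldest first."""
    matches = [entry for entry in log if entry.line_item_id == line_item_id]
    return sorted(matches, key=lambda entry: entry.given_at)

# Made-up sample records, for illustration only.
log = [
    ExpertInput("PO-1001", "Bearings", "j.smith",
                datetime(2015, 3, 2, tzinfo=timezone.utc), "part number prefix"),
    ExpertInput("PO-1001", "Fasteners", "a.lee",
                datetime(2015, 1, 15, tzinfo=timezone.utc)),
    ExpertInput("PO-2002", "Stationery", "a.lee",
                datetime(2015, 2, 1, tzinfo=timezone.utc), "office supply vendor"),
]

# Root cause analysis: who categorized PO-1001, when, and why?
for entry in history_for(log, "PO-1001"):
    print(entry.given_at.date(), entry.expert, entry.category, entry.reason)
```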
Classifications are out-of-date
Here is a simple question for you: When was your classification data last refreshed? If you are like most companies, the answer might be a month to a quarter ago. This might be acceptable for historical analysis, but your commodity managers are making decisions based on the data they have available right now. Any supplier interaction that occurs without everyone having detailed data on spend and suppliers is suboptimal.
Why does this data get so out-of-date? Most companies that offer spend visibility solutions have to apply a lot of manual effort to classify this data, so they cannot afford to do this more than once a quarter without charging significant amounts for it. Tamr reduces the ongoing effort dramatically, and reaches out to the appropriate experts when it cannot make decisions with high confidence. This means that classified data is timely and accurate, and that people will trust and use the data more.
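The idea of involving experts only when confidence is low can be sketched as a simple threshold rule. This is an illustration under stated assumptions, not Tamr's implementation: the `route` function, the 0.8 cutoff, and the sample predictions are all hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; in practice this would be tuned

def route(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split (item, category, confidence) tuples into accepted and expert-review lists."""
    accepted, needs_review = [], []
    for item, category, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item, category))
        else:
            # Keep the confidence so reviewers can start with the least certain items.
            needs_review.append((item, category, confidence))
    return accepted, needs_review

# Made-up model output, for illustration only.
predictions = [
    ("blue pens", "Stationery", 0.97),
    ("brg 6204 zz", "Bearings", 0.62),
    ("misc svc fee", "Consulting", 0.41),
]

accepted, needs_review = route(predictions)
print(len(accepted), "accepted;", len(needs_review), "sent to experts")
```

Because only the uncertain items reach a person, the data can be refreshed far more often than a fully manual process would allow.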
Loss of trust in data and classification process
No solution will give you 100% classified data, 100% of the time. The key is to build in appropriate checks and balances so that most errors are caught immediately, along with a mechanism to correct them. Only by doing this consistently can you maintain trust in the data and in the classification process, and enable the data to be used for ongoing decisions.
Tamr does two things to maintain trust in the process:
- Tamr asks questions of experts when it is not confident.
- Tamr exposes the categorizations directly to the consumers of the data, e.g., commodity managers and buyers. The views presented to them make any major errors immediately visible (e.g., Microsoft supplying ball bearings) and let them provide feedback to data curators.
When you are considering starting a new spend classification project, make sure you raise these concerns with your team and with your partners. And if you are struggling with current spend visibility, do reach out to us.