Evaluating Entity Resolution? How to Avoid Inaccurate Accuracy Testing.
Imagine conducting vehicle safety tests, like measuring stopping distance or handling in high-speed turns, in a small parking lot set up with some orange cones. Obviously, the results of these types of tests will be quite different from the actual safety you’ll experience in the real world.
Similarly, when insufficient data sets or volumes are used to test entity resolution accuracy, it is highly unlikely the results will have anything in common with the actual accuracy you’ll experience in production. To ensure your accuracy tests will approximate real-world results, avoid these common oversights:
1. Poorly constructed test data or truth sets. There are a few ways accuracy tests produce bogus results, including using synthetic data, using the wrong slice of real data, or using test data without enough volume or diversity. Furthermore, treating a truth set as a static source of golden answers is always a problem, as even good truth sets have gaps and errors!
Best Practice: Use real data to create your entity resolution truth set, and make sure you use enough of it. How much is enough? One way to tell is when, despite your careful curation process, interesting surprises still pop up from time to time. It’s important to keep an open mind during testing because certain outcomes may change your point of view, and thus your truth set, or even what you choose to measure. For more details, review the articles on how to create an entity resolution truth set and the path to a successful proof of concept. The sketch below shows one quick way to profile a candidate truth set for volume and diversity.
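As a rough illustration (the field names `source` and `entity_id` are hypothetical, not any particular product's schema), this minimal Python sketch profiles a candidate truth set by record volume, source mix, and entity-size distribution. A truth set dominated by one source, or by single-record entities, probably isn't rich enough to produce the surprises real production data will:

```python
from collections import Counter

def profile_truth_set(records):
    """Quick profile of a candidate truth set: volume, source mix,
    and entity (cluster) size distribution."""
    sources = Counter(r["source"] for r in records)
    entities = Counter(r["entity_id"] for r in records)
    cluster_sizes = Counter(entities.values())
    print(f"records: {len(records)}, entities: {len(entities)}")
    print(f"source mix: {dict(sources)}")
    print(f"entity-size distribution: {dict(sorted(cluster_sizes.items()))}")

# Hypothetical toy truth set: one 3-record entity and two singletons.
sample = [
    {"entity_id": "E1", "source": "crm"},
    {"entity_id": "E1", "source": "billing"},
    {"entity_id": "E1", "source": "crm"},
    {"entity_id": "E2", "source": "crm"},
    {"entity_id": "E3", "source": "billing"},
]
profile_truth_set(sample)
# records: 5, entities: 3
# source mix: {'crm': 3, 'billing': 2}
# entity-size distribution: {1: 2, 3: 1}
```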
2. Evaluating entity-centric matching using a truth set created for record matching technology. This results in inaccurate comparisons and hides the significant accuracy benefits entity-centric matching provides. The blog entity-centric learning vs. record matching methods provides more details.
Best Practice: When evaluating entity resolution systems that use entity-centric matching, be sure your truth set includes some records that require entity-centric logic to match, and that your audit tools support entity-centric matching. If not, you’ll need to manually audit exceptions for accuracy; some exceptions will actually be correct. The sketch below shows why pairwise record matching can miss matches that entity-centric logic finds.
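Here is a minimal sketch of the difference, not any vendor's actual algorithm: the new record matches no single existing record, but it does match the attribute values pooled across all of the entity's records:

```python
def record_match(a, b):
    """Naive pairwise matcher: exact name plus one shared contact point."""
    return a["name"] == b["name"] and (
        (a.get("phone") and a.get("phone") == b.get("phone"))
        or (a.get("email") and a.get("email") == b.get("email"))
    )

def entity_match(entity_records, candidate):
    """Entity-centric matcher: compare the candidate against the union of
    attribute values accumulated across ALL records in the entity."""
    names = {r["name"] for r in entity_records}
    phones = {r["phone"] for r in entity_records if r.get("phone")}
    emails = {r["email"] for r in entity_records if r.get("email")}
    return candidate["name"] in names and (
        candidate.get("phone") in phones or candidate.get("email") in emails
    )

# One entity already holding two records with complementary attributes.
entity = [
    {"name": "Pat Lee", "phone": "555-0100", "email": None},
    {"name": "Patricia Lee", "phone": "555-0100", "email": "pat@example.com"},
]
# New record: the name matches record 1 and the email matches record 2,
# but no single record shares both, so pairwise matching fails.
candidate = {"name": "Pat Lee", "phone": None, "email": "pat@example.com"}

print(any(record_match(candidate, r) for r in entity))  # False
print(entity_match(entity, candidate))                  # True
```

A truth set built only from pairwise decisions would score that last match as a false positive, which is exactly how the accuracy benefit gets hidden.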
3. Not considering ambiguous matching conditions, i.e., when a record can match more than one existing entity. If not handled properly, ambiguous records are arbitrarily assigned to the wrong entity, which masks false positives; in some use cases they amount to more than 10%. These blogs on ambiguous conditions and invisible false positives provide more information.
Best Practice: Be sure your truth set includes ambiguous records. If your audit isn’t finding any, it might be a sign they aren’t included in the truth set, your entity resolution algorithm doesn’t handle them, or your audit process isn’t detecting them. Don’t shortcut this one; it can be a big deal! One simple safeguard, sketched below, is to flag any record whose top candidate entities score nearly the same.
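As a rough illustration (a toy similarity function and made-up thresholds, not a production algorithm), this sketch scores a record against every candidate entity and flags it as ambiguous when a second entity also clears the match threshold within a small margin, instead of silently assigning it to the top scorer:

```python
def resolve(score_fn, record, entities, threshold=0.6, margin=0.05):
    """Return ('matched', id), ('new_entity', None), or ('ambiguous', ids).
    Flagging near-ties for review avoids the arbitrary assignments that
    silently create false positives."""
    scored = sorted(
        ((score_fn(record, ent), eid) for eid, ent in entities.items()),
        reverse=True,
    )
    best_score, best_id = scored[0]
    if best_score < threshold:
        return ("new_entity", None)
    if len(scored) > 1:
        second_score, second_id = scored[1]
        if second_score >= threshold and best_score - second_score <= margin:
            return ("ambiguous", [best_id, second_id])
    return ("matched", best_id)

def overlap(record, entity_values):
    """Toy similarity: Jaccard overlap of attribute-value sets."""
    return len(record & entity_values) / len(record | entity_values)

# Two distinct people who happen to share a name and phone number.
entities = {
    "E1": {"pat lee", "555-0100", "boston"},
    "E2": {"pat lee", "555-0100", "chicago"},
}
record = {"pat lee", "555-0100"}  # matches both entities equally well

print(resolve(overlap, record, entities))
# ('ambiguous', ['E2', 'E1'])  -- flagged for review, not guessed
```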
4. Relying only on high-level statistics, such as precision, recall, and F1 scores, to quantify accuracy. One system may have 50% more matches than another, not because it is more accurate but because it has so many false positives.
Best Practice: When comparing systems, start with a record-level audit that manually inspects discrepancies between systems. Keep a close eye on what each system missed. Sample enough of every category to get a feel for what is really happening. For instance, you may like the additional matches one technology found that were missed by the truth set or another competing method. Remember: One system’s false positives are another’s false negatives. The sketch below shows how a record-level comparison exposes what headline scores hide.
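To see why headline scores can mislead, here is a minimal sketch (hypothetical systems and a four-record toy truth set) that computes pairwise precision, recall, and F1, then diffs the two systems' matched pairs. Both systems earn identical scores while making entirely different mistakes, which only the record-level comparison reveals:

```python
from itertools import combinations

def pairs(assignment):
    """Expand {record_id: entity_id} into the set of matched record pairs
    it implies, so two clusterings can be compared pair by pair."""
    by_entity = {}
    for rec, ent in assignment.items():
        by_entity.setdefault(ent, []).append(rec)
    return {
        frozenset(p)
        for members in by_entity.values()
        for p in combinations(sorted(members), 2)
    }

def prf(predicted, truth):
    """Pairwise precision, recall, and F1 against the truth set's pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Truth: {r1, r2} are one entity, {r3, r4} another.
truth = pairs({"r1": "A", "r2": "A", "r3": "B", "r4": "B"})
sys_x = pairs({"r1": "A", "r2": "A", "r3": "A", "r4": "B"})  # wrongly merges r3
sys_y = pairs({"r1": "A", "r2": "B", "r3": "B", "r4": "B"})  # wrongly merges r2

for name, predicted in [("X", sys_x), ("Y", sys_y)]:
    p, r, f = prf(predicted, truth)
    print(f"system {name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# Both print precision=0.33 recall=0.50 f1=0.40 -- identical stats,
# completely different errors. Only the record-level diff shows it:
print("pairs only in X:", sys_x - sys_y)
print("pairs only in Y:", sys_y - sys_x)
```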
5. Testing with synthetic data. It is almost impossible to produce synthetic data that is representative of real data, so synthetically generated data should be avoided when testing accuracy.
Best Practice: Before resorting to synthetic data, try to find another organization using the same entity resolution technology on data that is similar to yours. Then ask them what level of accuracy they’re seeing. If that isn’t an option and you must use a synthetic truth set, take great care when creating it and follow the advice in the article on creating a truth set.
To reduce your risk of buyer’s remorse, use real data when evaluating entity resolution technology. And run a record-level audit, keeping an eye out for entity-centric matching and ambiguous conditions.
Would love to hear any and all comments about this post, as I’d like to evolve it over time. Also, if you are a like-minded, kindred spirit, please join our Entity Resolution LinkedIn Group.