Evaluating Entity Resolution? Don’t Overlook Operational Impacts!

When comparing systems, dig into operational questions and costs like these:

  • What kinds of skills are needed, and how much time is required, for data preparation, mapping and tuning?
  • Do you need to reload all prior data each time you add a new data source? This is important to know, as it can mean reprocessing an ever-larger number of records each time (see the arithmetic sketch after the roadmap list below).
  • The cost of onboarding each new data source, because each one might take an expert (or a team of experts) a month or more. If you expect to add many new data sources, the total cost can add up fast.
  • The cost of building an in-house team of specialists to operate the system. Don’t forget to factor in bench strength and new-hire training programs to backfill attrition.
  • The cost to deploy and maintain an A/B infrastructure, if needed (i.e., one system serving 24x7 operations while the other handles the periodic reload)
  • The cost of any additional hardware needed to support your roadmap (e.g., delivering low-latency API services; a hypothetical call pattern is sketched just below)
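
That last item deserves a closer look. If low-latency API services are on your roadmap, be concrete about the round trip you intend to benchmark. Below is a minimal sketch of such a call; the endpoint URL, payload fields and timeout are hypothetical placeholders, not any particular product's API.

```python
# Hypothetical low-latency entity lookup over HTTP.
# The URL, payload fields and timeout are illustrative placeholders.
import time

import requests  # third-party: pip install requests

def resolve_entity(record: dict, timeout_s: float = 0.25) -> dict:
    """POST one record to a resolution endpoint and time the round trip."""
    start = time.perf_counter()
    resp = requests.post(
        "https://data-fabric.example.com/resolve",  # hypothetical endpoint
        json=record,
        timeout=timeout_s,
    )
    resp.raise_for_status()
    print(f"resolved in {(time.perf_counter() - start) * 1000:.1f} ms")
    return resp.json()

entity = resolve_entity({"name": "Ann Smith", "dob": "1984-03-07"})
```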

For example, imagine a roadmap that calls for:

  • Adding 32 new data sources containing ~200M more records.
  • Growing to more than 750M records in total.
  • Adding a low-latency entity resolution service via your enterprise data fabric.
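
Using that roadmap as a worked example, a few lines of arithmetic show why reload behavior dominates onboarding cost. This is a minimal sketch, assuming the 32 sources arrive one at a time and are roughly equal in size (~6.25M records each):

```python
# Sketch: total records processed while onboarding 32 sources (~200M records),
# comparing a full-reload system against an incremental one.

def records_processed(source_sizes, full_reload):
    """Sum the records processed across every onboarding event."""
    total, loaded = 0, 0
    for size in source_sizes:
        loaded += size
        # A full-reload system reprocesses everything loaded so far;
        # an incremental system touches only the new source.
        total += loaded if full_reload else size
    return total

sources = [6_250_000] * 32  # 32 equal sources, ~200M records total (assumption)

print(f"Full reload: {records_processed(sources, full_reload=True):,}")   # 3,300,000,000
print(f"Incremental: {records_processed(sources, full_reload=False):,}")  #   200,000,000
```

Under these assumptions, the full-reload system processes 16.5x as many records for the same 200M, and in practice the gap is wider still, since the ~550M records already in production (per the 750M total above) would be reloaded each time as well.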

When estimating total cost of ownership, consider line items such as these (a rough roll-up sketch follows this list):

  • People and hardware required for data preparation, mapping and configuration of each data source. Note: Don't be surprised when this cost varies significantly from system to system.
  • Production hardware to include high availability and disaster recovery.
  • Licensing for the entire software stack.
  • Number and types of people needed for daily operations.
  • Requirements to deploy software version upgrades, including regression testing, rollback planning, etc.
  • Security audits, including all the moving parts.
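
To make those line items concrete, here is a rough roll-up. Every dollar figure below is a placeholder assumption, not real pricing; substitute your own vendor quotes, salaries and hardware costs.

```python
# Sketch: rolling the cost categories above into a rough 3-year estimate.
# All dollar figures are placeholder assumptions, not real pricing.

YEARS = 3

one_time = {
    "onboarding 32 sources (1 expert-month each @ $20K)": 32 * 20_000,
}
annual = {
    "production hardware, incl. HA and DR": 300_000,
    "software stack licensing": 250_000,
    "operations staff (3 FTEs, incl. backfill training)": 3 * 180_000,
    "upgrades, regression testing, security audits": 120_000,
}

tco = sum(one_time.values()) + YEARS * sum(annual.values())

for item, cost in {**one_time, **annual}.items():
    print(f"{item:52s} ${cost:>9,}")
print(f"{'rough 3-year total':52s} ${tco:>9,}")
```

Even with modest placeholder numbers, the total lands well north of $4M, which is why the onboarding and staffing questions above deserve real diligence during evaluation.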

Finally, weigh how each architecture handles data loading:

  • Batch-based systems are usually very efficient at quickly loading large files, which is great until day two, when you need to reload everything again.
  • Real-time transactional systems are typically slower at bulk loading, but they handle all future adds, changes and deletes incrementally, without reloading, which can have significant long-term operational benefits (see the sketch below).
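
The difference in day-two behavior is easiest to see in code. Below is a minimal sketch of a transactional system consuming a change feed; the in-memory dict is a toy stand-in for a resolved-entity store, not a real entity resolution engine:

```python
# Sketch: incremental day-two maintenance via a change feed.
# The dict is a toy stand-in for a resolved-entity store.

entity_store: dict[str, dict] = {}

def apply_change(op: str, record_id: str, record: dict | None = None) -> None:
    """Apply one add/change/delete; no other records are reprocessed."""
    if op in ("add", "change"):
        entity_store[record_id] = record   # re-resolve just this record
    elif op == "delete":
        entity_store.pop(record_id, None)  # retract just this record
    else:
        raise ValueError(f"unknown operation: {op}")

# Day two is a stream of small operations, not a bulk reload:
apply_change("add", "src1:42", {"name": "Ann Smith"})
apply_change("change", "src1:42", {"name": "Ann T. Smith"})
apply_change("delete", "src1:42")
print(f"{len(entity_store)} records after the feed")  # 0
```

A batch-based system facing the same feed would typically queue these changes and reprocess the full record set at the next reload window.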

A common mistake when evaluating entity resolution technology is to focus too much on the basics, or the minimum viable product (MVP). To reduce your risk of buyer's remorse, spend more time up front thinking about the overall journey.


Jeff Jonas is founder and CEO of Senzing. Prior to Senzing, Jonas served as IBM Fellow and Chief Scientist of Context Computing.
