“Good, fast, cheap. Choose two.”
There are plenty of references to this dynamic, a “law” if you will, applied to various domains (e.g., project management and marketing). For so long, entity resolution has lived under this same law too. But no longer!
Team Senzing is elated to report a breakthrough: good, fast, and cheap entity resolution.
BEHOLD: Watch this 17-minute video to witness the deployment of a fully scalable AWS serverless stack from start to finish in 24 minutes of clock time. Then entity resolve 10M synthetic records in ~3 hours for under $100 in AWS compute. …
At Senzing we have created the first real-time AI for entity resolution. We make it quick and easy to accurately combine data about people and companies from different data sources. No other technology that exists today can do this in real time, at scale, with this level of accuracy without any training, tuning or experts. It is also the most affordable option available!
You can try it right now on Paycheck Protection Program (PPP) data in under 20 minutes.
To help make this fast and simple, we have prepared three Senzing-ready .csv files filtered to contain Las Vegas-related records.
Most entity resolution algorithms rely on record matching, a method whereby each record is compared to other records for similarity. Record matching does not learn, which ultimately results in missed matches.
More advanced entity resolution uses entity-centric learning — a method that treats resolved records as a single holistic entity. Entity-centric learning gets smarter over time, improves accuracy, and can detect non-obvious relationships that humans can easily miss.
Can you match the record on the left to any of the records on the right?
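To make the contrast between record matching and entity-centric learning concrete, here is a minimal sketch. It is not Senzing's algorithm: the field names, the sample records, and the "agree on at least two fields" rule are all assumptions chosen for illustration. The third record shares only one field with each earlier record, so pairwise comparison misses it, but it shares two fields with the entity built from the first two records.

```python
FIELDS = ("name", "address", "phone", "email")
THRESHOLD = 2  # hypothetical rule: two agreeing fields count as a match

# Invented sample records; None means the field was not supplied.
RECORDS = [
    {"name": "Ann Smith", "address": "1 Main St", "phone": "555-0100", "email": None},
    {"name": "Ann Smith", "address": "1 Main St", "phone": None, "email": "ann@example.com"},
    {"name": "A. Jones", "address": None, "phone": "555-0100", "email": "ann@example.com"},
]


def overlap(record, features):
    """Count fields where the record's value appears in the given feature sets."""
    return sum(
        1 for f in FIELDS
        if record.get(f) is not None and record[f] in features[f]
    )


def as_features(record):
    """Turn a single record into per-field sets so it can be compared like an entity."""
    return {f: ({record[f]} if record.get(f) is not None else set()) for f in FIELDS}


def record_matching(records):
    """Pairwise comparison: each record is only ever compared to another record."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if overlap(records[j], as_features(records[i])) >= THRESHOLD:
                pairs.append((i, j))
    return pairs


def entity_centric(records):
    """Compare each record to accumulated entities, which grow as records resolve."""
    entities = []  # each entity: {"members": [record indexes], "features": {field: set}}
    for idx, rec in enumerate(records):
        target = next(
            (e for e in entities if overlap(rec, e["features"]) >= THRESHOLD), None
        )
        if target is None:
            target = {"members": [], "features": {f: set() for f in FIELDS}}
            entities.append(target)
        target["members"].append(idx)
        for f in FIELDS:
            if rec.get(f) is not None:
                target["features"][f].add(rec[f])
    return [e["members"] for e in entities]


print(record_matching(RECORDS))  # [(0, 1)]
print(entity_centric(RECORDS))   # [[0, 1, 2]]
```

This sketch deliberately leaves out a lot, such as merging two existing entities when a new record bridges them, and fuzzy comparison of names and addresses.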
Most entity resolution systems don’t handle ambiguous records properly. This tricky and subtle condition creates false positives that are difficult to find.
In entity resolution, we use the term “ambiguous” to mean “multiple good answers.”
The great American boxer George Foreman named all five of his boys George. Imagine having to perform entity resolution on a record containing only his name, home address, and home phone — nothing else. In a typical household, a record containing a name, home address and phone would likely be unique to a single person. …
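One way to picture "multiple good answers" in code is a resolver that refuses to pick a winner when several known entities tie for the best score. This is a rough sketch under stated assumptions, not how any particular product works: the household below is invented, and the exact-match scoring and threshold are hypothetical.

```python
# Invented household: three people who share a name, address, and phone.
ENTITIES = {
    "A": {"name": "Pat Doe", "address": "9 Elm St", "phone": "555-0199", "dob": "1960-01-01"},
    "B": {"name": "Pat Doe", "address": "9 Elm St", "phone": "555-0199", "dob": "1990-02-02"},
    "C": {"name": "Pat Doe", "address": "9 Elm St", "phone": "555-0199", "dob": "1993-03-03"},
}


def resolve(record, entities, threshold=2):
    """Return (outcome, entity_ids) where outcome is 'new', 'match', or 'ambiguous'."""
    scored = []
    for entity_id, entity in entities.items():
        # Hypothetical scoring: one point per field that agrees exactly.
        score = sum(
            1 for key, value in record.items()
            if value is not None and entity.get(key) == value
        )
        if score >= threshold:
            scored.append((score, entity_id))
    if not scored:
        return ("new", [])
    best = max(score for score, _ in scored)
    winners = sorted(eid for score, eid in scored if score == best)
    # A single best candidate is a match; a tie means multiple good answers.
    return ("match", winners) if len(winners) == 1 else ("ambiguous", winners)


sparse = {"name": "Pat Doe", "address": "9 Elm St", "phone": "555-0199"}
print(resolve(sparse, ENTITIES))                           # ('ambiguous', ['A', 'B', 'C'])
print(resolve({**sparse, "dob": "1990-02-02"}, ENTITIES))  # ('match', ['B'])
```

Forcing the sparse record onto any one of the three entities would be a guess; flagging it as ambiguous keeps it available for resolution when a distinguishing attribute arrives.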
The number of organizations still operating hard-coded and expensive homegrown entity resolution is incredible. A new urgency to cut expenses makes these homegrown systems a prime target for fast “cost takeout.” [Note: Senzing is downloaded and runs locally. No private data flows to Senzing.]
When deploying Senzing real-time AI for entity resolution, an organization’s return on investment (ROI) is usually measured in months, if not weeks. In addition, they can free up many of their most valuable software engineers and data scientists to work on more critical projects, such as revenue generation. …
As the CEO of Senzing, an up-and-coming software company less than two years out of stealth mode, I find myself pondering the consequences of COVID-19 for our business goals, in 2020 and beyond.
I wonder, given our underlying business design, where is the risk? Doing this very simple “back of the napkin” mental exercise, I came up with these eight risk factors, then quantified each using a 0–100 score.
[Note: I encourage folks to improve on this simplistic model, e.g., weighting some items over others, adding new items, etc.]
1. What % of the workforce co-locates?
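For anyone who wants to try the weighting idea from the note, here is a small sketch of a weighted average over 0–100 factor scores. Only the first factor name comes from the list above; the second factor, every score, and every weight are placeholders invented for illustration, not figures from the original exercise.

```python
def weighted_risk(scores, weights=None):
    """Return the weighted average of 0-100 risk scores (unlisted weights default to 1)."""
    if not scores:
        raise ValueError("at least one risk factor is required")
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} must be scored 0-100, got {score}")
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(score * weights.get(name, 1.0) for name, score in scores.items())
    return weighted_sum / total_weight


# Placeholder scores: these numbers are made up for the example.
SCORES = {
    "workforce_collocation": 20,
    "placeholder_factor": 60,  # hypothetical second factor
}

print(weighted_risk(SCORES))                                  # 40.0 (equal weights)
print(weighted_risk(SCORES, {"workforce_collocation": 3.0}))  # 30.0
```

With equal weights this is a plain average; raising the weight on one factor pulls the overall score toward that factor's value.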
[I’ve attempted to make this subtle entity resolution accuracy issue understandable to the average reader. Not easy.]
The phrase “false positive” basically means you are sure, but nonetheless wrong. Think arresting the wrong person. Whoops!
The term “invisible false positive” refers to an error that can only be detected in the presence of additional information. No matter how closely an invisible false positive is inspected, the error is undetectable until additional information is considered.
Invisible false positives are discovered when additional information is considered later. You have likely experienced this firsthand. …
It often amazes me what people think is computable given their actual observation space.
Here’s an example conversation:
Me: “Tell me about your company.”
Customer: “We are in the business of moving things through supply chains.”
Me: “What do you want to achieve with analytics?”
Customer: “We want to find bombs in the supply chain.”
Me: “Tell me about your available observation space.”
Customer: “We have information on the shipper and receiver. We also know the owner of the plane, train, truck, car, etc. and the people who operate these vehicles.”
Me: “Nice. What else do you have?”
A big revelation hit me the day a law enforcement investigator explained the yellow stickies plastered around his desktop computer screen.
He said, “Every week, or at least once a month, I search for the people on these stickies.”
These subjects of interest included wanted criminals, missing kids and so on.
With this process, he would periodically search system A for the name, date of birth and other identifying attributes on sticky number one. Then he would rekey the same search information into system B, then C and so on. …
Simple experiment. Read these two related sentences:
I ducked as the bat flew my way.
What an exciting baseball game.
Many people imagine a winged bat when they read the first sentence. After reading the second sentence, they realize it’s a baseball bat. The interesting aspect of this is that the reader did not have to re-read the first sentence. Instead, the second sentence automagically reclassifies their understanding of the first sentence in real time!
Ever find yourself talking to someone and thinking you know what they mean? Until a few minutes later when you get additional context and realize they…