Privacy by Design (PbD) and Senzing
Inspired by the 70th anniversary of the Universal Declaration of Human Rights, I wanted to blog about the PbD history and features of Senzing.
For over a decade we have been building Privacy by Design into the Senzing technology and we’re starting to see great examples of it in the wild, such as its use to modernize voter registration in America. This implementation is effective in no small part due to the utilization of one of our PbD features called Selective Field Hashing. More about this system, and this feature, in this IAPP keynote video “21st Century Voting — The Success of the Electronic Registration Information Center” and this story in The New York Times “Another Use for A.I.: Finding Millions of Unregistered Voters.”
BACKGROUND
2005: IBM acquired my Las Vegas-based startup Systems Research & Development for its real-time Entity Resolution technology known as Non-Obvious Relationship Awareness (NORA). IBM renamed NORA and now sells this technology under the brand name IBM InfoSphere Identity Insight. It’s a unique product, and one in use around the world — a product my team and I are very proud of.
2008: While at IBM, the team and I quietly embarked on an ambitious project (code named “G2”) to revolutionize Entity Resolution. Among our many aspirations was a goal to create a real-time, self-learning, self-correcting Entity Resolution system while baking in as many privacy and civil liberties features as we could.
2011: Following two and a half years in full stealth mode, we announced the existence of the G2 technology on Data Privacy Day in January 2011.
2012: Ann Cavoukian, the creator of Privacy by Design (PbD), and I released a joint paper on June 8, 2012 entitled “Privacy by Design in the Era of Big Data.” In this paper, Ann described her aspirations for Privacy by Design (PbD) and I enumerated the PbD features we imagined for our next generation Entity Resolution technology.
2016: Following a one-of-a-kind IBM spinout, the G2 technology and the G2 team formed Senzing. Senzing remained in stealth over the next two years, quietly focusing its engineering efforts on ease of use, accuracy and performance.
2018: Senzing, Inc. formally launched its Entity Resolution technology of the same name.
MORE ABOUT PbD IN SENZING
In the paper I co-authored with Ann Cavoukian, we discussed a number of privacy and civil liberties features we felt ought to be baked into any Entity Resolution technology (or that, at a minimum, the user should be able to turn on). These features include:
FULL ATTRIBUTION: Every record received is stored with a pointer to its source system and record id. There are no processes (e.g., merge/purge, data survivorship) in which some data is discarded. If data is discarded, system-to-system reconciliation audits become problematic, and it is difficult or impossible to correct historical decisions. Another good reason to maintain Full Attribution is found in the Universal Declaration of Human Rights, where four articles admonish arbitrariness, e.g., Article 9: “No one shall be subjected to arbitrary arrest, detention or exile.” Full Attribution is baked into Senzing because, if you don’t know where the data came from, how can any resulting action be anything but arbitrary?
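To make the idea concrete, here is a minimal sketch of what Full Attribution could look like as a data structure. It is not Senzing's internal model; the class names, field names and sample records are all hypothetical. The point it illustrates is that an entity keeps every contributing record, each tagged with its source system and record id, instead of collapsing them into one surviving "golden" record.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    source: str      # originating system (hypothetical names below)
    record_id: str   # record id within that system
    payload: tuple   # attributes exactly as received, as (name, value) pairs


@dataclass
class Entity:
    entity_id: int
    # Every contributing record is retained; nothing is merged away.
    records: list = field(default_factory=list)

    def attribution(self):
        """Return (source, record_id) for every record behind this entity."""
        return [(r.source, r.record_id) for r in self.records]

    def values(self, name):
        """Return every observed value of an attribute with its provenance."""
        return [
            (value, r.source, r.record_id)
            for r in self.records
            for (key, value) in r.payload
            if key == name
        ]


entity = Entity(entity_id=1)
entity.records.append(
    SourceRecord("DMV", "A-100", (("name", "Robert Smith"), ("dob", "1970-03-04")))
)
entity.records.append(
    SourceRecord("VOTER", "V-7", (("name", "Bob Smith"), ("dob", "1970-03-04")))
)

print(entity.attribution())
print(entity.values("name"))
```

Because both name variants survive alongside their sources, a later audit can still answer "which system said what" for any decision made about this entity.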
FIELD HASHING: The ability to perform Entity Resolution on hashed data, i.e., data cryptographically altered to be unreadable and hard to reverse. Hashed data helps reduce the risk of unintended disclosure. Senzing has baked in the ability to perform Entity Resolution over hashed fields while still maintaining some fuzzy matching qualities, e.g., matching Bob to Robert, or matching dates of birth whose month and day are transposed.
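One common way to keep some fuzziness after hashing is to normalize values, or generate plausible variants, before the hash is applied. The sketch below illustrates that general technique with a keyed hash (HMAC-SHA256); it is an assumption-laden illustration and not a description of how Senzing implements hashing. The secret key, the nickname table and all function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice this would be managed securely.
SECRET_KEY = b"shared-secret-for-illustration-only"

# Hypothetical nickname table mapping variants to a canonical first name.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}


def keyed_hash(value: str) -> str:
    """Keyed one-way hash, so values cannot be guessed without the key."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def hash_first_name(name: str) -> str:
    """Canonicalize before hashing so Bob and Robert yield the same hash."""
    canonical = name.strip().lower()
    canonical = NICKNAMES.get(canonical, canonical)
    return keyed_hash("name:" + canonical)


def hash_dob_variants(dob: str) -> set:
    """Hash a YYYY-MM-DD date as given, plus its month/day transposition."""
    year, month, day = dob.split("-")
    variants = {dob}
    if int(day) <= 12:  # the swap is only a plausible date if day fits a month
        variants.add(f"{year}-{day}-{month}")
    return {keyed_hash("dob:" + v) for v in variants}


def hashed_dob_match(hashes_a: set, hashes_b: set) -> bool:
    """Compare two parties' hash sets; no plaintext is needed at this step."""
    return bool(hashes_a & hashes_b)


print(hash_first_name("Bob") == hash_first_name("Robert"))
print(hashed_dob_match(hash_dob_variants("1970-03-04"),
                       hash_dob_variants("1970-04-03")))
```

The trade-off in this approach is that every tolerated variation has to be anticipated before hashing, since hashes of unrelated strings share nothing that can be compared afterward.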
DATA TETHERING: Adds, changes and deletes occurring in systems of record must be accounted for in as close to real time as possible. Data currentness is important, especially if one is making significant, difficult-to-reverse decisions that affect people’s freedoms or privileges. For example, if someone is removed from a watch list, how long should they have to wait before their name is cleared in downstream systems? Senzing supports adds, changes and deletions in real time. Among other things, this enables compliance with the Right to Be Forgotten obligations that are part of privacy regulations such as the E.U.’s General Data Protection Regulation (GDPR).
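As a rough illustration of Data Tethering, the sketch below shows a downstream store that applies add, change and delete events from a system of record as they arrive. The event shape, class name and sample data are hypothetical and are not the Senzing API; the sketch only shows that a delete at the source clears the downstream copy immediately.

```python
class DownstreamIndex:
    """A downstream copy kept in step with its systems of record."""

    def __init__(self):
        # Keyed by (source, record_id) so each record stays traceable.
        self._records = {}

    def apply(self, event):
        """Apply one add/change/delete event from a system of record."""
        key = (event["source"], event["record_id"])
        action = event["action"]
        if action in ("add", "change"):
            self._records[key] = dict(event["attributes"])
        elif action == "delete":
            # Deleting an unknown record is treated as a no-op.
            self._records.pop(key, None)
        else:
            raise ValueError(f"unknown action: {action}")

    def on_watch_list(self, name):
        """True if any current record carries this name."""
        return any(r.get("name") == name for r in self._records.values())


index = DownstreamIndex()
index.apply({"action": "add", "source": "WATCHLIST", "record_id": "w-1",
             "attributes": {"name": "Pat Doe"}})
listed_before = index.on_watch_list("Pat Doe")

# The source system removes the person; the downstream copy follows at once.
index.apply({"action": "delete", "source": "WATCHLIST", "record_id": "w-1"})
listed_after = index.on_watch_list("Pat Doe")

print(listed_before, listed_after)
```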
FALSE NEGATIVE FAVORING: In many use cases, when it comes to Entity Resolution, it is far preferable from a civil liberties standpoint to miss a few things (false negatives) than to inadvertently make claims that are not true (false positives). This is because false positives can adversely affect people’s lives, e.g., police knock down the wrong door or an innocent passenger is denied the ability to board a plane. The algorithms in Senzing favor false negatives by design, but they can be adjusted as appropriate, e.g., for marketing use cases or human-in-the-loop investigations.
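The sketch below shows one simple way a matcher can be biased toward false negatives: require a high score before two records are declared the same, and lower that bar only for use cases that tolerate mistakes. This is a toy scoring scheme, not Senzing's algorithm; the weights, thresholds and field names are hypothetical.

```python
# Hypothetical integer weights for agreeing fields (they sum to 100).
WEIGHTS = {"name": 40, "dob": 30, "address": 20, "phone": 10}

STRICT_THRESHOLD = 90   # hypothetical default: prefer to miss a match
LENIENT_THRESHOLD = 60  # hypothetical setting, e.g. for marketing use


def match_score(a, b):
    """Sum the weights of fields that are present in both and agree."""
    return sum(
        weight
        for field_name, weight in WEIGHTS.items()
        if a.get(field_name) is not None and a.get(field_name) == b.get(field_name)
    )


def same_entity(a, b, threshold=STRICT_THRESHOLD):
    """Declare a match only when the evidence reaches the threshold."""
    return match_score(a, b) >= threshold


rec_a = {"name": "robert smith", "dob": "1970-03-04", "address": "1 main st"}
rec_b = {"name": "robert smith", "dob": "1970-03-04", "address": "9 oak ave"}

print(match_score(rec_a, rec_b))
print(same_entity(rec_a, rec_b))
print(same_entity(rec_a, rec_b, threshold=LENIENT_THRESHOLD))
```

With the strict default, the two records above stay separate because only name and date of birth agree; the lenient setting would join them, accepting a higher risk of a false positive.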
SELF-CORRECTING FALSE POSITIVES: Imagine making an assertion that two people are the same because they share exactly the same name, address and home phone number, only to learn later that these are really two different people (a junior and a senior). Senzing, by design, self-corrects these rare cases in real time.
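The junior/senior case can be sketched as follows. This toy resolver simply re-evaluates all records whenever new information arrives, so an earlier merge is undone once conflicting dates of birth appear. It is only an illustration of the idea: the rules and field names are hypothetical, and a real-time system would presumably re-evaluate only the affected entities and not recompute everything.

```python
def compatible(a, b):
    """Records may share an entity if core fields agree and DOBs don't conflict."""
    same_core = all(a[f] == b[f] for f in ("name", "address", "phone"))
    dob_conflict = a.get("dob") and b.get("dob") and a["dob"] != b["dob"]
    return same_core and not dob_conflict


def resolve(records):
    """Greedy grouping: join the first entity whose members are all compatible."""
    entities = []
    for rec_id in sorted(records):
        rec = records[rec_id]
        for members in entities:
            if all(compatible(rec, records[m]) for m in members):
                members.append(rec_id)
                break
        else:
            entities.append([rec_id])
    return entities


records = {
    "r1": {"name": "john smith", "address": "1 main st", "phone": "555-0100"},
    "r2": {"name": "john smith", "address": "1 main st", "phone": "555-0100"},
}
before = resolve(records)   # the two records look like one person

# New information arrives: the dates of birth are decades apart.
records["r1"]["dob"] = "1950-01-01"
records["r2"]["dob"] = "1975-06-01"
after = resolve(records)    # the earlier assertion is corrected

print(before)
print(after)
```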
INFORMATION TRANSFER ACCOUNTING: Record-level information transfers should be recorded at the originating system. This allows stakeholders (consumers, data custodians, oversight bodies, etc.) to determine exactly how data is flowing. If source systems don’t track which records were sent where, they can’t ensure future changes and deletes are properly relayed downstream (aka Data Tethering). The Inquiries section on US credit reports, as mandated by the Fair Credit Reporting Act (FCRA), is a good example of this in practice. The Inquiries section allows consumers to review how their credit file has been shared. This PbD concept is best deployed during system integration, so it is not built into Senzing.
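A minimal sketch of record-level transfer accounting at an originating system might look like the following. Since this concept is not built into Senzing, everything here (the class, method names and recipient names) is hypothetical; it only shows how a ledger of "which record went where" lets later changes and deletes be relayed to the right recipients.

```python
from collections import defaultdict
from datetime import datetime, timezone


class TransferLedger:
    """Kept by the originating system: which record was sent to whom, and when."""

    def __init__(self):
        self._transfers = defaultdict(list)  # record_id -> [(recipient, when)]

    def record_transfer(self, record_id, recipient, when=None):
        """Log one record-level transfer to a downstream recipient."""
        when = when or datetime.now(timezone.utc)
        self._transfers[record_id].append((recipient, when))

    def recipients_of(self, record_id):
        """Everyone who must be told when this record changes or is deleted."""
        return sorted({recipient for recipient, _ in self._transfers.get(record_id, [])})

    def history(self, record_id):
        """Full transfer history, similar in spirit to a credit report's Inquiries."""
        return list(self._transfers.get(record_id, []))


ledger = TransferLedger()
ledger.record_transfer("cust-42", "fraud-system")
ledger.record_transfer("cust-42", "marketing-db")
ledger.record_transfer("cust-42", "fraud-system")  # sent again after an update

print(ledger.recipients_of("cust-42"))
```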
TAMPER-RESISTANT AUDIT LOGS: Tamper-resistant logs make it possible to audit user behavior with confidence, because even a database administrator can’t alter the evidence contained in the audit log. This capability is particularly important for addressing search abuse, e.g., privileged users looking up records without a legitimate business purpose, such as an employee taking a peek into their roommate’s file. This PbD concept is not built into Senzing, as it is best deployed today via a widely available immutable logging mechanism such as a blockchain.
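The core trick behind many immutable logging mechanisms is a hash chain: each log entry's hash covers the previous entry's hash, so editing any past entry breaks every hash after it. The sketch below shows that technique in isolation. It is a plain hash chain, not a blockchain and not part of Senzing, and the function names and log fields are hypothetical. On its own it only makes tampering detectable; someone able to rewrite the entire log could recompute the chain, which is why the latest hash would also need to be published or stored somewhere the administrator cannot reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, entry):
    """Hash the previous hash together with a canonical form of the entry."""
    body = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log, entry):
    """Append an audit entry chained to the one before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": _digest(prev_hash, entry)})


def verify(log):
    """Recompute the chain; any edited or reordered entry makes this False."""
    prev_hash = GENESIS
    for item in log:
        if item["hash"] != _digest(prev_hash, item["entry"]):
            return False
        prev_hash = item["hash"]
    return True


audit_log = []
append(audit_log, {"user": "alice", "action": "search", "query": "roommate name"})
append(audit_log, {"user": "bob", "action": "search", "query": "case 1138"})
intact = verify(audit_log)

# Someone tries to hide the first search by rewriting who ran it.
audit_log[0]["entry"]["user"] = "mallory"
tampered_ok = verify(audit_log)

print(intact, tampered_ok)
```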
As I write this, I proudly believe that Senzing may have more features baked in to enhance privacy and civil liberties than any other commercially available Entity Resolution software. I could be wrong. If so, do tell!