Stream Shadowing and AI-enabled Testing


Stream Shadowing and AI-enabled Testing -- PII and PHI free; GDPR/HIPAA/PCI DSS/CCPA compliant

As cloud-based technologies solidify their presence in modern IT architectures, mesh-based supersystems grow in their number of nodes.

With the above comes a challenge for those who participate in digital transformation: verification in the absence of unconstrained access.

What I mean is that, when upgrading a sub-mesh or a single node, integrators find themselves in a situation where the domains that provide inbound data streams are either not scheduled for an upgrade any time soon, uninterested in cooperation for whatever reason, unreachable due to the nature of the way they operate, or constrained by regulatory obligations from being more cooperative than they already are.

In short, you cannot tell them what data you want to receive.

In traditional testing, the tester prepared sets of very specific values and expected to see the same values, or derivatives of them, in the system under test. These are called Expected Values.

Integrators are expected to work with two layers within which they need to verify processing: the first within the system that they construct, and the second within the inbound data processing queues or streams that they are likely to establish or modify.

Now let us look at some examples of Expected Results.

Let’s start with the simplest: a postal address. Does the tester really need to use a predefined set of data? Certainly, the classical approach above minimizes the volume of test data, because the data needs to cover only a handful of scenarios. However, given the availability of online postal-address validation services, the tester can just as well let the source feed 100 addresses and let the online service confirm their validity. From the results of that sample of 100, the tester can build confidence that the connection and the mapping of fields are correct.
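A minimal sketch of that sampling idea follows. The `validate_address` function here is only a crude local stand-in for a real online validation service, and the threshold is an illustrative assumption:

```python
import re

def validate_address(address: str) -> bool:
    """Stand-in for an online postal-address validation service.
    This crude check only looks for a plausible shape: a leading street
    number, a street name, and a 5-digit ZIP code somewhere after a comma."""
    return bool(re.search(r"^\d+\s+\w[\w\s.]*,.*\b\d{5}\b", address))

def sample_pass_rate(addresses, threshold=0.95):
    """Validate a sample of inbound addresses; report the pass rate and
    whether it supports confidence in the connection and field mapping."""
    passed = sum(validate_address(a) for a in addresses)
    rate = passed / len(addresses)
    return rate, rate >= threshold

sample = [
    "10 Main St, Springfield, 62704",   # passes the crude shape check
    "221B Baker Street, London",        # no US ZIP code -> fails it
]
rate, confident = sample_pass_rate(sample)
```

In practice the local check would be replaced by a call to the validation service, and the pass rate tracked across successive samples.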

Another example would be the dollar value paid to purchase an insurance policy. While these amounts differ for every purchase, the statistical properties of a sample of, say, the first 1,000 transactions must match those observed in production.

Furthermore, the tester needs to avoid looking at values in isolation, meaning he needs to know how many numeric values are in a payload or a record. If there is only one, an erroneous swapping of fields is not a risk.

The above are just commonsense examples for manual testing.

There is an entire science, built over the years, that targets assessing data quality. This science has recently exploded into numerous AI models and solutions that automate the confidence building.

This article attempts to identify yet another aspect of testing: accelerating the delivery date.

The idea is to start testing with in-production data as soon as possible, and well before going to production.

The must-have prerequisite is a PRE-PROD test environment to which a PII- and PHI-free shadow of production data is forked.


Problem to solve:

Both business and regulatory requirements prohibit the usage of a long list of well-defined data items in real-world (production) data. A certifiable, reliable methodology for identifying and encrypting or removing protected information must be provided.

Proposed solution:

While systems like Kafka, Kinesis, and the like provide the means to implement data masking and encryption, solutions that use these abilities would, at this time, be ‘homemade’ and would therefore require a lengthy approval process by the concerned security departments. There is a need for a generic, portable, certified software product (UCDB in the picture) that could be injected either after or before the queue and provide the ability to fork a portion of the traffic into a module that removes PII, PHI, and other non-exposable data items before forwarding the secured data to the test environment. Let us call it the pre-prod environment.
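To make the idea concrete, here is a minimal sketch of what the scrubbing module inside such a fork might do to each record. The field names, salt, and masking scheme are illustrative assumptions, not part of any certified product:

```python
import hashlib

# Illustrative catalog of protected fields; a real UCDB-style product would
# carry certified, domain-specific catalogs of PII/PHI items.
PROTECTED_FIELDS = {"ssn", "name", "date_of_birth", "medical_record_number"}

def scrub_record(record: dict, salt: str = "test-env-salt") -> dict:
    """Return a copy of the record with protected fields replaced by a
    salted one-way hash, so joins across records still work in pre-prod
    but the raw values are never exposed."""
    clean = {}
    for key, value in record.items():
        if key in PROTECTED_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            clean[key] = digest[:16]  # opaque token, not the raw value
        else:
            clean[key] = value
    return clean

event = {"ssn": "123-45-6789", "policy_id": "P-1001", "premium": 420.50}
forwarded = scrub_record(event)
# forwarded["ssn"] is now an opaque token; business fields pass through
```

In the proposed architecture this function would sit in the forked path only, so the production consumers continue to receive the original stream untouched.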

Given that data domains are narrowly specialized, and that domain-specific confidentiality requirements result in considerably different sets of data items to protect, before someone attempts to manufacture a UCDB product the System Integrators community might greatly benefit from the availability of masked data streams provided by the Data Providers themselves (SPCDB in the picture).


The author is aware of the existence of test data sets provided by CMS and by numerous other institutions. These, however, tend to be very static, if not stagnant, and ultimately miss adequate coverage in the sub-domain the integrator needs.

The proposed secure forking of a PII- and PHI-free, GDPR/HIPAA/PCI DSS/CCPA-compliant data stream would both allow better test coverage and enable earlier delivery by allowing earlier testing with pseudo-production data.

The secure forking proposed here, if implemented at the Data Provider/Source level, would additionally provide tighter security by removing sensitive information at the source, as opposed to later, within the integrator-controlled network.


Your feedback and criticism are highly appreciated.


Greg
