How to balance data access and security in fintech testing
In fintech, real customer data provides the most powerful and realistic software testing scenarios. Yet regulations and standards—or a company’s security team—may insist on controls or limited permissions that make that impossible.
The security team is not wrong. The company may be obliged to keep customers' Social Security numbers, dates of birth and full names private. Anyone with personally identifiable information (PII) can use it for identity theft or fraud. Beyond PII, tests that include valid credit card numbers can themselves facilitate fraud and abuse.
The testers are not wrong either. The best test includes conditions actually seen in production. With live data, the software is more likely to perform consistently across test and production environments.
Fortunately, there are ways to balance security with excellent testing practices. Most of these strategies are intended for transactional systems — such as those used to process insurance claims, monthly invoicing and interest calculations — but they apply to any system that uses PII where there are concerns about the use of production data.
6 core strategies for fintech testing
These six strategies can help software teams balance accurate fintech testing with data security.
Use a golden master
Most systems can export and import data, at least for backup purposes. The golden master takes that idea one step further, creating a simple example test dataset. This dataset has known cases such as a user with poor credit, a user with great credit, a user under 18 who is legally unable to contract, and others.
With consistently known good data, the team can write static test cases, checking the same users for the same expected results on each run. The easiest option is to store the export in version control or a test data management tool. Note that in some cases the export will have dates in it, such as the dates on an insurance claim, and the program may need to update the dates on import.
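A minimal sketch of this idea, assuming a JSON export with illustrative field names (`export_date`, `records`, `claim_date`): the loader re-bases every date relative to the run date, so a record that was "30 days old" at export time is still 30 days old in every test run.

```python
import json
from datetime import date

def load_golden_master(raw_json: str, today: date) -> list[dict]:
    """Load a versioned golden-master export and shift its dates forward."""
    export = json.loads(raw_json)
    # How far the clock has moved since the dataset was exported.
    offset = today - date.fromisoformat(export["export_date"])
    records = []
    for rec in export["records"]:
        rec = dict(rec)
        # Re-base each date so relative ages stay constant across runs.
        shifted = date.fromisoformat(rec["claim_date"]) + offset
        rec["claim_date"] = shifted.isoformat()
        records.append(rec)
    return records

# A tiny versioned export with one known case (a 30-day-old claim).
raw = json.dumps({
    "export_date": "2020-01-01",
    "records": [{"user": "poor-credit", "claim_date": "2019-12-02"}],
})
loaded = load_golden_master(raw, date(2024, 6, 1))  # claim is still 30 days old
```

Storing `raw` in version control alongside the test cases keeps the dataset reviewable and reproducible like any other code artifact.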
Mask identifiable information
To minimize the risk of identity theft, production data masking transforms fields such as names, birthdates or Social Security numbers so that the original values are obscured but the replacements remain in valid form. This enables teams to perform realistic, accurate software tests while protecting sensitive data.
This still leaves the problem of access to the original, pre-transformed data. Some tools automate data masking with security controls, so that no tester or programmer has access to the "upstream" unmasked data.
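One common masking approach is a deterministic, one-way transform: the same real value always maps to the same fake value (preserving referential integrity across tables), but the mapping cannot be reversed without a secret salt. This sketch is illustrative only; the salt handling and SSN format are assumptions, and real deployments would use a dedicated masking tool.

```python
import hashlib

# Assumption: in practice the salt is managed by the masking tool or a
# secrets store, never checked into test code.
SALT = b"rotate-me-per-environment"

def mask_ssn(real_ssn: str) -> str:
    """Derive a fake but validly formatted SSN from a real one."""
    digest = hashlib.sha256(SALT + real_ssn.encode()).hexdigest()
    # Map the first nine hex characters to decimal digits.
    digits = [str(int(c, 16) % 10) for c in digest[:9]]
    return f"{''.join(digits[:3])}-{''.join(digits[3:5])}-{''.join(digits[5:9])}"
```

Because the transform is deterministic, a masked SSN used as a join key in one table still matches the same masked SSN in another, which keeps multi-table test scenarios realistic.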
Follow the permission principle
When I worked at an insurance company, I wrote a simple code library to determine if any individual had coverage as of the current date. The unit tests used my own personal information. It wasn’t a HIPAA violation because I gave the company permission. When I left the company, someone else took over this strategy and maintained the code.
This approach can work well, but it is not ideal when running a large number of tests simultaneously against the same database. As such, permissioned personal data is best treated as a stopgap until synthetic users are available.
Test with synthetic users
With this approach, there is a library of code where testers can request a specific type of user—based on age or credit score, for example—and get back a unique user ID. If each test requests a new synthetic user, there will be none of the database "collisions" that occur when the same user is reused across tests. For example, if a test applies for a loan over and over again, a fresh synthetic user avoids accidentally triggering a different scenario where the credit is overextended.
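A minimal sketch of such a factory, assuming a few illustrative profiles (the profile names, fields and ID scheme are assumptions, not any real library's API). Every call returns a brand-new user ID, so parallel tests never share state.

```python
import itertools
from dataclasses import dataclass

@dataclass
class SyntheticUser:
    user_id: str
    age: int
    credit_score: int

# Hypothetical profiles a tester can request by name.
_PROFILES = {
    "poor_credit":  {"age": 40, "credit_score": 540},
    "great_credit": {"age": 40, "credit_score": 800},
    "minor":        {"age": 16, "credit_score": 0},
}

_counter = itertools.count(1)  # guarantees a unique ID per request

def new_user(profile: str) -> SyntheticUser:
    """Create a fresh synthetic user matching the requested profile."""
    spec = _PROFILES[profile]
    return SyntheticUser(user_id=f"synth-{next(_counter):06d}", **spec)
```

In a real system, `new_user` would also insert the user into the test database so the application under test can find it.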
Partner with customer service
Customer service usually needs access to some production data. When problems arise in production, customer service can provide the information needed to troubleshoot, isolate and fix them. This is an important aspect of testing. Formalizing this process can sometimes give the team access to some production data.
Use high-volume automated testing
A company I worked with ran through a few thousand users every day, producing a text file that would become an email merge. For each build, they took two golden masters of previous production data, ran both the production version of the software and the new build against them, and compared the outputs. This moved the number of test cases from a few dozen in a few days to a few thousand in a couple of hours.
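The comparison step can be sketched as follows. Here `run_production` and `run_candidate` are stand-ins for invoking the two builds; in the real pipeline each would be an external program producing the merge file, and the harness would diff the files line by line.

```python
def run_production(record: dict) -> str:
    # Stand-in for the current production build's output for one user.
    return f"Dear {record['name']}, your balance is {record['balance']}"

def run_candidate(record: dict) -> str:
    # Stand-in for the new build's output; should match unless behavior changed.
    return f"Dear {record['name']}, your balance is {record['balance']}"

def compare_builds(records: list[dict]) -> list[dict]:
    """Return only the records where the two builds disagree."""
    return [r for r in records if run_production(r) != run_candidate(r)]

golden_master = [
    {"name": "A. Tester", "balance": "$10.00"},
    {"name": "B. Tester", "balance": "$250.00"},
]
diffs = compare_builds(golden_master)  # empty when the builds agree
```

Any non-empty `diffs` list becomes a review queue: each mismatch is either an intended change to approve or a regression to fix.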
In regulated markets, teams may need to combine this automated testing method with data masking or randomization. For example, an analysis of production users by age or credit score could enable the creation of a golden master of synthetic users that matches production usage.