Reducing false positives using contextual AI

by James · January 10, 2023

In today’s world of sanctions screening, financial institutions are damned if they comply with ever-evolving regulations and damned if they don’t. The consequences of non-compliance are punitive: annual fines issued by regulators more than doubled between 2018 and 2020, from $4.5 billion to $10.4 billion [1]. Full compliance does not come cheap either. Compliance departments at banks have expanded at an alarming rate to cope with the avalanche of alerts coming out of sanctions screening systems, resulting in a doubling of operating costs every four years [2].

Scaling up resources is by no means a long-term answer to the false positive problem; more effective controls are required. Most banks still use older rules-based screening technology, and it shows: false positive rates produced by older screening systems can reach 95% [3]. As a result, investigators spend most of their time clearing false alarms rather than thoroughly inspecting genuinely suspicious ones.

The rise of artificial intelligence (AI) is seen as a potential light at the end of the tunnel, with reported reductions in false positives of up to 70%. Its power lies in its ability to mimic human decision-making when matching entities. It does so by learning from previous decisions and exploiting the available context as part of the prediction process. This article explores the challenges of sanctions screening, the shortcomings of traditional screening systems and how AI is transforming the process through contextual matching.

What makes sanctions screening so complex?

Sanctions screening is the process by which banks check their customers and transactions against lists of sanctioned entities such as individuals, companies and vessels. This is done to avoid doing business with sanctioned parties and to comply with sanctions laws and regulations.

Although comparing information may seem trivial to a human, automating it with a computer is far from trivial. The complexity of sanctions screening comes down to a combination of factors, which can be broken down as follows:

Data is unstructured

Transactional messages such as payments or trade messages do not always define key details about an entity in a structured way, making it difficult to extract the right information and perform like-for-like comparisons.

The names are varied

Names can be spelled in a number of ways and still refer to the same person. This is even more pronounced when languages such as Arabic or Mandarin are transliterated into Latin-based characters. Abbreviations, initials, aliases and even misspellings add to the challenge.

The information is limited

The amount of data available varies. In many cases, the name is the only information available. Sometimes address details such as street name, city and country are also present.

Large evolving watchlists

Watchlists can contain millions of records and are updated daily. Real-time entity matching against lists this large makes sanctions screening a big data problem.

The limitations of legacy sanctions screening

Entity names tend to be the only consistently available information to screen against. For that reason, the basis for sanctions screening is the name-matching component. Legacy screening systems typically use fuzzy and phonetic name matching techniques, supplemented by dictionaries, as part of a broader rule-based algorithm.

Fuzzy matching measures the similarity between two names based on the number of character edits required to turn one name into the other. For example, John and Jon have an edit distance of 1, since the insertion of an ‘h’ is required. The more edits required, the lower the fuzzy match score. This approach is used to handle spelling inconsistencies. On its own, fuzzy matching can easily fall short. Take the names ‘Abdul Rasheed’ and ‘Abd Arrachide’: the same name, spelled very differently. More sophisticated approaches supplement fuzzy matching with phonetic matching, which captures similarity in how words sound, resulting in more robust matching.
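To make the two techniques concrete, here is a minimal sketch of an edit-distance score alongside a Soundex-style phonetic key. It is only an illustration under stated assumptions: the function names are hypothetical, Soundex is one common phonetic scheme chosen for the example rather than the one any particular screening product uses, and ‘Rachid’ is an extra spelling variant introduced purely for the demonstration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fuzzy_score(a: str, b: str) -> float:
    """Normalise the edit distance to a 0..1 similarity (1.0 = identical)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Soundex consonant groups: letters in the same group sound alike.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Four-character Soundex-style key: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    last = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != last:
            key += code
        if ch not in "hw":  # h and w do not separate repeated codes
            last = code
    return (key + "000")[:4]


print(edit_distance("John", "Jon"))             # 1, as in the John/Jon example
print(soundex("Rasheed") == soundex("Rachid"))  # phonetic keys agree
```

In this sketch the two spellings of Rasheed share a phonetic key even though several character edits separate them, which is the gap phonetic matching is meant to cover. A basic Soundex key still depends on the first letter, so it would not by itself link ‘Rasheed’ to ‘Arrachide’; handling such cases requires richer phonetic and transliteration rules.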

Fuzzy and phonetic matching are effective at text matching, but not necessarily at entity matching, so they are orchestrated as part of a larger rule-based approach. These rules are hand-crafted checks, each weighted according to its importance to the final similarity score. An advanced rule-based screening algorithm may have more than 30 rules that must be carefully coordinated and weighted to produce meaningful scores. A major limitation of this approach is the rigidity and fragility of the scoring system. Removing or adding a rule requires rethinking the entire weighting system. Similarly, changing the weighting of one rule can have major consequences for the final result.
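The fragility described above can be seen in a toy version of such a scoring system. The sketch below is an assumption-laden illustration, not a real screening rule set: the rule names, weights, fields and records are all hypothetical, and a production system would have far more rules.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Tuple[str, float, Callable[[Record, Record], float]]


def same_field(field: str) -> Callable[[Record, Record], float]:
    """Build a check returning 1.0 when both records share a non-empty value."""
    def check(a: Record, b: Record) -> float:
        va = a.get(field, "").strip().lower()
        vb = b.get(field, "").strip().lower()
        return 1.0 if va and va == vb else 0.0
    return check


# Hypothetical hand-tuned rule set: (name, weight, check).
RULES: List[Rule] = [
    ("name_match", 6.0, same_field("name")),
    ("country_match", 3.0, same_field("country")),
    ("city_match", 1.0, same_field("city")),
]


def rule_score(a: Record, b: Record, rules: List[Rule] = RULES) -> float:
    """Weighted average of rule outcomes, normalised to 0..1."""
    total_weight = sum(weight for _, weight, _ in rules)
    return sum(weight * check(a, b) for _, weight, check in rules) / total_weight


CUSTOMER = {"name": "Acme Trading", "country": "AE", "city": "Dubai"}
LISTED = {"name": "Acme Trading", "country": "AE", "city": "Sharjah"}

# Adding one rule shifts the score of a pair whose data has not changed.
EXTENDED_RULES = RULES + [("birth_date_match", 5.0, same_field("birth_date"))]

print(rule_score(CUSTOMER, LISTED))                  # about 0.9
print(rule_score(CUSTOMER, LISTED, EXTENDED_RULES))  # about 0.6
```

Because the score is normalised by the total weight, the new rule drags the same pair from roughly 0.9 to roughly 0.6, so any alert threshold tuned against the old rule set would have to be revisited.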

At a more fundamental level, the underlying approach of legacy systems is at odds with the way investigators perform entity matching. Investigators gather all available information, such as names, addresses and past behavior, for context before making any decision. More generally, people also unconsciously infer additional details even from limited information. To humans, for example, the name Leonardo Mancini is not just a collection of letters. Most people automatically infer that it belongs to a person, most likely a man from Italy. The ability to extract information from data in this way is an invaluable asset, one that can transform the way entity matching is done.

Contextual matching using SafeWatch screening’s AI

False positives are a byproduct of rigid, imprecise algorithms that lack the full context of available data. This is why Eastnets has built its own AI-based sanctions screening engine with investigators in mind. The SafeWatch Screening (SWS) AI builds on the strengths of proven methods such as fuzzy and phonetic matching, and extends them with machine learning, word embeddings and knowledge bases to enrich existing data with more context.

There are three core ingredients that ensure SWS’s AI achieves laser-level precision with full transparency.

1. Context extraction

As mentioned earlier, names are not just a set of characters. Just like humans, AI can be used as a tool to extract even more information from the available names, for example:

Gender: Jenny and Jonny have similar fuzzy and phonetic scores, but could indicate two different people based on their genders.
Origin of the name: Some names are associated with particular regions and countries, like Mathew and Matthieu. One is English and the other is French, another indication that they may refer to different people.
Entity type: AI combined with knowledge bases can help identify whether a name belongs to a person or a company, reducing the chances of matching a company with a person.
Company semantic similarity: ChemTech Drugs and ChemTech Pharmaceutical are more likely to be similar than ChemTech Media Company. AI can infer semantic similarities between company types.

The goal here is information gathering and context building, to construct a more complete picture when performing entity matching. Where possible, additional details may also be included as part of the overall context, such as the address, IBAN, the entity’s previous alerts and adverse media.
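As a rough illustration of context building, the sketch below enriches a raw name with inferred attributes and then lists the attributes on which two names disagree. It is a deliberate simplification: two tiny hand-written lookup tables stand in for the machine learning models and knowledge bases described above, and every name in it (LEGAL_FORMS, GIVEN_NAME_HINTS, extract_context, context_conflicts) is hypothetical.

```python
from typing import Dict, List

# Stand-ins for a knowledge base; real systems would use far richer sources.
LEGAL_FORMS = {"ltd", "llc", "inc", "gmbh", "plc", "corp", "company", "co"}
GIVEN_NAME_HINTS = {
    "jenny": {"gender": "female"},
    "jonny": {"gender": "male"},
    "mathew": {"origin": "English"},
    "matthieu": {"origin": "French"},
}


def extract_context(name: str) -> Dict[str, str]:
    """Enrich a raw name with whatever attributes the lookups can infer."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    context = {"name": name, "entity_type": "unknown"}
    if any(t in LEGAL_FORMS for t in tokens):
        context["entity_type"] = "company"
    elif tokens and tokens[0] in GIVEN_NAME_HINTS:
        context["entity_type"] = "person"
        context.update(GIVEN_NAME_HINTS[tokens[0]])
    return context


def context_conflicts(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Attributes known on both sides that disagree."""
    conflicts = []
    for key in ("entity_type", "gender", "origin"):
        va, vb = a.get(key, "unknown"), b.get(key, "unknown")
        if va != "unknown" and vb != "unknown" and va != vb:
            conflicts.append(key)
    return conflicts


jenny = extract_context("Jenny Smith")
jonny = extract_context("Jonny Smith")
print(context_conflicts(jenny, jonny))  # the names are close, the genders differ
```

The conflicts returned here would not decide a match on their own; they are extra signals that a downstream model can weigh together with the fuzzy and phonetic scores.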

2. Automatic contextual matching using historical decisions

Putting together rules for sanctions screening can become cumbersome, especially as the rule set grows more complex. This is where AI has a significant advantage. Rules are implicitly defined in the model itself, and their weights are automatically calibrated by learning from historical data. Using past decisions, the model adjusts the importance of individual features and context according to how investigators have decided similar cases, which tailors the model to the investigators’ decisions.
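The idea of calibrating weights from past decisions can be sketched with a small logistic regression trained by gradient descent. This is an assumption, not a description of the model Eastnets uses: the two features, the handful of synthetic "historical" decisions and all function names are made up for illustration.

```python
import math
from typing import List, Tuple


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that an alert is a true match under a logistic model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[List[float]], labels: List[int],
          epochs: int = 20000, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights by batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic past alerts: [name_similarity, country_match] and the
# investigator's decision (1 = true match, 0 = false positive).
FEATURES = [[0.95, 1.0], [0.90, 1.0], [0.97, 1.0], [0.92, 0.0],
            [0.88, 0.0], [0.40, 1.0], [0.35, 0.0], [0.45, 1.0]]
DECISIONS = [1, 1, 1, 0, 0, 0, 0, 0]

weights, bias = train(FEATURES, DECISIONS)
print(predict(weights, bias, [0.93, 1.0]))  # similar name, same country
print(predict(weights, bias, [0.90, 0.0]))  # similar name, different country
```

No weight was set by hand: in this toy history, investigators only confirmed alerts where a strong name match coincided with a country match, and the training loop recovers that pattern as positive weights on both features.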

3. Explainability

The transition from rule-based to AI-based approaches has typically been met with some hesitation due to the black-box nature of machine learning models. Ironically, many older screening solutions can also be difficult to decipher. Ad hoc improvements and updates to handle edge cases culminate in overly complex rule structures that are far from interpretable. A key part of entity matching and resolution is understanding which factors contributed to a match, and therefore SWS’s AI matching solution provides a full explanation. Investigators can see exactly how and why the AI classified the result as a match.
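One simple way to surface such an explanation, assuming a linear scoring model, is to report each feature's contribution (weight times value) ranked by magnitude. The sketch below shows only that idea; the article does not describe how SWS produces its explanations, and the weights, feature names and explain function are hypothetical.

```python
from typing import Dict, List, Tuple


def explain(weights: Dict[str, float], bias: float,
            features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the raw score and per-feature contributions, largest first."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(),
                    key=lambda item: abs(item[1]), reverse=True)
    return score, ranked


# Hypothetical weights, as if learned from historical decisions.
WEIGHTS = {"name_similarity": 4.0, "country_match": 2.0, "gender_conflict": -3.0}
BIAS = -4.0

score, ranked = explain(WEIGHTS, BIAS, {"name_similarity": 0.9,
                                        "country_match": 1.0,
                                        "gender_conflict": 1.0})
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

Read top to bottom, the output tells an investigator that the name similarity pushed hardest towards a match while the gender conflict pulled almost as hard against it.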

The bottom line is that the true cost of compliance using legacy sanctions screening technology is quickly becoming unsustainable due to the overwhelming number of alerts it generates and the resources required to handle them. AI’s ability to extract hidden context from names and combine it with proven and reliable algorithms such as fuzzy matching allows it to screen entities in a contextual way, just like humans. And with its self-learning capacity, AI models can be trained to make decisions just as an investigator would, dramatically reducing false positives and allowing investigators to focus on the cases that matter most.

About the author

Daoud Abdel Hadi
Lead Data Scientist – Eastnets
