
Why Stripe’s Similarity Clustering Is a Key to Future Application Security

Published Sep 29, 2021 by Subspace Team

The Digital State of Financial Services

FinTech, the technology driving modern financial services, adds value for consumers and businesses by providing the infrastructure necessary to survive and thrive in a digital world. Rather than going to the bank for every deposit, withdrawal, or other change, customers can manage transactions online. Stripe, a company that says its mission is “to increase the GDP of the internet,” makes it possible for its customers to manage their businesses and transactions online.
Stripe’s goal is to maximize payments for businesses and minimize losses due to fraud. But one of the biggest drawbacks of making online financial transactions easier is that it increases opportunities for cybercrime.

Stripe’s Dilemma

Bad actors can use fraudulent accounts in a couple of different ways. Here’s a possible scenario:
  1. The fraudster signs up for a Stripe account, with the intent to use it later to commit fraud.
  2. They charge stolen card numbers through their Stripe account while presenting a legitimate-looking website, account activity, and charge activity.
  3. The fraudster’s goal is to receive payouts to their bank account before Stripe detects the fraud.
  4. The cardholder discovers the fraudulent charges and requests a chargeback from their bank.
  5. Stripe reimburses the chargebacks to the bank, and therefore to the cardholder.
  6. Stripe attempts to debit the fraudster’s account to recover the funds, but absorbs the losses if it is too late.
On a larger scale, fraudsters sometimes attempt to set up predatory or scam businesses. Those scenarios can happen like this:
  1. The fraudster sets up a Stripe account claiming to sell expensive items at cheap prices.
  2. Customers purchase items, and think they are getting a great deal.
  3. The product never arrives.
  4. The fraudster’s goal here is to get paid as much as possible before being caught.
In either scenario, Stripe ends up reimbursing credit card companies for the fraudulent charges. The fraudsters are usually caught eventually, but not before some damage is done.

Finding the Offenders

Bad actors can create Stripe accounts using fabricated personal details, such as names and birthdays, that are easy to invent, while reusing harder-to-obtain information, such as bank account numbers, across accounts. One way to identify potentially fraudulent accounts is to look for similarities between them, but doing this work manually is tedious and time-consuming.
Stripe needs a systematic way to quantify similarities between accounts, because some shared attributes are much stronger signals than others. The system that detects these similarities also needs to improve as it collects more data.

Enter Similarity Clustering

Stripe uses a machine learning approach called supervised learning, in which an algorithm is trained on labeled datasets and then used to predict whether accounts or transactions are fraudulent. In Stripe’s case, the algorithm is trained to detect similarities between accounts.
Thanks to the labels and the constant addition of new information, the system becomes more accurate over time, because the models can be retrained as new labeled data arrives. Supervised learning is relatively straightforward to apply and supports Stripe’s investigations into suspicious chargebacks and fraud losses.
More specifically, Stripe uses similarity learning, a particular type of supervised machine learning that measures how similar or closely related two objects are. Each pair of objects is given a similarity score: the higher the score, the more similar the objects; the lower the score, the less similar they are.
The similarity scores are then used to decide which objects (in Stripe’s case, accounts) belong together in the same cluster. Stripe calls this process similarity clustering.
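To make the idea of a pairwise similarity score concrete, here is a minimal sketch, assuming each account pair has already been turned into a numeric feature vector. It uses plain logistic regression trained with stochastic gradient descent, a common and simple choice for supervised scoring; the article does not say which model Stripe actually uses, and the feature values and labels below are made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function: maps any real number into (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def similarity_score(weights: list[float], bias: float, x: list[float]) -> float:
    # Higher score means the model considers the two accounts more alike
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_similarity_model(pairs: list[list[float]], labels: list[int],
                           lr: float = 0.5, epochs: int = 500) -> tuple[list[float], float]:
    # pairs: one feature vector per account pair
    # labels: 1 if analysts linked the pair (e.g. same fraud ring), else 0
    weights = [0.0] * len(pairs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            error = similarity_score(weights, bias, x) - y  # log-loss gradient w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical features per pair: [same email domain, card overlap, text similarity]
train_pairs = [[1, 0.8, 0.9], [1, 0.5, 0.7], [0, 0.0, 0.1], [0, 0.1, 0.2]]
train_labels = [1, 1, 0, 0]
weights, bias = train_similarity_model(train_pairs, train_labels)

close_pair = similarity_score(weights, bias, [1, 0.7, 0.8])
distant_pair = similarity_score(weights, bias, [0, 0.0, 0.1])
print(f"close pair: {close_pair:.3f}, distant pair: {distant_pair:.3f}")
```

The score lands between 0 and 1, so it reads naturally as "how likely these two accounts belong together"; a production system would use far more features and labeled pairs.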
Once the datasets are labeled, the next step is feature generation, which takes pairs of Stripe accounts and produces a defined list of features for each pair. Examples of features that may indicate fraud include shared account email domains, overlap in card numbers, and measures of text similarity. Once the machine learning models are trained on these features, Stripe uses them to detect and predict fraudulent activity.
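The feature generation step can be sketched as a function that takes two accounts and returns one number per feature. This is a minimal illustration, not Stripe’s code: the account fields (`email`, `cards`, `description`) are hypothetical, card overlap is measured with Jaccard similarity over sets of card identifiers (in practice these would likely be hashed fingerprints rather than raw numbers), and text similarity uses the standard library’s `difflib.SequenceMatcher` as a simple stand-in for whatever text measure Stripe uses.

```python
from difflib import SequenceMatcher

def email_domain(email: str) -> str:
    # Everything after the last "@", normalized to lowercase
    return email.rsplit("@", 1)[-1].lower()

def jaccard(a: set[str], b: set[str]) -> float:
    # Share of card identifiers the two accounts have in common (0.0 to 1.0)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def text_similarity(a: str, b: str) -> float:
    # Ratio of matching characters between the two strings (0.0 to 1.0)
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pair_features(acct_a: dict, acct_b: dict) -> list[float]:
    # Hypothetical account fields; a real system would use many more signals
    return [
        1.0 if email_domain(acct_a["email"]) == email_domain(acct_b["email"]) else 0.0,
        jaccard(acct_a["cards"], acct_b["cards"]),
        text_similarity(acct_a["description"], acct_b["description"]),
    ]

acct_1 = {"email": "alice@cheap-bags.example", "cards": {"fp_01", "fp_02"},
          "description": "Discount designer handbags"}
acct_2 = {"email": "bob@cheap-bags.example", "cards": {"fp_02", "fp_03"},
          "description": "Discount designer bags"}

features = pair_features(acct_1, acct_2)
print(features)  # [same email domain, card overlap, text similarity]
```

Each resulting vector is the kind of input a pairwise similarity model would be trained on.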
Although the basic model compares one account with another, the comparisons do not stop there. Stripe also draws on clusters of similar accounts gathered over years of investigating fraud rings, and uses them as reference clusters. By sampling accounts on the edges of these reference clusters and comparing them with accounts on the edges of clusters of new accounts, the algorithm produces a set of account clusters that fraud analysts then review for further evidence of fraudulent activity.
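One common way to turn pairwise scores into clusters is to treat accounts as nodes in a graph, link any two whose similarity score clears a threshold, and take the connected components. The sketch below does this with a union-find structure. It is an assumption for illustration: the article does not describe exactly how Stripe forms clusters or compares them with reference clusters, and the 0.8 threshold and account IDs are hypothetical.

```python
from collections import defaultdict

def cluster_accounts(account_ids, scored_pairs, threshold=0.8):
    # Union-find: each account starts in its own cluster
    parent = {a: a for a in account_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Link only the pairs the model scored as sufficiently similar
    for a, b, score in scored_pairs:
        if score >= threshold:
            union(a, b)

    groups = defaultdict(list)
    for a in account_ids:
        groups[find(a)].append(a)
    # Singletons have no similar neighbor, so they are not surfaced for review
    return [sorted(members) for members in groups.values() if len(members) > 1]

accounts = ["acct_a", "acct_b", "acct_c", "acct_d", "acct_e"]
scores = [("acct_a", "acct_b", 0.93), ("acct_b", "acct_c", 0.85),
          ("acct_c", "acct_d", 0.20), ("acct_d", "acct_e", 0.40)]
clusters = cluster_accounts(accounts, scores)
print(clusters)
```

Because linking is transitive, `acct_a` and `acct_c` end up in the same cluster even though they were never scored against each other directly. That chaining is what lets a cluster expose a whole ring, and it is also why the threshold needs tuning to avoid merging unrelated accounts.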

The Rewards of Similarity Clustering

Stripe reaps several benefits from similarity clustering. Fraudsters can no longer reuse information without being linked to their earlier accounts, so they must invest resources to acquire IDs and bank account information they haven’t used for fraud before. This makes setting up fraudulent accounts more onerous and more expensive. Similarity clustering is especially effective at identifying fraudulent accounts that share details such as email addresses or names.
The model also makes it easier for Stripe analysts to identify accounts that share patterns, and to spot outliers. Analysts can review an entire cluster of accounts at once rather than sifting through accounts one by one, which improves both their efficiency and their accuracy.
Stripe can also reuse the account clusters in other systems, for example as training data for models that otherwise have little data to learn from.
Stripe’s customers benefit too. They spend less time worrying about fraud and can focus their resources on the tasks that matter to their businesses. Thanks to similarity clustering, customers can review charges for possible fraud before they are processed, which means fewer unexpected chargebacks.
Customers can also write rules that block charges before they go through, which saves time, and apply those rules to historical data to look for patterns of fraud.
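The idea of rules that block charges, and of replaying those rules against past charges, can be illustrated with a small hypothetical sketch. This is not Stripe’s rule syntax or API; the `Charge` fields, the rule name, and the 500.00 threshold are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Charge:
    charge_id: str
    amount: int        # in the smallest currency unit, e.g. cents
    card_country: str  # country that issued the card
    ip_country: str    # country the payment request came from

@dataclass
class Rule:
    name: str
    matches: Callable[[Charge], bool]  # True means "block this charge"

def blocking_rules(charge: Charge, rules: list[Rule]) -> list[str]:
    # Names of every rule that would block this charge before it goes through
    return [rule.name for rule in rules if rule.matches(charge)]

def backtest(history: list[Charge], rules: list[Rule]) -> dict[str, list[str]]:
    # Replay the rules over past charges to see which ones they would have caught
    caught = {}
    for charge in history:
        hits = blocking_rules(charge, rules)
        if hits:
            caught[charge.charge_id] = hits
    return caught

rules = [Rule("large_charge_country_mismatch",
              lambda c: c.amount > 50_000 and c.card_country != c.ip_country)]
history = [Charge("ch_1", 120_000, "US", "GB"), Charge("ch_2", 2_000, "US", "US")]
print(backtest(history, rules))
```

Running the same rule objects both live (`blocking_rules`) and over history (`backtest`) keeps the two views consistent, so a rule can be checked against past fraud before it starts blocking real customers.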

The Evolution of FinTech in Commerce

Commerce is constantly evolving, and FinTech is playing an ever-increasing role in that evolution. Stripe provides the technology products and services companies need to modernize and manage their payments and business online, including machine learning processes designed to protect Stripe’s customers from fraudulent transactions.
This focus on security as a core business trait is admirable in the industry. Proactive security systems like this are not easy to implement, tune, or maintain, and with this implementation Stripe has gone well beyond basic security measures.
Subspace shares this mindset: protection for customers’ networks should be built into the products. On the Subspace Network, DDoS protection is built into the network itself, adds no latency, and prevents DDoS attacks from bringing down your mission-critical real-time applications. The end effect is similar: business keeps running smoothly and safely.
Learn more about our always-on DDoS protection.
