Underwriting with alternative and cash flow data
For consumers and small businesses, access to affordable credit plays a critical role in bridging gaps between short-term cash inflows and outflows and in supporting long-term financial health.
Traditionally, access to credit has depended on the credit reporting and scoring systems that most financial institutions use to assess applications. This approach leaves out consumers who may be able to repay but have low scores, as well as young consumers who have not had enough time to build a credit history. Credit for small businesses often depends on the creditworthiness of the business owner rather than on the potential of the business itself, which can limit their access to credit.
With the data boom of the last decade, financial institutions now have access to far more information about their consumers. Using this non-agency, or “alternative”, credit data can help lenders better understand a consumer’s risk. Such data can also reduce the operational costs of collecting and verifying consumer documents. For example, access to a consumer’s bank transactions lets a lender verify a stated paycheck by observing the corresponding biweekly income deposits.
Lenders may have access to a variety of alternative data about their potential consumers. This data falls broadly into two groups: data from third-party providers and lender-generated data. Lenders can generate data about their consumers in several ways. They have visibility into the solicitation source, such as email offers or direct visits. Cookies can help identify where the consumer is coming from; for example, a consumer may read a newspaper article about loan consolidation and then click on an ad posted by the lender. Lenders may also use information about how a consumer interacts with the online application and how much time they spend on it.
Provider-based data sources can offer much more detailed information about consumers. Information about utility bill and rent payments can be very useful in predicting a consumer’s creditworthiness. Educational history can be an indicator of a consumer’s future ability to repay loans. Providers may also supply data on a consumer’s online activity, which may indicate lifestyle and social choices. Device type and IP-based location information can help tailor the user experience, but can also serve as a signal of creditworthiness. Finally, lenders can obtain cash flow and bank account information about consumers through providers. This cash flow data carries signals not present in traditional agency data and, more importantly, it is continuous and available in real time.
Many fintechs have entered the credit market in the last decade. They operate almost exclusively online and rely on automated underwriting models and algorithms. Many fintechs originate and hold loans themselves, while others operate as service providers to traditional banks. Lenders often use services from other technology firms that operate payment, accounting, and data transmission networks. All of this information must feed into the lender’s automated underwriting algorithms so that an offer can be made to the consumer in real time.
Use of alternative data for underwriting
Alternative data can support several aspects of loan underwriting: estimating intent and price elasticity, advanced credit scoring, and fraud detection. Different data sources can help with one or more of these tasks.
Lenders can make competitive loan offers if they can measure a consumer’s “intent”, that is, the likelihood that the consumer will accept a given offer. Offers can be further adjusted if the consumer’s price elasticity is known; price elasticity measures the percentage change in that likelihood for a given change in the offer’s terms, such as its interest rate.
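As a worked illustration, and as an assumption rather than something the text prescribes, suppose intent is modeled as a logistic function of the offered interest rate r with fitted coefficients β0 and β1. The marginal effect of the rate, the semi-elasticity (percentage change in acceptance probability per one-unit change in r, after multiplying by 100), and the conventional elasticity (percentage change per percentage change in r) then follow directly:

```latex
\begin{aligned}
p(r) &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 r)}}, &
\frac{\partial p}{\partial r} &= \beta_1\, p(r)\,\bigl(1 - p(r)\bigr), \\
\frac{1}{p}\frac{\partial p}{\partial r} &= \beta_1\,\bigl(1 - p(r)\bigr), &
\frac{r}{p}\frac{\partial p}{\partial r} &= \beta_1\, r\,\bigl(1 - p(r)\bigr).
\end{aligned}
```

Because β1 is normally negative (higher rates lower acceptance), both measures are negative, and under this model their magnitude shrinks as p approaches 1: consumers who are already very likely to accept are the least sensitive to price.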
Intent can be estimated from several data sources. In-session signals that trace where the consumer came from are especially useful. For example, a consumer who arrives after specifically searching for “loan consolidation” is more likely to accept an offer. Consumers can also arrive from a lender’s blog posts or published content, from its affiliates, or by clicking on ads.
Clickstream data can help a lender understand a consumer’s price elasticity. Visiting or applying for a loan with a competitor indicates not only high intent but also that the consumer is “shopping” for a deal. The lender can then respond with a better offer or alternative plans. Other variables that can be used to estimate intent and elasticity include educational history, cash flow, income, and other in-session signals generated while the consumer fills out the application.
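A minimal sketch of how the in-session and clickstream signals described above might be turned into model features. The event schema (`SessionEvent` and its `kind` values), the phrase list, and the feature names are hypothetical; real clickstream and referral data vary by provider.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical event schema: real clickstream feeds differ by provider.
@dataclass
class SessionEvent:
    kind: str             # "referral", "competitor_visit", or "form_field" (assumed kinds)
    value: str = ""       # referral text, competitor domain, or form field name
    seconds: float = 0.0  # time spent on the event, if tracked

# Assumed list of high-intent search phrases
INTENT_PHRASES = ("loan consolidation", "debt consolidation")

def session_features(events: Iterable[SessionEvent]) -> Dict[str, object]:
    """Turn raw in-session events into simple intent/elasticity features."""
    events = list(events)
    referrals = [e.value.lower() for e in events if e.kind == "referral"]
    form_events = [e for e in events if e.kind == "form_field"]
    return {
        # Arrived after searching for a high-intent phrase
        "searched_intent_phrase": any(p in r for r in referrals for p in INTENT_PHRASES),
        # Visited or applied with a competitor: likely shopping for a deal
        "visited_competitor": any(e.kind == "competitor_visit" for e in events),
        # Time spent on the application form and number of distinct fields touched
        "form_seconds": sum(e.seconds for e in form_events),
        "fields_touched": len({e.value for e in form_events}),
    }
```

These features would then be inputs to an intent or elasticity model alongside the other variables listed above.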
Lenders should add intent and elasticity modules to their automated underwriting algorithms only if they see significant potential for business improvement. To assess this, lenders should record all of these variables for both originated and declined loans. Once they have a reasonably large sample, simple statistical tests can show the potential gain. Intent and elasticity measures can be generated with logistic regression models; other methods or more advanced algorithms can also be used, depending on the use case.
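The sketch below fits a logistic regression intent model from scratch with batch gradient descent, so that it runs with only the Python standard library. The feature set, learning rate, epoch count, and toy data are illustrative assumptions; in practice a lender would more likely use an established library (for example, scikit-learn’s `LogisticRegression`) and many more features, such as those listed above.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_intent(w: Sequence[float], x: Sequence[float]) -> float:
    """P(accept | x) = sigmoid(w0 + w1*x1 + ... + wk*xk)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit weights [w0, w1, ..., wk] by batch gradient descent on mean log-loss.
    X holds one feature vector per past offer; y is 1 if the offer was accepted."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_intent(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Toy data (illustrative only): one feature, whether the consumer
# arrived via a high-intent search; y = 1 if the offer was accepted.
X = [[1], [1], [1], [1], [0], [0], [0], [0]]
y = [1, 1, 1, 0, 1, 0, 0, 0]
w = fit_logistic(X, y, lr=0.5, epochs=3000)
print(round(predict_intent(w, [1]), 3))  # ≈ 0.75, the acceptance rate in that toy group
```

With a single binary feature, the fitted model simply recovers the acceptance rate in each group, which makes it easy to sanity-check before adding more variables.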
Cash flow data obtained from a consumer’s bank account transactions serves two specific purposes in underwriting. It is often used to verify information the consumer provides in the loan application, but it can also be used to “boost” traditional credit scores. Transaction data lets lenders take a “second look” at an application that the standard approval process, based on traditional agency variables, would reject. This enables lenders to extend credit to consumers who would otherwise not be served.
Lenders can access this transaction information with the consumer’s consent. Typically, the lender wants to verify information the consumer entered in the loan application, such as identity, income, and employment. Income verification succeeds when we observe a series of biweekly or monthly deposits that add up to the annual income stated by the consumer, and the source of those deposits can verify the employment information. Incorporating this check into an automated underwriting algorithm is relatively straightforward.
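A minimal sketch of the income check described above, under stated assumptions: transactions are already labeled with a counterparty name, a payroll stream needs at least three deposits from the same source, “biweekly” means a typical gap of 12 to 16 days and “monthly” 27 to 32 days, and the annualized total must fall within 10% of the stated income. All of these names and thresholds are hypothetical. Stated income is often gross while deposits are usually net of taxes and withholdings, so a real check would need to reconcile the two.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean, median
from typing import Dict, List, Optional

@dataclass
class Txn:
    posted: date
    amount: float      # positive = deposit, negative = withdrawal
    counterparty: str  # payer/payee name parsed from the description (assumed)

# Assumed cadence windows: (min gap in days, max gap in days, pay periods per year)
CADENCES = {"biweekly": (12, 16, 26), "monthly": (27, 32, 12)}

def verify_income(txns: List[Txn], stated_annual_income: float,
                  tolerance: float = 0.10, min_deposits: int = 3) -> Optional[Dict]:
    """Return the first recurring deposit stream whose annualized total is within
    `tolerance` of the stated income, or None if no stream verifies it."""
    by_source: Dict[str, List[Txn]] = defaultdict(list)
    for t in txns:
        if t.amount > 0:
            by_source[t.counterparty].append(t)
    for source, deposits in by_source.items():
        if len(deposits) < min_deposits:
            continue
        deposits.sort(key=lambda t: t.posted)
        gaps = [(b.posted - a.posted).days for a, b in zip(deposits, deposits[1:])]
        typical_gap = median(gaps)
        for cadence, (lo, hi, per_year) in CADENCES.items():
            if lo <= typical_gap <= hi:
                annualized = mean(t.amount for t in deposits) * per_year
                if abs(annualized - stated_annual_income) <= tolerance * stated_annual_income:
                    # The deposit source doubles as employment verification
                    return {"employer": source, "cadence": cadence, "annualized": annualized}
    return None
```

For example, six deposits of 2,000 from the same employer, 14 days apart, annualize to 52,000 and would verify a stated income of 50,000 at the 10% tolerance.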
Cash flow data can also be used to “boost” a consumer’s credit score. The credit bureaus report credit scores, but these are largely static and do not account for all of a consumer’s financial activity, including rent or utility payments (which are not always reported to the consumer reporting agencies), income, and expenses. For example, a recent graduate with no significant credit history will receive a low score but may have a steady income, ample cash on hand, and a good history of positive balances. Such a consumer is likely to behave like one with a much higher credit score. Conversely, a consumer with a small margin between income and expenses may not have the ability to repay that the credit score would otherwise indicate. Transaction data can therefore either raise or lower the assessment of a consumer’s ability to repay a loan.
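One hypothetical way to summarize the cash flow signals mentioned above (income versus expenses, cash on hand, history of positive balances) as features. The feature names, the monthly normalization, and the input format (signed transaction amounts plus end-of-day balances over the observation window) are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Sequence

def cash_flow_features(amounts: Sequence[float], daily_balances: Sequence[float],
                       months: float) -> Dict[str, float]:
    """Summarize bank data observed over `months` months.
    amounts: signed transactions (deposits > 0, spending < 0).
    daily_balances: end-of-day account balances for the same window."""
    inflow = sum(a for a in amounts if a > 0)
    outflow = -sum(a for a in amounts if a < 0)
    return {
        # Income minus expenses per month: thin margins suggest weaker ability to repay
        "avg_monthly_net_flow": (inflow - outflow) / months,
        "expense_to_income": outflow / inflow if inflow > 0 else float("inf"),
        # Cash on hand and history of positive balances
        "avg_daily_balance": mean(daily_balances),
        "share_negative_days": sum(b < 0 for b in daily_balances) / len(daily_balances),
        "min_balance": min(daily_balances),
    }
```

Features like these can push an assessment in either direction: a high net flow and few negative-balance days support a “boost”, while a high expense-to-income ratio argues for caution.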
Calculating these score “differentials” and incorporating them into an automated underwriting algorithm is not trivial. Risk models must be developed and empirically tested to measure their predictive power. This requires a reasonably large sample of loan applications linked to their detailed cash flow data and repayment history. These models can be simple classification models with score bucketing or, when enough data is available, more advanced machine learning algorithms. Lenders may build in-house teams to develop these models, or they may contract with third-party providers and use their proprietary algorithms.
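A minimal sketch of the “simple classification model with score bucketing” approach, assuming a cash-flow score has already been computed for each historical loan. It groups past loans into score buckets, measures the observed default rate in each, and approves an otherwise-declined application only if its bucket has enough history and a default rate at or below a chosen limit. The cutoffs, the 5% limit, and the 30-loan minimum are hypothetical parameters, not recommendations.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

def bucket_default_rates(scores: Sequence[float], defaulted: Sequence[bool],
                         cutoffs: Sequence[float]) -> Dict[int, Tuple[int, float]]:
    """Group historical loans by score bucket -> (loan count, observed default rate).
    With sorted cutoffs [c1, c2, ...], bucket 0 is score < c1,
    bucket 1 is c1 <= score < c2, and so on."""
    tallies: Dict[int, List[int]] = {}
    for score, did_default in zip(scores, defaulted):
        bucket = bisect_right(cutoffs, score)
        tally = tallies.setdefault(bucket, [0, 0])
        tally[0] += 1
        tally[1] += int(did_default)
    return {b: (n, k / n) for b, (n, k) in sorted(tallies.items())}

def second_look(score: float, cutoffs: Sequence[float],
                rates: Dict[int, Tuple[int, float]],
                max_default_rate: float = 0.05, min_loans: int = 30) -> bool:
    """Approve an otherwise-declined application if its bucket has enough history
    and an observed default rate at or below `max_default_rate`."""
    bucket = bisect_right(cutoffs, score)
    if bucket not in rates:
        return False
    n, rate = rates[bucket]
    return n >= min_loans and rate <= max_default_rate
```

Requiring a minimum number of loans per bucket guards against approving on the basis of a default rate estimated from too few observations.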
Data infrastructure to exploit alternative data
With advances in cloud computing, we can now ingest and process data and send triggers back to web-based applications within seconds. An illustration of the data flow is shown above. The data warehouse acts as the central archive for all current and historical information about the applicant and feeds both online and batch systems.
To illustrate the workflow, consider the following use case. A customer with an existing banking relationship wants to apply for a personal loan. She has received a pre-approved offer and therefore has an invitation code. Her FICO score is marginal, close to the approval cutoff, but she has direct deposit with us and a steady cash flow.
With cloud computing, we can extend our traditional underwriting models to include the other relationships this customer has with us. Furthermore, because she is already a customer, we can initiate an “in-session” chat to collect any additional information required, so that we can make a decision while she is still available. Let’s take a closer look at how this works from a data perspective.
- The customer enters the invitation code on the start page of the application
- This information is sent to our application service in the middleware layer, which returns the customer ID associated with the invitation code
- We then query pHub (Persons Hub) to pre-fill other information about the customer
- This gives the customer the opportunity to confirm her identity and verify that the pre-filled information is correct
- Once she is authenticated, we have the following information about her, all linked through the customer ID:
  - Risk profile
    - Her credit profile, based on traditional credit data
    - Her cash flow, based on bank account information
    - Any potential regulatory flags, if applicable
  - Offer preferences
    - Her primary channel for inbound and outbound communication
    - Any friction points she encounters during the application flow, so we can engage her “in session” or “out of session” as needed
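The steps above can be sketched as follows. Everything here is hypothetical: the dictionaries stand in for the middleware application service and pHub, whose real interfaces are not described in the text, and the friction rule (offer a chat after two friction events) is an illustrative assumption. The point of the sketch is that every piece of data hangs off the same customer ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for the middleware application service and pHub.
INVITATION_CODES: Dict[str, str] = {"INV-123": "C-001"}
PHUB_RECORDS: Dict[str, Dict[str, str]] = {"C-001": {"name": "Jane Doe", "city": "Springfield"}}

@dataclass
class ApplicantContext:
    customer_id: str
    prefill: Dict[str, str]
    authenticated: bool = False
    risk_profile: Dict[str, object] = field(default_factory=dict)
    offer_preferences: Dict[str, object] = field(default_factory=dict)
    friction_events: List[str] = field(default_factory=list)

def start_application(invitation_code: str) -> Optional[ApplicantContext]:
    """Resolve the invitation code to a customer ID, then pre-fill from pHub."""
    customer_id = INVITATION_CODES.get(invitation_code)
    if customer_id is None:
        return None  # unknown code: route to the standard application instead
    return ApplicantContext(customer_id, dict(PHUB_RECORDS.get(customer_id, {})))

def complete_authentication(ctx: ApplicantContext, credit: Dict, cash_flow: Dict,
                            regulatory_flags: List[str], channel: str) -> ApplicantContext:
    """Once she confirms her details, attach data keyed by the same customer ID."""
    ctx.authenticated = True
    ctx.risk_profile = {"credit": credit, "cash_flow": cash_flow,
                        "regulatory_flags": regulatory_flags}
    ctx.offer_preferences = {"channel": channel}
    return ctx

def record_friction(ctx: ApplicantContext, event: str, threshold: int = 2) -> bool:
    """Log a friction point; return True when an in-session chat should be offered
    (hypothetical rule: after `threshold` friction events)."""
    ctx.friction_events.append(event)
    return len(ctx.friction_events) >= threshold
```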
With this information, we can adjust the loan decision in real time and deliver a personalized message, increasing customer engagement.
As the flow above shows, the enhanced underwriting package is modular, so other data sources and relationships can be added. This greatly improves the lender’s ability to decide marginal applications and to look beyond traditional agency data.
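To illustrate the modular design, here is a sketch in which each data source is a pluggable module that returns a score adjustment, or `None` when its data is unavailable, and only marginal applicants get the “second look”. Adding points to a FICO score, the 660 cutoff, the 40-point margin band, and each module’s rule are simplifying assumptions; a production system would rely on an empirically validated risk model instead.

```python
from typing import Callable, Dict, List, Optional

# A module inspects applicant data and returns a score adjustment in points
# (positive or negative), or None if its data source is unavailable.
Module = Callable[[Dict], Optional[float]]

def cash_flow_module(app: Dict) -> Optional[float]:
    cf = app.get("cash_flow")
    if cf is None:
        return None
    # Hypothetical rule: reward positive net monthly flow, penalize frequent overdrafts
    adj = 20.0 if cf["avg_monthly_net_flow"] > 0 else -20.0
    if cf["share_negative_days"] > 0.10:
        adj -= 15.0
    return adj

def direct_deposit_module(app: Dict) -> Optional[float]:
    # Hypothetical rule: an existing direct-deposit relationship adds a small boost
    return 10.0 if app.get("has_direct_deposit") else None

def decide(app: Dict, modules: List[Module], cutoff: float = 660.0,
           margin: float = 40.0) -> Dict:
    base = app["fico"]
    if base >= cutoff:
        return {"approved": True, "score": base, "reason": "traditional"}
    if base < cutoff - margin:
        return {"approved": False, "score": base, "reason": "below margin"}
    # Marginal applicant: let every available module contribute a "second look"
    adjustments = [a for a in (m(app) for m in modules) if a is not None]
    score = base + sum(adjustments)
    return {"approved": score >= cutoff, "score": score, "reason": "second look"}
```

A new data source can be added by appending another function to the module list, without changing `decide` itself.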