How to Build a Credit Scoring Grid From a Logistic Regression Model

All code used in this article is available on GitHub. The business logic and modeling functions are located in the following file:
src/modeling/score_computation.py
The corresponding analysis and results are documented in:
09_score_computation.qmd
The images, tables, and charts were generated with the help of the Codex coding assistant.

Your credit score follows you everywhere. It decides if you get a loan, a credit card, or even an apartment. The model behind most of these decisions is FICO. Its logic is simple once you break it down.

FICO weighs five things:

Payment history (35%): pay your bills on time.
Amounts owed (30%): keep your credit use below 20%.
Length of history (15%): the longer, the better.
Credit mix (10%): use different types of credit.
New credit (10%): limit new applications.

If you pay your credit card bills on time, your score rises. Payment history carries the most weight.

These weights produce a score, split into ranges:

300–579: Poor.
580–669: Fair.
670–739: Good.
740–799: Very Good.
800–850: Excellent.

This article follows the same logic, but applies it to our own model.

We use the dataset from this series on building a scoring model. The goal is simple: give each retained variable a weight, compute the score for every client in our data, and show how a new client’s score is calculated.

As before, Codex helped write the code and build the tables and charts. I keep saying this because it matters: you can use AI agents to speed up your work. But check their output. Trust grows only when you verify it. Use these tools, but stay alert.

Let’s recall what we found last time. We kept four variables:

loan_int_rate: the loan’s interest rate.
loan_percent_income: the share of income spent on loan payments.
cb_person_default_on_file: whether the borrower defaulted before.
home_ownership_3: the borrower’s housing status.

Like FICO, we give each variable a weight and build a score from 0 to 1000. A high score means low risk. A low score means high risk of default.

From Model Coefficients to a Score

We turn each coefficient into a score.

Score for each category of a variable

Take loan_int_rate as an example. The score for category $i$ of variable $j$ is:

$\bm{ SC(j,i) = 1000 \times \frac{ \left|c(j,i)-\alpha_j\right| }{ \sum_{k=1}^{p}\alpha_k } }$

Here, $c(j,i)$ is the coefficient for category $i$ of variable $j$, $\alpha_j$ is the highest coefficient for variable $j$, and the sum in the denominator runs over the $p$ variables in the model. For example, for the variable loan_int_rate, the highest coefficient is $\alpha_j = 1.357044926979$.
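The formula can be sketched in a few lines of Python. This is only an illustration: the function name category_scores and the coefficients below are hypothetical and are not the fitted model from this series.

```python
def category_scores(coefs, scale=1000):
    """Turn per-category coefficients into category scores.

    coefs maps a variable name to {category label: coefficient}.
    """
    # alpha_j: the highest coefficient of each variable
    alphas = {var: max(cats.values()) for var, cats in coefs.items()}
    # Denominator: sum of the highest coefficients over all variables
    denom = sum(alphas.values())
    return {
        var: {cat: scale * abs(c - alphas[var]) / denom for cat, c in cats.items()}
        for var, cats in coefs.items()
    }


# Hypothetical coefficients, for illustration only (not the article's model).
coefs = {
    "var_a": {"low": 0.0, "mid": 0.5, "high": 1.5},
    "var_b": {"no": 0.0, "yes": 0.5},
}
scores = category_scores(coefs)
print(scores["var_a"])  # {'low': 750.0, 'mid': 500.0, 'high': 0.0}
print(scores["var_b"])  # {'no': 250.0, 'yes': 0.0}
```

In this toy example the lowest coefficient of each variable is 0, so the best category of each variable adds up to exactly 1000, and the category with the highest coefficient always scores 0.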

This formula gives the score table below.

A client’s score, step by step

Take a new client. We check which category they fall into for each variable:

loan_int_rate is 10%. Score: 181.72.
loan_percent_income is 25%. Score: 0.
No past default (cb_person_default_on_file = N). Score: 59.52.
Owns their home (home_ownership_3 = OWN). Score: 373.94.

We add these scores to get the final score for the client: $\bm{181.72 + 59.52 + 0 + 373.94 = 615.18}$

We repeat this for every client in our data.
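The same addition, written as a short Python sketch. The four category scores are the ones listed above; the dictionary name client_scores is just an illustrative choice.

```python
# Category scores of the example client, taken from the list above.
client_scores = {
    "loan_int_rate": 181.72,              # interest rate of 10%
    "loan_percent_income": 0.0,           # 25% of income spent on the loan
    "cb_person_default_on_file": 59.52,   # no past default
    "home_ownership_3": 373.94,           # owns their home
}

# The final score is the sum of the category scores.
total = round(sum(client_scores.values()), 2)
print(total)  # 615.18
```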

How Much Each Variable Matters

Once we have the score, we ask: which variable drives it most?

We measure this on the training data:

Here:

$p_k : \text{ share of clients in category } k \text{ of variable } j;$
$\overline{\mathrm{SC}}_j : \text{ average score of variable } j \text{, weighted by population;}$
$m_j : \text{ number of categories in variable } j;$
$n : \text{ number of variables in the model.}$

In plain words, $q_j$ shows how much variable $j$ moves the score. The greater the variation among its different categories, the higher its weight.
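To make this concrete, here is a minimal sketch under a stated assumption. It assumes that the spread of variable $j$ is the population-weighted standard deviation of its category scores, $\sqrt{\sum_{k=1}^{m_j} p_k \left(SC(j,k) - \overline{\mathrm{SC}}_j\right)^2}$, and that $q_j$ is this spread divided by the sum of the spreads over the $n$ variables. This reading is consistent with the definitions above, but it is an assumption, and the shares and scores used below are hypothetical.

```python
from math import sqrt


def variable_weights(tables):
    """tables maps a variable to a list of (share p_k, category score SC(j, k))."""
    spread = {}
    for var, rows in tables.items():
        # Population-weighted average score of the variable
        mean = sum(p * s for p, s in rows)
        # Assumed dispersion measure: weighted standard deviation of the scores
        spread[var] = sqrt(sum(p * (s - mean) ** 2 for p, s in rows))
    total = sum(spread.values())
    # Normalise so that the weights sum to 1
    return {var: d / total for var, d in spread.items()}


# Hypothetical shares and category scores, for illustration only.
tables = {
    "var_a": [(0.5, 0.0), (0.5, 300.0)],
    "var_b": [(0.5, 0.0), (0.5, 100.0)],
}
weights = variable_weights(tables)
print(weights)  # {'var_a': 0.75, 'var_b': 0.25}
```

Here var_a spreads its categories three times further apart than var_b, so it receives three times the weight.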

The table below shows each variable’s weight.

loan_percent_income weighs the most, at 35%. Then home_ownership_3 at 31%, loan_int_rate at 28%, and cb_person_default_on_file last.

This makes sense. A client who spends more than 20% of their income on loan payments is risky. The fact that this variable drives the score the most is good news: the model picks up the right signal.

Does the Score Separate Risk Well?

Before we build the risk grid, we check if the score does its job: split defaulters from non-defaulters.

We plot the score’s density for each group, split by default, across train, test, and out-of-time data.

The further apart the two curves, the better the score works.

What we see: defaults cluster at low scores. Non-defaults cluster at high scores. This is what we want: high score, low risk.
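The density comparison can be approximated without any plotting library. The sketch below builds a simple histogram density per group in plain Python. The function name density, the bin count, and the scores are illustrative assumptions, not the article's data.

```python
def density(scores, bins=10, lo=0.0, hi=1000.0):
    """Histogram density of scores on [lo, hi]; the bar areas sum to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in scores:
        # A score equal to hi goes in the last bin
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / (len(scores) * width) for c in counts]


# Synthetic scores, for illustration only.
defaulters = [120, 200, 250, 310, 400]
non_defaulters = [450, 600, 650, 720, 900]

dens_default = density(defaulters)
dens_non_default = density(non_defaulters)

# In this toy data the two groups occupy disjoint score ranges.
for b, (d0, d1) in enumerate(zip(dens_default, dens_non_default)):
    print(f"{b * 100:4d}-{(b + 1) * 100:4d}  default={d0:.4f}  non-default={d1:.4f}")
```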

Building the Risk Grid

Now we build the grid.

Step 1: Default rate by score group

We split the final score into 20 equal-sized groups (ventiles) and plot the default rate for each group.

This chart is the foundation for the grid: it gives a natural starting point for grouping the 20 segments into six risk classes.
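A minimal sketch of this step in plain Python: rank clients by score, cut the ranking into 20 equal-sized groups, and compute the default rate of each group. The function name and the data are hypothetical, and the sketch assumes there are at least as many clients as groups.

```python
def default_rate_by_group(scores, defaults, n_groups=20):
    """Default rate of each equal-sized score group, lowest scores first."""
    ranked = sorted(zip(scores, defaults))  # sort clients by score
    n = len(ranked)
    rates = []
    for g in range(n_groups):
        # Integer bounds split the ranking into n_groups near-equal slices
        start, end = g * n // n_groups, (g + 1) * n // n_groups
        group = ranked[start:end]
        rates.append(sum(d for _, d in group) / len(group))
    return rates


# Synthetic data: 40 clients, those scoring below 250 default.
scores = [i * 25 for i in range(40)]
defaults = [1 if s < 250 else 0 for s in scores]

rates = default_rate_by_group(scores, defaults)
print(rates[:6])  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
```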

Step 2: Six risk classes

Based on the chart, we group the 20 segments like this:

Groups 1, 2, 3, with scores between 0 and 241: lowest scores, highest risk.
Groups 4, 5, 6, with scores between 241 and 331.
Groups 7, 8, with scores between 332 and 498.
Groups 9, 10, 11, 12, with scores between 498 and 589.
Groups 13, 14, 15, 16, 17, with scores between 589 and 780.
Groups 18, 19, 20, with scores between 781 and 1000: highest scores, lowest risk.

These classes must meet three rules:

✓ Each class must be uniform in risk;
✓ The default rate of each class must differ from that of the next class by at least 30%;
✓ Each class must hold at least 1% of all clients.

The table above shows that these rules are met.
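Two of these rules are easy to check in code. The sketch below assumes the 30% rule is a relative gap in default rate, measured against the riskier of two neighbouring classes. That reading, the function name check_classes, and the class counts are all assumptions made for illustration.

```python
def check_classes(classes, min_gap=0.30, min_share=0.01):
    """classes: list of (n_clients, n_defaults), riskiest class first.

    Returns (shares_ok, gaps_ok).
    """
    total = sum(n for n, _ in classes)
    rates = [d / n for n, d in classes]
    # Each class must hold at least min_share of all clients
    shares_ok = all(n / total >= min_share for n, _ in classes)
    # Assumed reading of the 30% rule: relative drop in default rate
    # from one class to the next, measured against the riskier class
    gaps_ok = all((hi - lo) / hi >= min_gap for hi, lo in zip(rates, rates[1:]))
    return shares_ok, gaps_ok


# Hypothetical class counts, for illustration only.
good_grid = [(100, 60), (200, 60), (300, 45), (400, 20)]
weak_grid = [(100, 60), (200, 100)]

print(check_classes(good_grid))  # (True, True)
print(check_classes(weak_grid))  # (True, False)
```

The sketch does not test the first rule, since uniformity of risk within a class needs a statistical test or a finer breakdown of each class.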

Step 3: Checking stability

A risk grid only works if it holds up over time. We check two things:

Riskier classes must always show higher default rates, across the full history.
The number of clients in each class must stay steady over time.

Both hold true: risk stays in the right order, and class sizes stay steady.
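Both checks can be sketched in plain Python. The function names, the periods, and the figures below are hypothetical. The sketch measures class-size stability as the largest change in any class share relative to the first period, which is one simple choice among several.

```python
def risk_order_holds(rates_by_period):
    """rates_by_period: {period: [default rate per class, riskiest first]}."""
    return all(
        all(a > b for a, b in zip(rates, rates[1:]))
        for rates in rates_by_period.values()
    )


def max_share_drift(shares_by_period):
    """Largest change in any class share versus the first period."""
    periods = list(shares_by_period)
    base = shares_by_period[periods[0]]
    return max(
        abs(s - b)
        for p in periods[1:]
        for s, b in zip(shares_by_period[p], base)
    )


# Hypothetical default rates and class shares per period.
rates_by_period = {"2019": [0.60, 0.30, 0.10], "2020": [0.55, 0.28, 0.12]}
shares_by_period = {"2019": [0.20, 0.30, 0.50], "2020": [0.22, 0.30, 0.48]}

order_ok = risk_order_holds(rates_by_period)
drift = max_share_drift(shares_by_period)
print(order_ok, round(drift, 2))  # True 0.02
```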

Conclusion

This article closes our series on building a scoring model. We started with the data and end with a risk grid.

We built a score from 0 to 1000 by scoring each category of each variable. A client’s score is the sum of these category scores. The score splits risk well: defaulters and non-defaulters land in clearly different ranges.

Each variable’s weight: loan_percent_income leads at 35%, then home_ownership_3 at 31%, loan_int_rate at 28%, and cb_person_default_on_file last.

👉 Good to know: the higher your income compared to your loan, the higher your score.

The final risk grid:

0–241: Very High Risk.
241–331: High Risk.
332–498: Medium-High Risk.
499–589: Medium Risk.
590–789: Low Risk.
790–1000: Very Low Risk.

I kept this article short on purpose. We built the grid here using ventiles (20 equal-sized score groups) and visual grouping, but other statistical methods exist to split scores into homogeneous classes. K-means, hierarchical clustering, and Weight of Evidence (WoE) all offer a more rigorous path to the same goal. That will be the subject of my next article.

Data & Licensing

The dataset used in this article is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This license allows anyone to share and adapt the dataset for any purpose, including commercial use, provided that proper attribution is given to the source.

For more details, see the official license text: CC0: Public Domain.

Disclaimer

Any remaining errors or inaccuracies are the author’s responsibility. Feedback and corrections are welcome.
