Credit Risk Control

Design an end-to-end machine learning system for a real-time loan approval/rejection model, such as credit cards. Discuss the infrastructure, features, model, training and evaluation aspects of the system.

1. requirements

functional

  • stage: customer acquisition, pre-loan (loan origination), mid-loan (loan maintenance/servicing), post-loan (delinquency management/recovery)

  • types of loans it will support

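  • types of risk it will support -> determines what kind of machine learning task this is and where the difficulty lies

  • goal: distinguish good customers from bad ones (by risk, demand, value)

    • business metrics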

non-functional

  • compliance requirements

  • scalability goals

  • reliability, security

2. ML task & pipeline & keys

Credit risk decision flow: blacklist/whitelist, rules + model. The model task is binary classification, multi-class classification, or multi-label classification.
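The decision flow above can be sketched roughly as follows. This is a minimal illustration, not a production design: the `Application` fields, the list contents, the two hard rules, and the `APPROVE_BELOW` cutoff are all hypothetical, and `predict_default_prob` stands in for whatever binary classifier is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Application:
    # Hypothetical application fields, for illustration only.
    user_id: str
    age: int
    monthly_income: float
    requested_amount: float


# Assumed list contents and cutoff; real values come from the business.
BLACKLIST = {"u_blocked"}
WHITELIST = {"u_vip"}
APPROVE_BELOW = 0.10  # cutoff on predicted default probability


def rule_check(app: Application) -> Optional[str]:
    """Return a rejection reason if a hard rule fires, else None."""
    if app.age < 18:
        return "underage"
    if app.requested_amount > 20 * app.monthly_income:
        return "amount_too_high"
    return None


def decide(
    app: Application,
    predict_default_prob: Callable[[Application], float],
) -> Tuple[str, str]:
    """Return (decision, source), checking lists, then rules, then the model."""
    if app.user_id in BLACKLIST:
        return "reject", "blacklist"
    if app.user_id in WHITELIST:
        return "approve", "whitelist"
    reason = rule_check(app)
    if reason is not None:
        return "reject", reason
    # Only applications that pass lists and rules reach the model.
    p = predict_default_prob(app)
    return ("approve", "model") if p < APPROVE_BELOW else ("reject", "model")
```

Returning the source of the decision alongside the decision itself keeps every outcome traceable, which helps with the compliance requirements listed earlier.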

3. data

  • user (relational database)

    • credit

    • fraud

  • log (distributed file system)

  • label

    • roll rate, vintage analysis

    • identify risk points from data metrics

    • credit: loans that go overdue and are not repaid are the bad ("black") samples
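Roll rate analysis is one way to decide how many days overdue should count as "bad". A minimal sketch, assuming each account has a delinquency bucket label (the names `M0`, `M1`, ... are illustrative) in two consecutive months:

```python
from collections import Counter, defaultdict
from typing import Dict


def roll_rate_matrix(
    prev_bucket: Dict[str, str],
    curr_bucket: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Share of accounts moving from each previous bucket to each current bucket.

    Both arguments map account_id -> delinquency bucket (e.g. "M0", "M1").
    Accounts missing from the current month are ignored.
    """
    counts = defaultdict(Counter)
    for account_id, prev in prev_bucket.items():
        if account_id in curr_bucket:
            counts[prev][curr_bucket[account_id]] += 1

    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        matrix[prev] = {curr: n / total for curr, n in row.items()}
    return matrix
```

If, for example, almost every account in `M2` rolls forward to `M3` and few cure back to `M0`, then "reached M2" is a reasonable candidate for the bad label.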

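demand: repayment (contract-performance) risk: repayment capacity:

external data

  • Due to compliance requirements, much internet big data can only be brought in through credit bureau platforms, in the form of scores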

4. feature

feature engineering is an art

user

  • ID/Address Proof: Voter ID, Aadhaar, PAN Card

  • Employment Information, including salary slips

  • Credit Score

  • Bank Statements and Previous Loan Statements

  • important anti-fraud features, e.g. ip, device_id, idfv, phone number


  • graph

  • behavior sequence

  • compliance -> buy some user features

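stability

  • SHAP is a method for explaining model output; it shows how much each variable contributes to a prediction. Check: whether the variable's values trend sensibly with the model output; whether there are unexpected inflection points or jumps (a possible sign of overfitting).

  • The relationship between some variables and the risk label should be monotonic (e.g. debt ratio ↑, default probability ↑); otherwise there may be a logic problem, a data problem, or model overfitting. During modeling, analyze each variable's WOE (Weight of Evidence) plot or KS plot against the label to judge whether the relationship is monotonic.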
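The WOE monotonicity check can be sketched as below. Assumptions: the variable has already been binned into ordered bin indices, the label is 1 for bad and 0 for good, and WOE is defined as ln(share of goods / share of bads), which is one common convention (some sources flip the ratio). The `eps` smoothing term is an illustrative choice to avoid taking the log of zero.

```python
import math
from typing import List, Sequence, Tuple


def woe_by_bin(
    bins: Sequence[int],
    labels: Sequence[int],
    eps: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (bin, WOE) pairs sorted by bin; labels are 1 = bad, 0 = good."""
    good, bad = {}, {}
    for b, y in zip(bins, labels):
        target = bad if y == 1 else good
        target[b] = target.get(b, 0) + 1

    all_bins = sorted(set(good) | set(bad))
    # Smoothed totals so that the per-bin shares still sum to 1.
    total_good = sum(good.values()) + eps * len(all_bins)
    total_bad = sum(bad.values()) + eps * len(all_bins)

    result = []
    for b in all_bins:
        good_share = (good.get(b, 0) + eps) / total_good
        bad_share = (bad.get(b, 0) + eps) / total_bad
        result.append((b, math.log(good_share / bad_share)))
    return result


def is_monotonic(values: Sequence[float]) -> bool:
    """True if the sequence is entirely non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)
```

For a variable such as debt ratio binned from low to high, WOE under this convention would be expected to fall steadily; a bin that breaks the trend is worth inspecting for data issues or for too few samples.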

5. model

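Interpretability and stability. The model must not only perform well on the current sample but also be robust on future data, across time periods, and across populations; this is what stability means.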
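One widely used way to quantify this kind of stability is the Population Stability Index (PSI), which compares the distribution of a score or feature between a baseline period and a later one. The notes do not name PSI, so treat this as an illustrative choice; the equal-width binning over the baseline range and the `eps` floor are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(xs: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in xs:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # Floor each share at eps so the log below is always defined.
        return [max(c / len(xs), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions give a PSI of 0, and the value grows as the recent population drifts away from the baseline. Computing it per time window or per customer segment gives a view of the robustness described above.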

  • Credit Scoring Models (scorecards)

  • LR

  • GBDT

  • NN

  • Probabilistic Calibration
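Calibration matters for scorecards because a score is usually a rescaled log-odds: if the predicted default probability is not calibrated, the score loses its meaning. A minimal sketch of the standard "points to double the odds" (PDO) scaling, assuming a calibrated probability as input; the default `base_score`, `base_odds`, and `pdo` values are hypothetical.

```python
import math
from typing import Callable


def make_score_scaler(
    base_score: float = 600.0,
    base_odds: float = 50.0,
    pdo: float = 20.0,
) -> Callable[[float], float]:
    """Build a function mapping a default probability to a credit score.

    base_odds is the good:bad odds that should map to base_score, and
    pdo is the number of points that doubles those odds.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)

    def to_score(p_default: float) -> float:
        if not 0.0 < p_default < 1.0:
            raise ValueError("p_default must be strictly between 0 and 1")
        odds = (1.0 - p_default) / p_default  # good:bad odds
        return offset + factor * math.log(odds)

    return to_score
```

With these defaults, odds of 50:1 map to a score of 600 and odds of 100:1 map to 620, so a higher score means lower risk.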

6. evaluation

  • offline

    • accuracy, AUC, Log Loss, Precision, Recall

    • Kolmogorov-Smirnov (KS), a metric commonly used in risk control
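The KS statistic is the largest gap between the cumulative score distributions of bad and good customers. A minimal sketch, assuming a higher score means higher risk and labels are 1 for bad and 0 for good:

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Max gap between cumulative bad rate and cumulative good rate."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one bad and one good sample")

    # Walk from the riskiest score downward.
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    cum_bad = cum_good = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # Compare only at the end of a group of tied scores,
        # since a threshold cannot split equal scores.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best
```

A KS of 1.0 means the score separates the two groups perfectly, and a KS near 0 means it barely separates them at all.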

7. deploy & serving

  • feature service

  • prediction service

8. monitoring & maintenance

  • Approval Rate

9. QA & optimization

  • cold start

  • which (credit score) segment of customers does profit/revenue come from, and which segment does risk come from
