MLOps
Background knowledge needed to deploy models in production
data/model versioning: DVC
feature store: Feast
model versioning: MLflow
1. ML deployment
Scenario: low latency, high QPS
strategies
Shadow deployment strategy
A/B testing
Multi Armed Bandit
Blue-green deployment strategy
Canary deployment strategy
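A canary rollout (and, with a 50/50 split, a simple A/B test) comes down to deciding which model version serves each request. The sketch below is one possible way to do that split, assuming requests carry a stable ID; the function name route_request and the 10,000-bucket granularity are illustrative choices, not part of any particular serving framework.

```python
import hashlib


def route_request(request_id: str, canary_percent: float) -> str:
    """Return "canary" or "stable" for a request.

    Hashing the request (or user) ID makes the decision deterministic,
    so the same ID always lands on the same model version.
    """
    if not 0.0 <= canary_percent <= 100.0:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    # Map the hash onto 10,000 buckets, i.e. 0.01% granularity (assumed).
    bucket = int(digest, 16) % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"
```

Raising canary_percent step by step (for example 1, 5, 25, 100) widens the rollout, and setting it back to 0 sends all traffic to the stable version again.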
Tools
TF Serving (TensorFlow Serving)
Supports hot deployment: models can be updated without taking the service down
Flask
Load testing: JMeter
Model
an end-to-end set
a confidence test set
a performance metric
its range of acceptable values
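The four items above can be read as a release gate: evaluate the model on a held-out set, compute the metric, and ship only if the value falls inside the accepted range. The following is a minimal sketch under that reading; MetricGate, accuracy, and ready_to_ship are hypothetical names, and accuracy stands in for whatever metric the team actually agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricGate:
    """A performance metric together with its range of acceptable values."""
    name: str
    min_value: float
    max_value: float

    def accepts(self, value: float) -> bool:
        return self.min_value <= value <= self.max_value


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that match the labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def ready_to_ship(predictions, labels, gate: MetricGate) -> bool:
    """True if the metric on the test set falls inside the accepted range."""
    return gate.accepts(accuracy(predictions, labels))
```

For example, MetricGate("accuracy", 0.9, 1.0) blocks a release whose test-set accuracy drops below 0.9.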
Recovery
Serving in Batch Mode
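Quantization
High performance
Rewrite inference in C++ and combine it with model acceleration techniques (pruning, distillation, quantization) to handle highly concurrent requests
LLM inference
Frameworks such as FasterTransformer and vLLM
Attention: FlashAttention, PagedAttention
MoE
Multi-instance GPU deployment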
2. Model compression
Distillation
How to design a suitable student model and loss function
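One common choice for the loss function is the classic soft-target formulation: a weighted sum of the usual cross-entropy on the true label and the KL divergence between temperature-softened teacher and student distributions. The sketch below assumes that formulation for a single example; the default temperature and alpha values are illustrative and should be tuned.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * hard-label cross-entropy + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label term: ordinary cross-entropy at temperature 1.
    hard = -log_softmax(student_logits)[label]
    # Soft-label term: KL divergence between the softened distributions.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    soft = sum(math.exp(lt) * (lt - ls)
               for lt, ls in zip(log_p_teacher, log_p_student))
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

When the student's logits equal the teacher's, the soft term vanishes and only the hard-label term remains.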
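Quantization
Reduce the number of bits used for each parameter and activation (e.g., convert 32-bit floats to 8-bit integers) to shrink the model and speed up computation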
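As a worked illustration of the float-to-int8 conversion, the sketch below uses affine (scale plus zero-point) quantization over a list of weights. This is one of several possible schemes and is written in plain Python for clarity; real toolchains quantize per tensor or per channel and use calibrated ranges.

```python
def quantize_int8(values):
    """Map floats to int8 using q = round(v / scale) + zero_point."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all values are zero; any positive scale works
    zero_point = round(-128 - lo / scale)
    quantized = [max(-128, min(127, round(v / scale) + zero_point))
                 for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from int8 values."""
    return [(q - zero_point) * scale for q in quantized]
```

Each stored value shrinks from 4 bytes to 1, at the cost of a round-trip error of at most about half of scale per value.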
Low-rank factorization (approximation)
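The saving from low-rank factorization is simple arithmetic: an m x n weight matrix W is approximated by the product of an m x r matrix and an r x n matrix, so the parameter count drops from m*n to r*(m+n). The helper names below are illustrative.

```python
def dense_params(m: int, n: int) -> int:
    """Parameter count of a full m x n weight matrix."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameter count after factorizing W (m x n) into U (m x r) @ V (r x n)."""
    return r * (m + n)


def compression_ratio(m: int, n: int, r: int) -> float:
    """How many times smaller the factorized layer is than the dense one."""
    return dense_params(m, n) / low_rank_params(m, n, r)
```

For a 1024 x 1024 layer with rank 64, the factorized form stores 131,072 parameters instead of 1,048,576, an 8x reduction. The factorization only pays off when r is smaller than m*n / (m+n).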
3. Retraining
Develop a strategy to trigger model invalidation and retrain models when performance degrades because of data drift, model bias, or explainability divergence.
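For the data-drift part of such a trigger, one widely used heuristic is the Population Stability Index (PSI), which compares the distribution of a feature at training time with its distribution in production. The sketch below is an assumption about how a trigger could look, not a prescribed method; the 10 equal-width bins and the 0.2 threshold are rule-of-thumb choices.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range go to the edge bins.
            counts[max(0, min(bins - 1, idx))] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(expected, actual, threshold=0.2):
    """Flag retraining when drift exceeds the (assumed) PSI threshold."""
    return psi(expected, actual) > threshold
```

In practice this check would run per feature on a schedule, alongside monitoring of the model's own performance metric.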
4. Q&A
After a model is deployed, how do you monitor its traffic?
References
Triton: a model inference serving framework
https://github.com/rapidsai/cloud-ml-examples