MLOps
Knowledge required to deploy models in industry
Data/model versioning: DVC
Feature store: Feast
Model versioning: MLflow
Low latency
High QPS
High throughput
Shadow deployment strategy
A/B testing
Multi-armed bandit
Blue-green deployment strategy
Canary deployment strategy
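Of the strategies above, the multi-armed bandit is the most algorithmic: instead of a fixed A/B split, traffic gradually shifts toward whichever model version earns more reward. A minimal epsilon-greedy sketch follows; the class and function names, the 10% exploration rate, and the simulated conversion rates are illustrative assumptions, not part of any particular serving framework.

```python
import random


class EpsilonGreedyRouter:
    """Route requests across model versions with an epsilon-greedy bandit (illustrative)."""

    def __init__(self, versions, epsilon=0.1, seed=None):
        self.versions = list(versions)
        self.epsilon = epsilon
        self.counts = {v: 0 for v in self.versions}    # requests served per version
        self.totals = {v: 0.0 for v in self.versions}  # summed reward per version
        self.rng = random.Random(seed)

    def mean_reward(self, version):
        n = self.counts[version]
        return self.totals[version] / n if n else 0.0

    def choose(self):
        # Explore: with probability epsilon, send the request to a random version.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.versions)
        # Exploit: otherwise send it to the best version observed so far.
        return max(self.versions, key=self.mean_reward)

    def update(self, version, reward):
        # Record feedback, e.g. 1.0 for a click/conversion and 0.0 otherwise.
        self.counts[version] += 1
        self.totals[version] += reward


def simulate(true_rates, n_requests, epsilon=0.1, seed=0):
    """Replay synthetic traffic where each version converts at a fixed, made-up rate."""
    router = EpsilonGreedyRouter(true_rates, epsilon=epsilon, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(n_requests):
        version = router.choose()
        reward = 1.0 if env.random() < true_rates[version] else 0.0
        router.update(version, reward)
    return router
```

Compared with a fixed 50/50 A/B test, the bandit sends most traffic to the better version while it is still learning, at the cost of less balanced samples for a clean statistical comparison.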
TF Serving (tf-serving)
Supports hot deployment: new model versions are loaded without taking the service down
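Hot deployment in TF Serving works because the server polls a model base path containing numbered version subdirectories and, under its default version policy, serves the latest (highest-numbered) one, so publishing a new directory swaps models without restarting the service. The sketch below only handles the filesystem side on a local disk; the helper names are hypothetical, and the staging-then-rename step assumes the server ignores non-numeric directory names.

```python
import os
import shutil


def next_version(model_base_path):
    """Return the next integer version number under a TF Serving model base path."""
    # Only purely numeric subdirectories are treated as model versions.
    versions = [
        int(d) for d in os.listdir(model_base_path)
        if d.isdigit() and os.path.isdir(os.path.join(model_base_path, d))
    ]
    return max(versions, default=0) + 1


def publish_new_version(model_base_path, exported_model_dir):
    """Copy an exported SavedModel into a new numbered version directory (hypothetical helper).

    The copy is written under a non-numeric staging name first and then renamed,
    so the server should never pick up a half-written version directory.
    """
    version = next_version(model_base_path)
    staging = os.path.join(model_base_path, f".staging-{version}")
    final = os.path.join(model_base_path, str(version))
    shutil.copytree(exported_model_dir, staging)
    os.rename(staging, final)  # atomic when both paths are on the same POSIX filesystem
    return final
```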
Flask
Load testing: JMeter
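JMeter is the tool named above; as a rough standard-library stand-in, the sketch below fires requests from a thread pool and reports the latency percentiles and throughput (QPS) that a load test is meant to measure. The predict argument is a placeholder; in practice it would send an HTTP request to the Flask endpoint.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(predict, payloads, concurrency=8):
    """Call predict on every payload from `concurrency` threads; report latency and QPS."""

    def timed_call(payload):
        start = time.perf_counter()
        predict(payload)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, payloads))
    wall_seconds = time.perf_counter() - wall_start

    def percentile(p):
        # Nearest-rank percentile over the sorted per-request latencies.
        idx = max(0, math.ceil(p / 100 * len(latencies)) - 1)
        return latencies[idx]

    return {
        "requests": len(latencies),
        "qps": len(latencies) / wall_seconds,
        "p50_ms": percentile(50) * 1000,
        "p99_ms": percentile(99) * 1000,
    }
```

Tail latency (p99) is usually the number that matters for a low-latency SLA; average latency can look fine while a small fraction of requests is very slow.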
Model
an end-to-end set
a confidence test set
a performance metric
its range of acceptable values
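Together, a test set, a performance metric, and its range of acceptable values amount to a release gate: a candidate model is deployed only if its score falls in range. A minimal sketch, assuming accuracy as the metric and an illustrative acceptable range of [0.9, 1.0]:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def passes_gate(y_true, y_pred, metric=accuracy, acceptable=(0.9, 1.0)):
    """Return (ok, score); ok is True if the metric on the test set is within range."""
    score = metric(y_true, y_pred)
    low, high = acceptable
    return low <= score <= high, score
```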
Recovery
Serving in Batch Mode
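In batch mode, predictions are computed offline over a whole dataset in fixed-size chunks and stored for later lookup, instead of being computed per request. A sketch under that assumption; predict_batch stands in for whatever vectorized model call is available, and the batch size of 256 is arbitrary.

```python
from itertools import islice


def chunks(iterable, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch


def score_in_batches(records, predict_batch, batch_size=256):
    """Offline (batch-mode) scoring: one model call per chunk, results kept in input order."""
    predictions = []
    for batch in chunks(records, batch_size):
        predictions.extend(predict_batch(batch))
    return predictions
```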
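Quantization
High performance
Rewrite inference in C++, combine it with model acceleration techniques (pruning, distillation, quantization), and handle highly concurrent requests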
LLM inference
Frameworks such as FasterTransformer and vLLM
Attention: FlashAttention, PagedAttention
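The central numerical trick in FlashAttention is the online softmax: attention over a long key sequence is computed in one streaming pass by keeping a running max and a running normalizer, rescaling earlier partial sums whenever the max grows, so the full score vector never has to be materialized. The sketch below shows only that recurrence for a single query in pure Python, one key at a time; the real kernel applies it block-wise in GPU on-chip memory.

```python
import math


def attention_naive(scores, values):
    """Reference: softmax over all scores, then a weighted sum of value vectors."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def attention_online(scores, values):
    """Single pass over keys with a running max and normalizer (online softmax)."""
    m = -math.inf                  # running max of the scores seen so far
    l = 0.0                        # running sum of exp(score - m)
    acc = [0.0] * len(values[0])   # running sum of exp(score - m) * value
    for s, v in zip(scores, values):
        m_new = max(m, s)
        rescale = math.exp(m - m_new)  # shrink old partial sums if the max increased
        w = math.exp(s - m_new)
        l = l * rescale + w
        acc = [a * rescale + w * x for a, x in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```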
MoE (Mixture of Experts)
Multi-instance deployment on GPUs
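An MoE layer routes each token to only a few experts, so the parameter count grows with the number of experts while per-token compute stays roughly constant. A minimal top-k softmax gating sketch for a single token; the expert callables and gate logits are illustrative, and details such as load-balancing losses and expert capacity limits are omitted.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, k=2):
    """Sparse MoE for one token: run only the top-k experts and mix their outputs.

    gate_logits: one router score per expert for this token.
    experts: list of callables, each mapping x to an output vector.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize gate weights over the chosen experts
    out = None
    for i in top:
        y = experts[i](x)
        w = probs[i] / norm
        out = [w * v for v in y] if out is None else [o + w * v for o, v in zip(out, y)]
    return out, top
```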
Distillation
How to design a suitable student model and loss function
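One common answer to the loss-function question is the temperature-scaled distillation loss of Hinton et al.: match the student's softened output distribution to the teacher's with a KL term, and mix in ordinary cross-entropy on the hard label. A per-example sketch; the temperature 4.0 and weight alpha 0.5 are illustrative defaults, not recommendations.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Multiplying by T^2 keeps the soft-target term on a comparable scale as T changes.
    soft = temperature ** 2 * kl
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```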
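Quantization
Reduce the number of bits used for each parameter and activation (e.g., converting 32-bit floats to 8-bit integers) to shrink the model and speed up computation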
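The fp32-to-int8 conversion can be sketched as symmetric per-tensor quantization: pick one scale so that the largest absolute weight maps to 127, round, and store the 8-bit integers plus the scale, roughly a 4x memory reduction. This is a simplified illustration; production toolkits add calibration, per-channel scales, and integer compute kernels.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: x ~ scale * q with q in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the int8 codes back to approximate float values."""
    return [scale * x for x in q]
```

The rounding error per weight is at most half a quantization step (scale / 2), which is why a single large outlier weight, by inflating the scale, hurts precision for every other weight in the tensor.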
Low-rank factorization (approximation)
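Low-rank factorization replaces a weight matrix W (m x n) with two factors A (m x r) and B (r x n), cutting the parameter count from m*n to r*(m + n) when r is small. A sketch using truncated SVD, assuming NumPy is available:

```python
import numpy as np


def low_rank_approx(W, rank):
    """Truncated SVD: W (m x n) ~ A @ B with A (m x rank) and B (rank x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B


def param_counts(m, n, rank):
    """(parameters in the dense layer, parameters in the factorized layer)."""
    return m * n, rank * (m + n)
```

For a 64 x 32 layer at rank 8 that is 768 parameters instead of 2048; the layer then runs as two smaller matrix multiplications, and accuracy depends on how much of W's spectrum the discarded singular values carried.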
Develop a strategy to trigger model invalidation and retrain models when performance degrades.
Performance can degrade because of data drift, model bias, and explainability divergence.
When should retraining be triggered?
A sufficient amount of additional data becomes available
The model's performance is degrading
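The two triggers above can be combined into a simple policy. The class name and thresholds (10,000 new samples, a drop of 0.02 in the monitored metric) are hypothetical placeholders to be tuned per application.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    min_new_samples: int = 10_000   # retrain once this much new labeled data has arrived
    max_metric_drop: float = 0.02   # or once the live metric falls this far below baseline

    def should_retrain(self, new_samples, baseline_metric, live_metric):
        """Return (decision, reasons) so the trigger can be logged and audited."""
        reasons = []
        if new_samples >= self.min_new_samples:
            reasons.append("enough new data")
        if baseline_metric - live_metric > self.max_metric_drop:
            reasons.append("performance degraded")
        return bool(reasons), reasons
```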
Model performance: accuracy metrics, latency, and throughput
Data: drift
System: resource usage
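One common way to quantify data drift on a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of live data with a reference sample. A pure-Python sketch, assuming equal-width bins over the reference range; a frequently quoted rule of thumb treats PSI above about 0.25 as a significant shift, but thresholds should be chosen per feature.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        # eps avoids log(0) for empty bins
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```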
Logging
After deployment, how do you monitor model traffic? Through logging
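A minimal way to make model traffic observable is to write one structured (JSON) log line per prediction, including the model version and latency, so traffic can later be aggregated for monitoring or replayed. The logger name and field names below are illustrative assumptions; a handler must be configured on the logger for the lines to be emitted anywhere.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("model_traffic")  # hypothetical logger name


def log_prediction(model_version, features, prediction, latency_ms, request_id=None):
    """Emit one JSON line per request and return the record that was logged."""
    record = {
        "request_id": request_id or str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
        "latency_ms": round(latency_ms, 3),
    }
    logger.info(json.dumps(record))
    return record
```

Logging the inputs alongside the outputs is what makes the drift checks and retraining triggers above possible later, since both need the live feature distribution, not just the predictions.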
Machine Learning Engineering for Production (MLOps) Specialization
Triton: a model inference serving framework (NVIDIA Triton Inference Server)