MLops
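MLOps is responsible for automating machine learning models: CI/CD/CT (continuous integration, delivery, and training), plus orchestration and automation of pipelines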
feature pipelines, training pipelines, and inference pipelines
Data/model versioning: DVC
Feature store: Feast
Model versioning: MLflow
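DVC versions data by content hash and keeps small pointer files in Git, not the data itself. The sketch below only imitates that idea with the standard library. `snapshot` and its manifest layout are hypothetical and are not DVC's file format or API.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_md5(path: Path) -> str:
    """Hash a file's contents in 1 MiB chunks so large datasets need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths: List[Path]) -> Dict[str, str]:
    """Hypothetical manifest mapping each file name to its content hash.

    Committing this small manifest to Git (instead of the data) pins the exact
    data version a model was trained on; any change to the data changes a hash.
    """
    return {p.name: file_md5(p) for p in sorted(paths)}
```

The same hash can also serve as a cache key, so unchanged data is never stored or uploaded twice.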
feature caching, data sharding, real-time feature aggregation and serving
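Feature caching usually means keeping recently fetched online features in memory with a time-to-live, so hot entities skip the round trip to the feature store. A minimal in-process sketch, assuming a hypothetical `loader` function that fetches fresh features; production systems typically use Redis or the feature store's own online layer instead.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLFeatureCache:
    """In-process cache for online features with a per-entry time-to-live.

    `loader` is a hypothetical fetch function (e.g. a feature-store lookup);
    it is called only on a miss or after an entry has expired.
    """

    def __init__(self, loader: Callable[[Hashable], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.loader = loader
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._store: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            self.hits += 1
            return entry[1]
        # Miss or stale entry: reload and remember when it was fetched.
        self.misses += 1
        value = self.loader(key)
        self._store[key] = (now, value)
        return value
```

The TTL bounds how stale a served feature can be; a shorter TTL means fresher features but more load on the backing store.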
low latency
high QPS
high throughput
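These three targets are usually measured as tail-latency percentiles and requests per second, and Little's law ties them together: in-flight requests = QPS × latency. A small sketch of those calculations; the nearest-rank percentile definition is one common choice among several.

```python
import math
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def throughput_qps(num_requests: int, wall_time_s: float) -> float:
    """Completed requests per second over a measurement window."""
    return num_requests / wall_time_s


def required_concurrency(qps: float, mean_latency_s: float) -> float:
    """Little's law: requests in flight L = arrival rate (lambda) * time in system (W)."""
    return qps * mean_latency_s
```

For example, serving 500 QPS at 20 ms mean latency keeps about 10 requests in flight, which is a lower bound on the worker or thread count needed.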
Shadow deployment strategy
A/B testing
Multi-armed bandit
Blue-green deployment strategy
Canary deployment strategy
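Canary releases and A/B tests both need a stable way to split traffic, and a multi-armed bandit shifts traffic toward the better model as rewards arrive. The sketch below shows hash-based user bucketing plus an epsilon-greedy bandit. The salt, bucket count, and arm names are hypothetical, and epsilon-greedy is only one bandit algorithm (Thompson sampling and UCB are common alternatives).

```python
import hashlib
import random
from typing import Dict, List


def bucket(user_id: str, salt: str = "exp-1", num_buckets: int = 10_000) -> int:
    """Stable bucket: the same user always lands in the same bucket for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets


def route(user_id: str, canary_fraction: float, salt: str = "exp-1") -> str:
    """Canary / A-B split: send roughly `canary_fraction` of users to the new model."""
    return "candidate" if bucket(user_id, salt) < canary_fraction * 10_000 else "stable"


class EpsilonGreedy:
    """Bandit: exploit the arm with the best mean reward, explore with prob. epsilon."""

    def __init__(self, arms: List[str], epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {a: 0 for a in arms}
        self.values: Dict[str, float] = {a: 0.0 for a in arms}  # running mean reward

    def select(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean: v <- v + (r - v) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Hashing instead of random sampling keeps each user's experience consistent across requests, and raising `canary_fraction` step by step gives a gradual rollout.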
feature processing
batch serving: Apache Hive or Spark
real-time serving: Kafka, Flink, Spark Streaming
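Streaming engines such as Flink compute features like "clicks per user per minute" incrementally over event-time windows. A minimal tumbling-window sketch, assuming events arrive in event-time order; it has no watermarks, late-data handling, or state backend, all of which a real stream processor provides.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Tuple

Result = Tuple[float, Hashable, float]  # (window_start, key, sum)


class TumblingWindowAggregator:
    """Per-key sum over fixed, non-overlapping event-time windows.

    Assumes in-order events: a window is emitted as soon as an event from a
    later window arrives.
    """

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.current_start = None
        self.sums = defaultdict(float)

    def add(self, key: Hashable, ts: float, value: float) -> List[Result]:
        """Add one event; return the previous window's results if `ts` opens a new window."""
        start = math.floor(ts / self.window_s) * self.window_s
        emitted: List[Result] = []
        if self.current_start is not None and start != self.current_start:
            emitted = self.flush()
        self.current_start = start
        self.sums[key] += value
        return emitted

    def flush(self) -> List[Result]:
        """Emit and clear the currently open window."""
        result = [(self.current_start, k, v) for k, v in sorted(self.sums.items())]
        self.sums = defaultdict(float)
        return result
```

A batch job over Hive or Spark would compute the same per-window sums over historical data; the streaming version trades completeness guarantees for freshness.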
Training pipeline
Scheduled Triggering: Apache Airflow, Kubeflow Pipelines
Event-Driven Triggering: AWS Lambda or Azure Functions can be set up to monitor certain metrics and trigger the training pipeline
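Airflow and Kubeflow Pipelines describe a training pipeline as a DAG of tasks run in dependency order. The sketch below shows only that ordering idea with the standard library's `graphlib`; the task names are hypothetical, and it has none of the scheduling, retries, or parallelism an orchestrator adds.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its upstream tasks; return the execution order.

    `deps` maps a task name to the set of task names it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        sorter.add(name, *deps.get(name, ()))
    order = list(sorter.static_order())
    for name in order:
        tasks[name]()
    return order
```

Usage: a scheduled trigger would call `run_pipeline` on a timer, while an event-driven trigger would call it when a monitored metric crosses a threshold.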
Inference pipeline
Batch Inference: Airflow or Kubernetes CronJobs
Real-Time Inference: Kafka, Flink, or an HTTP-based API (TensorFlow Serving, TorchServe)
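Supports hot deployment: new model versions can be loaded without taking the service down
By default, TF Serving uses the system memory allocator (e.g., glibc malloc); using TCMalloc instead can improve serving performance under high concurrency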
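The hot-deployment item above follows a general pattern: load the new version fully, then swap the reference that requests read. This is a plain-Python sketch of that pattern, not TF Serving's internal mechanism; the class name and `loader` callable are hypothetical.

```python
import threading
from typing import Any, Callable, Tuple

Model = Callable[[Any], Any]


class HotSwappableModel:
    """Serve predictions while allowing the model to be replaced without downtime.

    The (model, version) pair is swapped atomically under a lock. The slow load
    happens outside the lock, so in-flight requests keep using the old version
    and new requests see the new one.
    """

    def __init__(self, model: Model, version: str):
        self._lock = threading.Lock()
        self._current: Tuple[Model, str] = (model, version)

    def predict(self, x: Any) -> Tuple[str, Any]:
        with self._lock:
            model, version = self._current  # consistent snapshot, then release the lock
        return version, model(x)

    def swap(self, loader: Callable[[], Model], version: str) -> None:
        new_model = loader()  # load (and ideally warm up) before switching
        with self._lock:
            self._current = (new_model, version)
```

Returning the version with each prediction makes it easy to log which model answered a request during a rollout.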
onnxruntime
I/O Binding: place inputs and outputs on the GPU ahead of time to avoid extra host-to-device copies on every inference call
Flask / FastAPI / Sanic
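Flask, FastAPI, and Sanic all wrap the same basic shape: a POST endpoint that parses JSON features, calls the model, and returns JSON. To stay dependency-free, this sketch uses the standard library's `http.server` instead of any of those frameworks; the `/predict` path, payload format, and linear `predict` function are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(features):
    """Hypothetical model: a fixed linear scorer standing in for a real model."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            score = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a features list")
            return
        body = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request stderr logging


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """One thread per request; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), PredictHandler)
```

Usage: `make_server().serve_forever()`, then `POST /predict` with `{"features": [4.0, 2.0]}`. The frameworks add routing, validation, and async I/O on top of this shape.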
Load testing: JMeter
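JMeter fires many concurrent requests and reports throughput and latency. The core loop can be sketched with a thread pool. Here `call` is any zero-argument function (for example one that sends an HTTP request), and the reported median is approximate (upper median for even counts).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def load_test(call: Callable[[], None], total: int, concurrency: int) -> Dict[str, float]:
    """Run `call` `total` times with at most `concurrency` calls in flight."""

    def timed() -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = list(pool.map(lambda _: timed(), range(total)))
    wall = time.perf_counter() - wall_start
    latencies.sort()
    return {
        "requests": total,
        "qps": total / wall,
        "p50_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }
```

Raising `concurrency` until latency climbs sharply while QPS stops growing is a quick way to find a service's saturation point.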
Model
an end-to-end set
a confidence test set
a performance metric
its range of acceptable values
Recovery
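Read as a release checklist, the items above amount to a gate: score the candidate on a confidence test set and promote it only if the metric falls in its acceptable range, otherwise recover by keeping the stable model. A sketch under that interpretation; `Gate`, `validate_and_promote`, and the accuracy metric are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Gate:
    metric_name: str
    low: float   # acceptable range, inclusive
    high: float


def accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def validate_and_promote(candidate: Callable, stable: Callable,
                         test_set: Tuple[List[Any], List[Any]], gate: Gate,
                         metric: Callable = accuracy) -> Tuple[Callable, float]:
    """Return (model to serve, candidate score).

    The candidate is promoted only if its score on the confidence test set lies
    within [gate.low, gate.high]; otherwise the stable model stays (recovery).
    """
    xs, ys = test_set
    score = metric(ys, [candidate(x) for x in xs])
    if gate.low <= score <= gate.high:
        return candidate, score
    return stable, score
```

An end-to-end test set would exercise the full serving path (feature lookup, preprocessing, model) rather than calling the model function directly as done here.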
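Quantization
High performance
Rewrite inference in C++, combined with model acceleration techniques (pruning, distillation, quantization), to handle highly concurrent requests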
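LLM inference
GEMV (matrix-vector multiplication) is a core operation in LLMs, especially during token-by-token decoding; its cost comes mainly from the large weight matrices involved, how often it is called (every layer, every generated token), and hardware bottlenecks, chiefly memory bandwidth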
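Why GEMV is bandwidth-bound can be checked with roofline arithmetic: each weight is read once and used for a single multiply-add, so FLOPs per byte stay near 2 / bytes_per_element, far below what modern accelerators need to be compute-bound. Batching requests turns GEMV into GEMM and raises that ratio. The hardware numbers in the usage note are hypothetical.

```python
def matmul_arithmetic_intensity(m: int, n: int, batch: int = 1,
                                bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for Y = W @ X with W of shape (m, n), X of shape (n, batch).

    batch == 1 is GEMV. Weights are read once regardless of batch, so larger
    batches reuse each loaded weight more times.
    """
    flops = 2 * m * n * batch  # one multiply + one add per weight per column
    bytes_moved = bytes_per_elem * (m * n + n * batch + m * batch)  # W + X + Y
    return flops / bytes_moved


def is_memory_bound(intensity: float, peak_flops: float, bandwidth_bytes_s: float) -> bool:
    """Roofline test: below the machine balance point (FLOPs/byte), bandwidth limits speed."""
    return intensity < peak_flops / bandwidth_bytes_s
```

For a 4096 × 4096 fp16 layer, batch 1 gives about 1 FLOP/byte while batch 64 gives about 62; on a hypothetical accelerator with 100 TFLOP/s and 2 TB/s (balance point 50 FLOP/byte), only the batched case escapes the memory-bound regime.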
Attention: FlashAttention, PagedAttention
MoE
vLLM
PagedAttention / continuous batching
FasterTransformer
Multi-instance GPU deployment
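The continuous batching item above can be illustrated with a toy scheduler: with static batching a batch runs until its longest request finishes, while with continuous (iteration-level) batching a finished request frees its slot immediately for a waiting one. This models only the scheduling idea, assuming one token per request per decode step and ignoring prefill and KV-cache memory; it is not vLLM's implementation.

```python
from collections import deque
from typing import List


def static_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Decode steps when each batch must fully finish before the next starts."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])  # a batch lasts as long as its longest request
    return steps


def continuous_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Decode steps when waiting requests join as soon as a slot frees up."""
    queue = deque(lengths)
    running: List[int] = []  # tokens still to generate for each active request
    steps = 0
    while queue or running:
        while queue and len(running) < batch_size:
            running.append(queue.popleft())
        steps += 1  # one forward pass emits one token for every running request
        running = [r - 1 for r in running if r > 1]
    return steps
```

With output lengths `[8, 1, 1, 1, 1, 1, 1, 1]` and two slots, static batching needs 11 steps and continuous batching 8, because the short requests keep flowing through the second slot while the long one decodes.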
Distillation
How to design a suitable student model and loss function
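One widely used loss design is the soft-target formulation of Hinton et al.: match the student's temperature-softened distribution to the teacher's (KL divergence, scaled by T² so gradient magnitudes stay comparable) and mix in ordinary cross-entropy on the hard label. A per-example sketch in plain Python; `temperature=4.0` and `alpha=0.5` are illustrative defaults, not recommendations.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce
```

A higher temperature exposes more of the teacher's "dark knowledge" (relative probabilities of wrong classes), which is what the student gains over training on hard labels alone.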
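Quantization
Reduce the number of bits per parameter and activation (e.g., converting 32-bit floats to 8-bit integers) to shrink the model and speed up computation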
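The simplest form is symmetric per-tensor int8 quantization: pick scale = max|x| / 127, store round(x / scale) as int8, and multiply back by the scale when needed. A plain-Python sketch; real toolkits add per-channel scales, zero points, and calibration of activation ranges.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: q = clamp(round(x / scale), -127, 127)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [qi * scale for qi in q]
```

Each value now needs 1 byte instead of 4 (a 4× size reduction), and the rounding error per value is at most scale / 2; a single large outlier inflates the scale and hurts precision for everything else, which is why per-channel scales are common.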
Low-rank factorization (approximation)
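Low-rank factorization replaces a weight matrix W (m × n) with A (m × r) · B (r × n), typically found by truncated SVD. This sketch only does the parameter arithmetic, showing how much a given rank r saves and the largest rank that still saves anything.

```python
from typing import Tuple


def low_rank_params(m: int, n: int, r: int) -> Tuple[int, int, float]:
    """(full params, factored params, compression ratio) for W ≈ A @ B."""
    full = m * n
    factored = r * (m + n)
    return full, factored, full / factored


def max_useful_rank(m: int, n: int) -> int:
    """Largest r with r * (m + n) < m * n, i.e. the factorization is still smaller."""
    return (m * n - 1) // (m + n)
```

For a 4096 × 4096 layer, rank 64 cuts parameters 32×, while any rank of 2048 or more saves nothing; whether accuracy survives a given rank has to be checked empirically.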
Pruning
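A common baseline is unstructured magnitude pruning: zero out the fraction of weights with the smallest absolute values, then usually fine-tune to recover accuracy. A plain-Python sketch on a flat weight list; structured pruning (removing whole channels or heads) is the variant that more readily yields real speedups on standard hardware.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of weights with the smallest |w| (ties by position)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to remove
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned
```

Usage: `magnitude_prune([0.1, -2.0, 0.05, 3.0, -0.4], 0.4)` zeros the two smallest-magnitude weights, 0.05 and 0.1.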
Develop a strategy to invalidate models and trigger retraining when performance degrades,
for example because of data drift, model bias, or explainability divergence.
When should new training be triggered?
When a sufficient amount of additional data becomes available
When the model's performance is degrading
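The two triggers above, plus a drift check, can be combined into one policy function that reports which conditions fired. All thresholds here are hypothetical placeholders; real values depend on the model, label delay, and business metric.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RetrainPolicy:
    """Hypothetical thresholds, to be tuned per model."""
    min_new_rows: int = 100_000      # enough fresh labeled data has accumulated
    max_metric_drop: float = 0.02    # tolerated absolute drop vs. the baseline metric
    max_psi: float = 0.2             # data-drift threshold (population stability index)


def should_retrain(new_rows: int, baseline_metric: float, live_metric: float,
                   psi: float, policy: Optional[RetrainPolicy] = None) -> List[str]:
    """Return the reasons that fired; an empty list means no retraining is needed."""
    policy = policy or RetrainPolicy()
    reasons = []
    if new_rows >= policy.min_new_rows:
        reasons.append("new_data")
    if baseline_metric - live_metric > policy.max_metric_drop:
        reasons.append("performance_degraded")
    if psi > policy.max_psi:
        reasons.append("data_drift")
    return reasons
```

Returning reasons rather than a bare boolean makes the trigger auditable, e.g. when an event-driven function kicks off the training pipeline.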
Model performance: accuracy metrics, latency, and throughput
Data: drift
System: resource usage
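One common drift metric for a single feature is the population stability index (PSI): bin the training and production values with the same edges and compute PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ) over bin proportions. A stdlib sketch; the bin edges are up to the user, and the thresholds in the usage note are a common rule of thumb, not a standard.

```python
import bisect
import math
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin for sorted interior `edges` (len(edges) + 1 bins)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions.

    `expected` comes from training data, `actual` from production; eps avoids
    log(0) and division by zero for empty bins.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```

As a rough guide, PSI below 0.1 is often read as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant drift worth investigating.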
Logging
After deployment, how do you monitor model traffic? Through logging
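Traffic monitoring works best when every prediction is logged as one structured (JSON) line that a collector can ship and index. A sketch with the standard `logging` module; the logger name and field names are hypothetical choices.

```python
import json
import logging
import time
import uuid
from typing import Any, Dict

logger = logging.getLogger("model_traffic")


def log_prediction(model_version: str, features: Dict[str, Any],
                   prediction: Any, latency_ms: float) -> str:
    """Emit one JSON line per request and return its request id.

    A log shipper (e.g. Fluentd) can forward these lines to an index
    (e.g. Elasticsearch) for traffic, latency, and drift analysis.
    """
    request_id = str(uuid.uuid4())
    logger.info(json.dumps({
        "ts": time.time(),
        "request_id": request_id,
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
        "latency_ms": round(latency_ms, 2),
    }))
    return request_id
```

Usage: configure output once with `logging.basicConfig(level=logging.INFO, format="%(message)s")` so each record is a bare JSON line. Logging inputs alongside outputs is what later makes drift checks and offline replay possible.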
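How to deploy a decision tree model on 1,000 machines
Model serialization: JSON, Pickle, or Protobuf
Microservice architecture
Flask / FastAPI: lightweight services
gRPC: an efficient remote procedure call framework, suited to scenarios that need high performance and low latency
Kubernetes: manages microservice instances at scale
Containerized services
Managed with Kubernetes
Load balancing and traffic management
Monitoring and log management
Prometheus and Grafana to monitor microservice performance metrics
Elasticsearch, Fluentd, and Kibana (EFK: responsible for log indexing, log collection, and log visualization and analysis, respectively)
Client requests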
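For the serialization step in this list, a decision tree is small enough to store as plain JSON, which every replica can load without the training library; unlike Pickle, loading JSON never executes code. The node format below (`feature`, `threshold`, `left`, `right`, `value`) is hypothetical and is not scikit-learn's internal format.

```python
import json
from typing import Any, Dict, List

# Hypothetical format: internal nodes split on x[feature] <= threshold,
# leaves carry a "value". Every replica only needs to agree on this schema.
TREE: Dict[str, Any] = {
    "feature": 0, "threshold": 30.0,
    "left": {"value": 0},
    "right": {
        "feature": 1, "threshold": 0.5,
        "left": {"value": 0},
        "right": {"value": 1},
    },
}


def predict(tree: Dict[str, Any], x: List[float]) -> Any:
    """Walk from the root to a leaf with plain dict lookups."""
    node = tree
    while "value" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


def dump(tree: Dict[str, Any]) -> str:
    return json.dumps(tree, sort_keys=True)


def load(payload: str) -> Dict[str, Any]:
    return json.loads(payload)
```

Usage: write `dump(TREE)` once to shared storage, have each container call `load` at startup, and serve `predict` behind the load balancer; rolling out a new tree is then just a new file plus a rolling restart or hot swap.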
Tools
Terraform (with the AWS provider): define AWS resources (e.g., EC2 instances, S3 buckets, RDS databases) as code and automate their creation, updates, and deletion
AWS SageMaker
AWS Lambda
MLflow, DVC, Neptune, or Weights & Biases
Machine Learning Engineering for Production (MLOps) Specialization
Triton: a model inference serving framework
https://github.com/logicalclocks/hopsworks-tutorials
https://github.com/iusztinpaul/energy-forecasting
https://github.com/cmunch1/nba-prediction
https://github.com/MatejFrnka/ScalableML-project
https://www.youtube.com/playlist?list=PL3N9eeOlCrP5a6OA473MA4KnOXWnUyV_J
https://fullstackdeeplearning.com/course/2022/
https://github.com/visenger/awesome-mlops