
Sentiment Analysis



1. requirements

constraints

latency: how long it takes to serve a single request
throughput: how many requests can be handled in a given amount of time
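
To make the two constraints concrete, here is a minimal sketch that measures both for a sequential caller. `fake_sentiment_model` is a hypothetical stand-in for a real model call, and the p95 value is a rough nearest-rank estimate rather than an exact percentile.

```python
import statistics
import time


def fake_sentiment_model(text):
    # Hypothetical stand-in for a real model call.
    return "positive" if "good" in text else "negative"


def measure(handler, requests):
    """Return (p50 latency in s, approximate p95 latency in s, throughput in req/s)."""
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        handler(request)
        latencies.append(time.perf_counter() - t0)
    # Guard against a zero elapsed time on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)

    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return p50, p95, len(latencies) / elapsed


if __name__ == "__main__":
    p50, p95, qps = measure(fake_sentiment_model, ["good food"] * 1000)
    print(f"p50={p50:.2e}s p95={p95:.2e}s throughput={qps:.0f} req/s")
```

Latency is a per-request number and is usually reported as a percentile, while throughput is an aggregate. Batching tends to raise throughput at the cost of per-request latency, which is why the two are stated as separate constraints.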

2. ML task & pipeline

3. data collection

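Collect data
- GDPR (privacy), data anonymization, data encryption
Analyze the data. Consider the label distribution
Consider whether the features are text only or also include numeric and nominal ones. How should missing data be handled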
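
The data analysis step can be sketched as below: check the label distribution, derive inverse-frequency class weights for imbalanced labels, and measure how often a field is missing. The field name `rating` and the sample rows are made up for illustration.

```python
from collections import Counter


def label_distribution(labels):
    """Fraction of examples per label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(labels):
    """Class weight = n_samples / (n_classes * class_count), so rare labels weigh more."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def missing_rate(rows, field):
    """Share of rows where `field` is absent or None."""
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


if __name__ == "__main__":
    labels = ["pos"] * 6 + ["neg"] * 3 + ["neutral"] * 1
    print(label_distribution(labels))
    print(inverse_frequency_weights(labels))
    # Hypothetical rows with an optional numeric field.
    rows = [{"rating": 5}, {"rating": None}, {}]
    print(missing_rate(rows, "rating"))
```

A heavily skewed distribution is a signal to use class weights or resampling during training and to avoid plain accuracy as the headline metric.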

4. feature

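How to generate embeddings for text features, and the pros and cons of each option (word embedding, fastText, BERT)
For numeric features: how to handle missing data and how to normalize
In practice, each ML team maintains its own embedding set, and teams reuse each other's embedding sets. How to pre-train, fine-tune, and combine features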
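
One possible way to handle the numeric side and combine it with a text embedding is sketched below: median imputation with a missing-value flag, z-score normalization, then plain concatenation. These are illustrative choices, not the only options, and the embedding values are placeholders rather than the output of a real encoder.

```python
import statistics


def fit_numeric_stats(values):
    """Learn imputation and scaling statistics from training values only."""
    observed = [v for v in values if v is not None]
    std = statistics.pstdev(observed)
    return {
        "median": statistics.median(observed),
        "mean": statistics.fmean(observed),
        "std": std if std > 0 else 1.0,  # avoid dividing by zero on constant columns
    }


def transform_numeric(value, stats):
    """Return [z_score, was_missing]; the flag lets the model use missingness itself."""
    was_missing = value is None
    filled = stats["median"] if was_missing else value
    z_score = (filled - stats["mean"]) / stats["std"]
    return [z_score, 1.0 if was_missing else 0.0]


def combine_features(text_embedding, numeric_features):
    """Concatenate the text embedding with the processed numeric features."""
    return list(text_embedding) + list(numeric_features)


if __name__ == "__main__":
    stats = fit_numeric_stats([1.0, 2.0, None, 3.0])
    numeric = transform_numeric(None, stats)
    # Placeholder 3-dimensional embedding; a real one would come from fastText, BERT, etc.
    print(combine_features([0.12, -0.40, 0.33], numeric))
```

The statistics are fitted on training data and reused unchanged at serving time, so the model sees the same scaling in both places.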

5. model

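Model selection: traditional models or neural networks
Consider system constraints such as prediction latency and memory. How to reasonably trade model quality for gains on those constraints
Model distillation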
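
As an illustration of model distillation, the sketch below computes one common form of the distillation loss for a single example: a temperature-softened KL term between teacher and student, mixed with the usual cross-entropy on the true label. The values `temperature=2.0` and `alpha=0.5` are arbitrary assumptions, and a real setup would compute this over batches inside a training framework.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, true_label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    cross_entropy = -math.log(softmax(student_logits)[true_label])
    # T^2 keeps the soft-target term on a scale comparable to the hard-label term.
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


if __name__ == "__main__":
    teacher = [2.0, 0.0]  # teacher favours class 0
    print(distillation_loss([2.0, 0.0], teacher, true_label=0))  # student agrees
    print(distillation_loss([0.0, 2.0], teacher, true_label=0))  # student disagrees
```

The small student is what gets deployed, which is how distillation trades some accuracy for lower latency and memory.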

6. evaluation

Split the data into train, validation, and test sets
Evaluation metrics
How to A/B test a feature
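
For the metrics item, here is a small sketch of per-class and macro-averaged F1 next to plain accuracy. The labels and predictions are invented to show how accuracy can hide a class the model never predicts.

```python
def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 for each label, computed one-vs-rest."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["pos", "pos", "pos", "neg", "neg", "neu"]
    y_pred = ["pos", "pos", "neg", "neg", "neg", "pos"]
    print(accuracy(y_true, y_pred))
    print(per_class_f1(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
```

In this toy case accuracy is about 0.67 while the neutral class has an F1 of 0, which drags macro F1 down to about 0.49. With imbalanced sentiment labels, macro F1 or per-class recall is usually the more informative number.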

7. deploy & serving

GPU or CPU
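Single machine with multiple processes, or Spark + Broadcast, KFServing
dynamic batching
Dynamic Model Input (variable input length)
quantization (cast)
distillation or a smaller model
ONNX
- compatibility across different hardware and inference engines
- further optimization: operator fusion, memory optimization, and hardware acceleration
caching responses to reduce the number of requests that reach the model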
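
Two of the serving ideas above can be sketched without any framework: padding each batch only to its own longest input (variable input length), and caching responses for repeated text. `PAD_ID`, `run_model`, and the normalization rule are hypothetical; lowercasing before caching is only safe if the model is case-insensitive.

```python
from functools import lru_cache

PAD_ID = 0            # hypothetical padding token id
MODEL_CALLS = {"n": 0}  # counter used only to show the cache working


def pad_batch(token_id_lists):
    """Pad to the longest sequence in this batch, not to a global maximum length."""
    longest = max(len(ids) for ids in token_id_lists)
    return [ids + [PAD_ID] * (longest - len(ids)) for ids in token_id_lists]


def bucket_by_length(token_id_lists, batch_size):
    """Sort by length so each batch holds similar-sized inputs, then pad per batch."""
    ordered = sorted(token_id_lists, key=len)
    return [pad_batch(ordered[i:i + batch_size])
            for i in range(0, len(ordered), batch_size)]


def run_model(text):
    # Hypothetical stand-in for the expensive model call.
    MODEL_CALLS["n"] += 1
    return "positive" if "great" in text else "negative"


@lru_cache(maxsize=10_000)
def _cached_predict(normalized_text):
    return run_model(normalized_text)


def predict(text):
    """Normalize the text so trivially different requests share one cache entry."""
    return _cached_predict(" ".join(text.lower().split()))


if __name__ == "__main__":
    print(bucket_by_length([[1, 2, 3, 4, 5], [1], [1, 2], [1, 2, 3, 4]], batch_size=2))
    print(predict("Great   food"), predict("great food"), MODEL_CALLS["n"])
```

Length bucketing cuts wasted computation on padding tokens, and the cache removes model calls entirely for repeated inputs. An in-process `lru_cache` is per worker; a shared cache would be needed to get the same effect across processes.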

8. Monitor & maintenance

hardware usage
serving usage: QPS
model performance
business objectives

9. Optimization and Q&A

What to do when the train/test data distribution differs from the production distribution
What to do when the data distribution changes over time
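
One way to detect either kind of shift is to compare a reference sample with recent production data using the population stability index (PSI), sketched below for a categorical variable such as the predicted label. The sample data is invented, and the 0.25 alert threshold is a common rule of thumb rather than a fixed standard.

```python
import math
from collections import Counter


def distribution(values, categories, eps=1e-6):
    """Share of each category, floored at eps so the log term stays finite."""
    counts = Counter(values)
    total = len(values)
    return {c: max(counts.get(c, 0) / total, eps) for c in categories}


def psi(expected, actual):
    """Population stability index between a reference sample and a recent sample."""
    categories = sorted(set(expected) | set(actual))
    p = distribution(expected, categories)
    q = distribution(actual, categories)
    return sum((q[c] - p[c]) * math.log(q[c] / p[c]) for c in categories)


if __name__ == "__main__":
    training_labels = ["pos"] * 50 + ["neg"] * 50
    production_preds = ["pos"] * 80 + ["neg"] * 20
    score = psi(training_labels, production_preds)
    print(f"PSI={score:.3f}", "drift suspected" if score > 0.25 else "stable")
```

Running the check on a schedule turns the question into a monitoring signal. Once drift is flagged, typical responses are retraining on recent data, reweighting the training set, or collecting fresh labels.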
