Machine Learning
Interviews require preparation and technique, but the real work happens long before them. Day to day, focus on building a knowledge system and keep adding to it through papers and experiments; before an interview, review the standard ML interview questions and coding. This chapter focuses on theory; for system design, see 3.3 Machine Learning System Design.
1. Interview Requirements
Be familiar with common models: their principles, code, practical applications, pros and cons, common issues, etc.
Inductive bias; independent and identically distributed (IID) data
The scope covers ML breadth, ML depth, ML application, and coding
Know the math behind each algorithm: write out the key formulas and be able to derive and explain them on a whiteboard. Don't memorize the formula but demonstrate understanding
Newer areas such as large language models may be tested on paper details
Expect repeated follow-up questions: why? Why does a particular trick work?
How each algorithm scales, and how to cast it in a MapReduce style
The complexity, parameter count, and compute cost of each algorithm
For ML coding, see ML coding collections
If you don't know the answer, admit it, explain how you would find it, and don't make things up
2. Standard Interview Question Examples
For model details and standard questions, see the individual model pages
Generative vs Discriminative
A discriminative model learns the predictive distribution p(y|x) directly, while a generative model learns the joint distribution p(x, y) and then obtains the predictive distribution via Bayes' rule
A generative model will learn categories of data while a discriminative model will simply learn the distinction between different categories of data.
Discriminative models will generally outperform generative models on classification tasks.
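As a small illustration of the contrast above, here is a minimal sketch (assuming scikit-learn; the toy data is made up): GaussianNB is generative and models p(y) and p(x|y), while LogisticRegression is discriminative and models p(y|x) directly.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])  # two toy classes
y = np.array([0] * 100 + [1] * 100)

generative = GaussianNB().fit(X, y)              # learns p(y) and p(x|y), predicts via Bayes' rule
discriminative = LogisticRegression().fit(X, y)  # learns p(y|x) directly
print(generative.predict_proba(X[:1]), discriminative.predict_proba(X[:1]))
```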
The bias-variance tradeoff
track: Cross-Validation
Bias-variance decomposition: Error = Bias² + Variance + Irreducible Error (a short derivation follows this list)
Ideally, one wants to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data. Unfortunately, it is typically impossible to do both simultaneously.
High-variance learning methods may be able to represent their training set well but are at risk of overfitting to noisy or unrepresentative training data.
In contrast, algorithms with high bias typically produce simpler models that don't tend to overfit but may underfit their training data, failing to capture important regularities.
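A compact way to write the decomposition above for squared error, assuming y = f(x) + ε with zero-mean noise of variance σ² and a learned predictor f̂:

$$
\mathbb{E}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\operatorname{Var}\big[\hat f(x)\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
$$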
How to address overfitting
track: underfitting means large training error, large generalization error; overfitting means small training error, large generalization error
Data: collect more training data; data augmentation; or use a pretrained model
Features: feature selection
Model:
Reduce model complexity, e.g. fewer or narrower layers for neural networks, shallower trees or pruning for tree models
Regularization, e.g. an L2 penalty, dropout
Ensemble methods, e.g. bagging
Training: early stopping, weight decay (a minimal sketch follows this list)
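A minimal sketch (assuming PyTorch; the toy data and dimensions are made up) combining three of the remedies above: dropout, L2 weight decay, and early stopping on validation loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X_tr, y_tr = torch.randn(256, 16), torch.randn(256, 1)   # toy training data
X_va, y_va = torch.randn(64, 16), torch.randn(64, 1)      # toy validation data

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 1))
# weight_decay applies an L2 penalty inside the optimizer update
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.MSELoss()

best, patience, bad = float("inf"), 5, 0
for epoch in range(200):
    model.train()
    opt.zero_grad()
    loss_fn(model(X_tr), y_tr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(X_va), y_va).item()
    if val < best - 1e-4:
        best, bad = val, 0
    else:
        bad += 1
        if bad >= patience:   # early stopping: validation loss stopped improving
            break
```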
How to address underfitting
Features: add new features
Model: increase model complexity, reduce regularization
Training: the first step when training a model is to make sure it can overfit; train for more epochs
How to handle class imbalance
Evaluation metric: AP (average_precision_score)
Downsampling: faster convergence, saves disk space, requires calibration. The question of how many samples each class has can extend to how easy or hard the samples are
Upweighting: after downsampling, upweight the kept examples so that every original example contributes equally to the loss (sketch after this list)
Long-tail classification: keep only the head labels covering ~80% of the data and mark the remaining labels as "others"
Extreme imbalance (99.99% vs 0.01%): use outlier/anomaly-detection methods
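A minimal upweighting sketch (assuming PyTorch; the counts are made up): pos_weight scales the loss on positive examples, roughly n_negative / n_positive, so the minority class is not drowned out.

```python
import torch
import torch.nn as nn

n_pos, n_neg = 100, 9_900
pos_weight = torch.tensor([n_neg / n_pos])        # ~99x weight on the rare positive class
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(32, 1)                       # toy model outputs
labels = torch.randint(0, 2, (32, 1)).float()
loss = loss_fn(logits, labels)                    # each positive now contributes ~99x per example
```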
How to handle missing data
The case where labeled data is scarce
Imputation (a minimal sketch follows this list)
Important fields can be predicted with a separate model
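A minimal imputation sketch (assuming scikit-learn; the toy matrix is made up): fill missing values with the column median and keep an indicator of which entries were missing.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_filled = imputer.fit_transform(X)   # medians fill the NaNs; extra columns flag missingness
print(X_filled)
```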
How to handle high-cardinality categorical features
Feature Hashing (sketch after this list)
Target Encoding
Clustering Encoding
Embedding Encoding
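A minimal feature-hashing sketch (assuming scikit-learn; the feature strings are made up): a high-cardinality id is hashed into a fixed-size vector, trading some collisions for a bounded feature space.

```python
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=16, input_type="string")
rows = [["user_id=12345"], ["user_id=98765"], ["user_id=12345"]]
X = hasher.transform(rows).toarray()   # shape (3, 16); identical ids hash to the same slots
```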
How to choose an optimizer
MSE or log-likelihood objective + gradient descent (GD)
SGD: when the training data is too large for full-batch GD
Adam: sparse input / sparse gradients
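A minimal sketch (assuming PyTorch) of the two choices above; the parameter tensor is a placeholder.

```python
import torch

params = [torch.nn.Parameter(torch.randn(10, 10))]
# SGD with momentum: cheap per step, a common default when data is large and dense
sgd = torch.optim.SGD(params, lr=0.1, momentum=0.9)
# Adam: per-parameter adaptive learning rates, often preferred with sparse gradients
adam = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999))
```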
How to address gradient vanishing & exploding
Vanishing gradients
stacking
Activation functions, e.g. ReLU: sigmoid only has a meaningful gradient near 0
LSTM (gradients along the time dimension)
Highway network
Residual network (gradients along the depth dimension)
batch normalization
Exploding gradients
gradient clipping (sketch after this list)
LSTM gates
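A minimal gradient-clipping sketch (assuming PyTorch; the toy model and data are made up): rescale the gradients if their global norm exceeds max_norm, before the optimizer step.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the global gradient norm
opt.step()
```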
How to handle distribution mismatch
Distribution issues involve both features and labels. For labels, collect as much data as possible; it again comes down to balancing the data
When the data distribution drifts, set up automatic retraining and deployment (a drift-check sketch follows this list); if performance drops too much, intervene manually and retrain
Leakage features (using future information) can also look like distribution mismatch; fix this by preventing leakage
Online/offline inconsistency
model behaviors in production: data/feature distribution drift, feature bug
model generalization: offline metrics alignment
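One common drift check is the Population Stability Index (PSI); the notes above don't name a specific metric, so treat this as one illustrative option with toy data.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

train_feature = np.random.normal(0.0, 1.0, 10_000)
prod_feature = np.random.normal(0.3, 1.2, 10_000)   # drifted production data
print(psi(train_feature, prod_feature))              # > ~0.2 is often treated as significant drift
```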
curse of dimensionality
Feature Selection
PCA
embedding
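A minimal PCA sketch (assuming scikit-learn; the random matrix stands in for real features): project 100-dimensional features onto the 10 directions of largest variance.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(500, 100)               # 500 samples, 100 features
pca = PCA(n_components=10)
X_low = pca.fit_transform(X)                # shape (500, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```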
How to reduce model latency
Smaller models or pruning
Knowledge distillation
Quantize the model to 8-bit or 4-bit (a minimal sketch follows this block)
Model parallelism
Linear/logistic regression
xgboost
cnn
RNN
transformer
In deep learning frameworks, the multiplication of a single tensor is automatically parallelized internally
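A minimal 8-bit quantization sketch (assuming PyTorch dynamic quantization; the toy model is made up): Linear weights are stored as int8, which shrinks the model and often reduces CPU inference latency.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
out = quantized(torch.randn(1, 256))   # inference now uses the int8 weights
```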
Cold start
Make good use of available information (metadata)
Choose a suitable model (e.g. two-tower)
Traffic regulation
Out-of-vocabulary
Map to an unknown (UNK) token
3. Hand-Written ML Code Examples
Implement SGD from scratch
Implement backpropagation for softmax from scratch
How do you compute a convolution layer's output size? Write out the formula (see the sketch after this list)
Implement dropout, forward and backward passes
Implement focal loss
Implement an LSTM from scratch
Given an LSTM architecture, compute its parameter count
NLP:
Implement n-grams from scratch
Implement a tokenizer from scratch
Explain positional encoding on a whiteboard
Implement multi-head attention (MHA) from scratch
Vision:
Implement IoU/NMS from scratch
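For the convolution output-size question above, a minimal sketch of the standard formula (square input of size w, kernel k, padding p, stride s):

```python
def conv_output_size(w: int, k: int, p: int, s: int) -> int:
    # floor((w - k + 2p) / s) + 1
    return (w - k + 2 * p) // s + 1

print(conv_output_size(224, 3, 1, 2))  # -> 112
```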
References
https://defiant-show-3ca.notion.site/Deep-learning-specialization-b69a42ecb14446f39bd93fd0f15965d5