Machine Learning
Interviews take preparation and technique, but the real work happens outside the interview room: build up a knowledge system in your day-to-day work, and keep adding to it by reading papers and running experiments. Before an interview, consolidate machine learning theory.
This chapter focuses on theory; for system design, see the dedicated page.
Be familiar with common machine learning models: how they work, the code, how to apply them in practice, their strengths and weaknesses, and common pitfalls.
Inductive bias; the IID (independent and identically distributed) data assumption.
Interviews typically cover ML breadth, ML depth, ML application, and coding. If you don't know an answer, don't make one up: admit it, then add what you do understand and explain how you would go about finding the answer.
Understand the principles behind each algorithm and its key equations, and be able to derive and explain them on a whiteboard (don't memorize the formula but demonstrate understanding).
Expect persistent follow-up questions: why? Why does a given trick actually work?
For each algorithm: its complexity, parameter count, and compute cost.
For each algorithm: how it scales, and how to cast it in map-reduce form.
In newer areas, expect questions on the details of the latest papers.
For machine learning coding, see the dedicated page.
Model details and model-specific questions are on the model subpages. In the sample answers below, note how each answer is laid out point by point within a framework.
Generative vs Discriminative
A discriminative model learns the predictive distribution p(y|x) directly.
A generative model learns the joint distribution p(x, y), then obtains the predictive distribution p(y|x) via Bayes' rule.
A generative model learns what each category of data looks like, while a discriminative model simply learns the distinction between categories.
Discriminative models will generally outperform generative models on classification tasks.
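A minimal sketch contrasting the two on the same data, with scikit-learn's GaussianNB as the generative model (it fits p(x|y) and p(y), then applies Bayes' rule) and LogisticRegression as the discriminative one (it fits p(y|x) directly); the synthetic dataset and settings are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB            # generative: models p(x|y), p(y)
from sklearn.linear_model import LogisticRegression   # discriminative: models p(y|x)

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gen = GaussianNB().fit(X_tr, y_tr)                    # p(y|x) ∝ p(x|y) p(y)
disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("generative (GaussianNB):", gen.score(X_te, y_te))
print("discriminative (LogReg):", disc.score(X_te, y_te))
```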
The bias-variance tradeoff
How to track the tradeoff: cross-validation (compare training and validation error as model capacity changes).
Bias-variance decomposition: Error = Bias² + Variance + Irreducible Error
Ideally, one wants to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data. Unfortunately, it is typically impossible to do both simultaneously.
High-variance learning methods may be able to represent their training set well but are at risk of overfitting to noisy or unrepresentative training data.
In contrast, algorithms with high bias typically produce simpler models that don't tend to overfit but may underfit their training data, failing to capture important regularities.
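A small assumed demo of the tradeoff: fitting polynomials of rising degree to noisy data drives training error down monotonically, while validation error eventually turns back up once variance dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.3, x.size)     # true signal + irreducible noise
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

for degree in (1, 3, 9, 15):
    coef = np.polyfit(x_tr, y_tr, degree)          # higher degree: lower bias, higher variance
    mse = lambda xs, ys: np.mean((np.polyval(coef, xs) - ys) ** 2)
    print(f"degree {degree:2d}: train {mse(x_tr, y_tr):.3f}  val {mse(x_va, y_va):.3f}")
```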
How to fix overfitting (a code sketch follows this list)
How to diagnose: underfitting means large training error and large generalization error; overfitting means small training error but large generalization error.
Data angle: collect more training data; data augmentation; or start from a pretrained model.
Feature angle: feature selection.
Model angle:
Reduce model complexity, e.g. the number and width of layers in a neural network, or tree depth and pruning for tree models;
Regularization, e.g. an L2 penalty, dropout;
Ensemble methods, e.g. bagging.
Training angle: early stopping, weight decay.
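A minimal PyTorch sketch combining three levers from the list above: dropout, weight decay (an L2 penalty), and early stopping on validation loss. The toy data, architecture, and thresholds are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()             # synthetic labels
X_tr, y_tr, X_va, y_va = X[:800], y[:800], X[800:], y[800:]

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Dropout(p=0.5),              # regularization: dropout
                      nn.Linear(64, 2))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3,
                        weight_decay=1e-2)            # regularization: weight decay / L2
loss_fn = nn.CrossEntropyLoss()

best_val, patience, bad = float("inf"), 5, 0
for epoch in range(200):
    model.train()
    opt.zero_grad()
    loss_fn(model(X_tr), y_tr).backward()
    opt.step()
    model.eval()
    with torch.no_grad():
        val = loss_fn(model(X_va), y_va).item()
    if val < best_val - 1e-4:
        best_val, bad = val, 0
    else:
        bad += 1
        if bad >= patience:                           # early stopping
            print(f"early stop at epoch {epoch}, best val loss {best_val:.3f}")
            break
```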
How to fix underfitting
Feature angle: add new features.
Model angle: increase model capacity; reduce regularization.
Training angle: the first step of training is to verify the model can overfit at all (e.g. on a small subset); then train for more epochs.
How to handle class imbalance (a code sketch follows this list)
Metrics: don't use accuracy (prefer precision/recall, PR-AUC, etc.).
Down-sampling the majority class: faster convergence and less disk space, but predicted probabilities need re-calibration. The question of sample quantity can extend into sample difficulty (easy vs hard examples).
Up-weighting: weight the down-sampled class back up so that every sample contributes to the loss equally (this also restores calibration).
Long-tail classification: keep only the head labels covering ~80% of the data, and mark the remaining labels as "others".
Extreme imbalance (e.g. 99.99% vs 0.01%): approach it with outlier/anomaly detection methods.
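A sketch of the down-sample + up-weight recipe above (the factor and data are assumptions): down-sample the majority class by a factor, then give the surviving majority examples that same factor as a sample weight, which keeps predicted probabilities roughly calibrated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 5))
y = (rng.random(100_000) < 0.01).astype(int)          # ~1% positives

factor = 20                                           # keep 1/20 of the negatives
neg = np.flatnonzero(y == 0)
keep = np.concatenate([np.flatnonzero(y == 1),
                       rng.choice(neg, size=len(neg) // factor, replace=False)])

w = np.where(y[keep] == 0, float(factor), 1.0)        # up-weight the kept negatives
clf = LogisticRegression().fit(X[keep], y[keep], sample_weight=w)
# with the matching up-weight, predicted probabilities stay roughly calibrated
```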
How to handle missing data (a code sketch follows this list)
When labeled data is scarce: semi-supervised learning, few-shot learning.
Missing feature columns:
Imputation: mean, median, or an explicit NaN/missing indicator.
Important features can be predicted by a separate model.
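A short scikit-learn sketch of the imputation options above (the data is made up); add_indicator=True appends binary "was missing" columns, which often carry signal on their own:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

imp = SimpleImputer(strategy="median", add_indicator=True)  # or "mean", "most_frequent"
print(imp.fit_transform(X))   # imputed values + binary missing-indicator columns
```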
How to handle high-cardinality categorical features (a code sketch follows this list)
Feature Hashing
Target Encoding
Clustering Encoding
Embedding Encoding
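A short sketch of the first option, feature hashing, via scikit-learn's FeatureHasher; the output width 2**18 is a common but assumed choice. Hashing needs no vocabulary and handles unseen categories, at the cost of collisions:

```python
from sklearn.feature_extraction import FeatureHasher

users = [{"user_id": "u_123", "city": "tokyo"},
         {"user_id": "u_987654", "city": "paris"}]

h = FeatureHasher(n_features=2**18, input_type="dict")  # fixed-size output, no vocab
X = h.transform(users)                                  # sparse matrix; collisions possible
print(X.shape)
```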
How to choose an optimizer (a code sketch follows this list)
Baseline: define the loss (MSE, log-likelihood) and minimize it with gradient descent
SGD: when the training data is too large for full-batch gradient descent
Adam: adaptive per-parameter learning rates suit sparse inputs
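A minimal PyTorch sketch of the rules of thumb above (the model and learning rates are placeholder assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(1000, 1)

# large, dense data: plain SGD with momentum scales well
opt_sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# sparse inputs (bag-of-words, embeddings): per-parameter adaptive steps help
opt_adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```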
How to fix gradient vanishing & exploding (a clipping sketch follows this list)
Vanishing gradients
Worsened by stacking many layers: small gradients multiply across depth
Activation functions, e.g. ReLU: sigmoid has usable gradient only near 0 and saturates elsewhere
LSTM (gradient flow along the time dimension)
Highway network
Residual networks (gradient flow along the depth dimension)
batch normalization
Exploding gradients
gradient clipping
LSTM gating
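A sketch of gradient clipping in PyTorch, the standard fix for exploding gradients in RNNs; the model and dummy loss are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=16, hidden_size=32, num_layers=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(50, 8, 16)             # (seq_len, batch, features)
out, _ = model(x)
loss = out.pow(2).mean()               # dummy loss for illustration

opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the global grad norm
opt.step()
```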
How to handle distribution mismatch
Distribution issues involve both features and labels. For labels, collect as much data as possible; it reduces to the data-balancing problem above.
When the data distribution drifts: set up automatic retraining and deployment (auto-train, auto-deploy); if performance drops too much, intervene manually and retrain.
Leaky ("time-travel") features can also look like a distribution mismatch; fix them by preventing the leakage.
How to handle online/offline inconsistency
Model behavior in production: data/feature distribution drift, feature bugs.
Model generalization: verify that offline metrics align with online metrics.
Curse of dimensionality (a PCA sketch follows this list)
Feature Selection
PCA
embedding
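A tiny sketch of the PCA option (the dimensions here are assumed for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(500, 1000))   # 1000-d features
pca = PCA(n_components=50)                              # project to 50 dims
X_low = pca.fit_transform(X)
print(X_low.shape, pca.explained_variance_ratio_.sum())  # variance kept by 50 components
```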
How to improve model latency (a quantization sketch follows this list)
Smaller models or pruning
Knowledge distillation
Quantize the model to 8-bit or 4-bit
Parallelism, by model family:
Linear/logistic regression
XGBoost
CNN
RNN
Transformer
In deep learning frameworks, a single tensor multiplication is already parallelized internally.
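A minimal sketch of the 8-bit option using PyTorch post-training dynamic quantization, which stores nn.Linear weights in int8 (the model here is a stand-in assumption):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)   # int8 weights, quantized matmuls

x = torch.randn(1, 512)
print(qmodel(x).shape)                       # same interface, lower latency/memory
```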
Cold start
Make full use of the information you already have (metadata)
Choose a model suited to the setting (e.g. a two-tower model)
Traffic allocation (regulate exposure for new items)
Out-of-vocabulary
Map OOV tokens to a special unknown token (sketch below)
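A tiny sketch of the unknown-token mapping (the vocabulary is made up):

```python
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

def encode(tokens, vocab):
    # any token outside the vocabulary falls back to the <unk> id
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]

print(encode(["the", "dog", "sat"], vocab))   # [1, 0, 3]
```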
Write SGD from scratch
Write the backpropagation of softmax by hand
How do you compute a convolution layer's output size? Write out the formula: output = ⌊(W - K + 2P) / S⌋ + 1, for input size W, kernel size K, padding P, stride S
Implement dropout, forward and backward passes
Implement focal loss (a sketch follows this list)
Write an LSTM by hand
Given an LSTM architecture, compute its parameter count (per layer, with one bias per gate: 4 × (input_size + hidden_size + 1) × hidden_size)
NLP:
Write an n-gram model by hand
Write a tokenizer by hand
Explain positional encoding at the whiteboard
Write multi-head attention (MHA) by hand
Vision:
Write IoU/NMS by hand
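As a sample of what these hand-written questions expect, a minimal numpy sketch of binary focal loss; gamma=2 and alpha=0.25 follow the original paper's defaults, while the interface itself is an assumption:

```python
import numpy as np

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25, eps=1e-9):
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t), averaged over samples."""
    p = 1.0 / (1.0 + np.exp(-logits))                 # sigmoid
    p_t = np.where(targets == 1, p, 1.0 - p)          # probability of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps))

logits = np.array([2.0, -1.0, 0.5])
targets = np.array([1, 0, 1])
print(binary_focal_loss(logits, targets))
```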
https://defiant-show-3ca.notion.site/Deep-learning-specialization-b69a42ecb14446f39bd93fd0f15965d5