Machine Learning
Build a knowledge system in your day-to-day work, and keep adding to it by reading papers and running experiments.
Key Concepts
Inductive bias
Data distribution: IID (independent and identically distributed)
Interview Requirements
Be familiar with common models: their principles, code, practical application, pros and cons, common interview questions, etc.
The scope covers ML breadth, ML depth, ML application, and coding.
Expect persistent follow-up "why" questions, e.g., why does a particular trick work?
Know the math behind each algorithm, write out its main formulas, and be able to derive them on a whiteboard.
For newer areas, paper details may be tested.
How each algorithm scales, and how to map-reduce it.
Each algorithm's complexity, parameter count, and compute cost.
Examples
Hand-write basic algorithms, plus follow-ups on optimizing them.
Implement a two-layer fully connected network.
Hand-write the backpropagation of softmax.
Hand-write AUC.
Hand-write SGD.
Implement dropout, forward and backward (sketch after this list).
Random sample with weights.
Implement focal loss.
Hand-write n-gram.
Hand-write multi-head attention (sketch after this list).
Vision: hand-write IoU/NMS.
NLP: hand-write a tokenizer.
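For the dropout item, a minimal NumPy sketch of inverted dropout, forward and backward (the drop probability and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p=0.5, train=True):
    """Inverted dropout: zero units with probability p and scale the
    survivors by 1/(1-p), so no rescaling is needed at inference."""
    if not train:
        return x, None
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask, mask

def dropout_backward(dout, mask):
    # The gradient flows only through kept units, with the same scaling.
    return dout * mask
```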
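Likewise, a minimal NumPy sketch of multi-head attention (no batching, masking, or dropout; shapes and names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    # Project, then split into heads: (n_heads, seq_len, d_head).
    def split(W):
        return (x @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(Wq), split(Wk), split(Wv)

    # Scaled dot-product attention per head: (n_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v  # (n_heads, seq_len, d_head)

    # Concatenate heads and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage with random weights.
x = np.random.randn(10, 64)
Wq, Wk, Wv, Wo = (np.random.randn(64, 64) for _ in range(4))
print(multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads=4).shape)  # (10, 64)
```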
Extensions
Given the structure of an LSTM network, compute how many parameters it has (worked sketch after this list).
How do you compute a convolution layer's output size? Write out the formula (worked sketch after this list).
Design a sparse matrix (supporting addition, subtraction, multiplication, etc.).
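For the two calculation questions above, a small worked sketch; the concrete dimensions below are illustrative:

```python
def lstm_param_count(input_dim, hidden_dim):
    # Each of the 4 gates has a weight matrix over [x_t, h_{t-1}] plus a bias:
    # 4 * ((input_dim + hidden_dim) * hidden_dim + hidden_dim)
    return 4 * ((input_dim + hidden_dim) * hidden_dim + hidden_dim)

def conv_output_size(n, kernel, padding, stride):
    # output = floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1

print(lstm_param_count(128, 256))      # 394240
print(conv_output_size(224, 7, 3, 2))  # 112 (e.g., a ResNet stem conv)
```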
How to fix over-fitting / under-fitting in a neural network
Over-fitting:
From the data side, collect more training data; failing that, use data augmentation.
Reduce model complexity: fewer or narrower layers in a neural network, shallower trees or pruning in tree models.
Regularization, e.g., an L2 penalty on the weights.
Ensemble methods, e.g., bagging.
Cross-validation to detect over-fitting.
Feature selection.
Early stopping.
Pretrained models.
Under-fitting:
Add new features, increase model complexity, or reduce the regularization coefficient.
The first step of training any model is to make sure it can overfit the training data at all; a minimal L2 + early-stopping sketch for the over-fitting side follows below.
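A minimal PyTorch sketch of two of the over-fitting remedies above, L2 regularization (via weight_decay) and early stopping; the data and hyperparameters are synthetic and illustrative:

```python
import torch
from torch import nn

# Synthetic regression data and a tiny MLP.
X, y = torch.randn(512, 10), torch.randn(512, 1)
X_tr, y_tr, X_va, y_va = X[:400], y[:400], X[400:], y[400:]

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
# weight_decay adds an L2 penalty on the weights.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

best_va, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(200):
    model.train()
    opt.zero_grad()
    loss_fn(model(X_tr), y_tr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        va = loss_fn(model(X_va), y_va).item()
    if va < best_va:
        best_va, bad_epochs = va, 0  # improvement: reset patience
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # early stopping
            break
```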
How to handle class imbalance
For classification where the data is long-tail, one option is to keep only the head labels covering ~80% of the data and mark the rest as "others".
If it is extremely imbalanced, say 99.99% vs. 0.01% as in spam detection, classification alone won't work; try other approaches such as outlier detection.
This further extends to sample difficulty, i.e., easy vs. hard examples (cf. focal loss).
Evaluation metric: AP (average_precision_score).
Downsampling: faster convergence and less disk space, but requires calibration, i.e., upweighting the downsampled examples (see the sketch below).
Upweighting: weight examples so that every class contributes to the loss equally.
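A minimal sketch of downsampling plus calibration-by-upweighting (keep rate and names are illustrative): each kept negative carries weight 1/keep_rate, so the negative class's expected contribution to the loss, and hence the model's calibration, is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample_negatives(X, y, keep_rate=0.1):
    """Keep all positives; keep each negative with probability keep_rate,
    and upweight kept negatives by 1/keep_rate to stay calibrated."""
    keep = (y == 1) | (rng.random(len(y)) < keep_rate)
    weights = np.where(y[keep] == 1, 1.0, 1.0 / keep_rate)
    return X[keep], y[keep], weights  # pass weights as per-sample loss weights
```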
How to handle missing data
How to handle high-cardinality categorical features
Optimizers, and how to choose one (update-rule sketch below)
MSE or log-likelihood loss + gradient descent (GD).
SGD: when the training data is very large.
Adam: works well with sparse inputs.
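For reference, the update rules behind those two choices, as a minimal NumPy sketch:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Plain SGD: step against the (stochastic) gradient.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps per-parameter moment estimates, so rarely-updated
    # (sparse) features still get reasonably sized steps.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias correction; t starts at 1
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```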
Data Collection
Production data and labels.
Internet datasets.
How to handle distribution mismatch
Distribution here does not refer only to features; it also applies to labels. For label shift, the only real fix is collecting more data; it comes back to balancing the data.
When the data distribution changes, set up auto-training and auto-deployment; if performance drops too much, fall back to manual intervention and retrain.
Recommender systems: scaling / A/B testing / trouble-shooting.
How to reduce model latency
Use a smaller model.
Knowledge distillation.
Quantize the model to 8-bit or 4-bit (see the sketch below).
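As one concrete example of the quantization item, a sketch using PyTorch's dynamic quantization (the model here is a toy; layer coverage and actual speedups vary):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly: ~4x smaller, often faster CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized(torch.randn(1, 256)).shape)  # torch.Size([1, 10])
```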
Generative vs Discriminative
A generative model learns what the data of each category looks like, while a discriminative model simply learns the distinction between different categories of data.
Discriminative models generally outperform generative models on classification tasks: a discriminative model learns the predictive distribution p(y|x) directly, while a generative model learns the joint distribution p(x, y) and then obtains the predictive distribution via Bayes' rule.
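To make the contrast concrete, a small scikit-learn sketch comparing a generative classifier (Gaussian naive Bayes, which models p(x|y) and p(y)) with a discriminative one (logistic regression, which fits p(y|x) directly) on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative vs. discriminative on the same split.
for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    print(type(clf).__name__, clf.fit(X_tr, y_tr).score(X_te, y_te))
```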
The bias-variance tradeoff is a central problem in supervised learning.
Ideally, one wants to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data. Unfortunately, it is typically impossible to do both simultaneously.
High-variance learning methods may be able to represent their training set well but are at risk of overfitting to noisy or unrepresentative training data.
In contrast, algorithms with high bias typically produce simpler models that don't tend to overfit but may underfit their training data, failing to capture important regularities.
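The standard decomposition behind this, assuming the usual setup y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², for squared error:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \sigma^2
$$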
Parallelizing Models
Linear/logistic regression
XGBoost
CNN
RNN
Transformer
In deep learning frameworks, the multiplication of a single tensor is automatically parallelized internally (see the sketch below).
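A quick way to observe this intra-op parallelism, assuming PyTorch on CPU (timings vary by machine):

```python
import time
import torch

x = torch.randn(2048, 2048)

for threads in (1, 4):
    torch.set_num_threads(threads)  # limit intra-op threads for one matmul
    start = time.time()
    for _ in range(5):
        _ = x @ x
    print(f"{threads} thread(s): {time.time() - start:.2f}s")
```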