Math Fundamentals
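Some interviews test statistics and math directly. Even when they do not, backing up your arguments with math during the ML rounds is a real advantage.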
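Given a population with an arbitrary distribution (with finite variance), draw n random samples from it and repeat this m times. Take the mean of each of the m samples; when n is large enough, the distribution of these means is approximately normal.
The theorem describes the distribution of the sample mean, not the distribution of individual observations.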
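The sampling procedure above can be checked with a small simulation. This is a minimal sketch, assuming an exponential population with mean 1 (a deliberately skewed choice); the function name `sample_means` and the values of `n` and `m` are illustrative, not from the original notes.

```python
import random
import statistics

def sample_means(n, m, seed=0):
    """Draw m samples of size n from an Exponential(1) population; return the m sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(m)]

means = sample_means(n=50, m=2000)
# The population is skewed, yet the means cluster symmetrically around 1.0
# with a spread close to sigma / sqrt(n) = 1 / sqrt(50).
print(statistics.fmean(means), statistics.stdev(means))
```

Plotting a histogram of `means` for increasing `n` is a quick way to see the bell shape emerge.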
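Hypothesis testing uses a sample to infer whether the population has a certain property.
Is it similar to maximum likelihood? After assuming a hypothesis, use the distribution it implies to compute the probability of observing the data under that distribution.
The p-value is the probability, assuming the null hypothesis H0 is true, of observing the current evidence or stronger evidence.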
The p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, under the condition that the null hypothesis is true.
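To make the definition concrete, here is a hedged sketch of an exact two-sided binomial test for a coin. The helper names `binom_pmf` and `two_sided_p_value` are made up for this example, and "at least as extreme" is taken to mean "no more probable under H0 than the observed count", which is one common convention among several.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_sided_p_value(heads, flips, p_null=0.5):
    """Sum the probability of every outcome no more likely than the observed one under H0."""
    observed = binom_pmf(heads, flips, p_null)
    total = sum(
        binom_pmf(k, flips, p_null)
        for k in range(flips + 1)
        if binom_pmf(k, flips, p_null) <= observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)

# 60 heads in 100 flips: the p-value is a little above 0.05,
# so H0 (fair coin) is not rejected at the 5% level.
print(two_sided_p_value(60, 100))
```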
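The main hypothesis tests for comparing means are the Z-test and the T-test. The Z-test targets population data and large samples, while the T-test is suited to small samples.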
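The t-test is more widely applicable than the z-test. The z-test requires the population standard deviation, which is rarely known in practice, so the t-test is the usual choice. When the sample is large enough, the t distribution approaches the standard normal distribution and the t-test becomes almost identical to the z-test, so the z-test can be viewed as the limiting case of the t-test.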
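For a one-sample test of the mean, the two statistics differ only in which standard deviation they use. The symbols here are assumptions introduced for illustration: $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized mean, $\sigma$ the known population standard deviation, $s$ the sample standard deviation, and $n$ the sample size.

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad \text{with } n - 1 \text{ degrees of freedom}
$$

As $n$ grows, $s$ becomes a reliable estimate of $\sigma$ and the $t$ distribution with $n - 1$ degrees of freedom approaches the standard normal, which is why the two tests agree for large samples.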
Bayes' theorem
VIF
ANOVA
Simpson's paradox
Monte Carlo
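A matrix can be understood as a transformation between coordinate systems. Geometrically, an eigenvector is a vector whose direction is unchanged by the transformation, and its eigenvalue is the factor by which the vector is stretched or compressed.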
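The trace of a matrix is the sum of the elements on its main diagonal.
The trace of a matrix equals the sum of its eigenvalues.
The trace of a covariance matrix is the sum of the variances of the individual variables.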
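A small worked example ties eigenvectors, eigenvalues, and the trace together. The matrix $A$ is chosen purely for illustration.

$$
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
A \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 3 \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
A \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 1 \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix}
$$

$$
\operatorname{tr}(A) = 2 + 2 = 4 = 3 + 1 = \lambda_1 + \lambda_2
$$

The vector $(1, 1)^\top$ keeps its direction and is stretched by a factor of 3, while $(1, -1)^\top$ is left unchanged; the two eigenvalues add up to the trace.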
In machine learning, calculus is used mainly for optimization and backpropagation.
Jacobian: the first-order derivatives of a vector-valued function (multiple outputs) with respect to multiple inputs.
Hessian: the second-order derivatives of a scalar-valued function (one output) with respect to multiple inputs.
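Written out entry by entry, the two objects look as follows. The function names $f$ and $g$ and the dimensions $n$ and $m$ are notation assumed for this sketch.

$$
f : \mathbb{R}^n \to \mathbb{R}^m, \qquad J \in \mathbb{R}^{m \times n}, \qquad J_{ij} = \frac{\partial f_i}{\partial x_j}
$$

$$
g : \mathbb{R}^n \to \mathbb{R}, \qquad H \in \mathbb{R}^{n \times n}, \qquad H_{ij} = \frac{\partial^2 g}{\partial x_i \, \partial x_j}
$$

The Hessian of $g$ is the Jacobian of its gradient $\nabla g$, and it is symmetric whenever the second partial derivatives are continuous.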
How to check gradient descent for convergence:
Plot the loss at each iteration (the learning curve).
If the curve drops quickly in the early iterations and then flattens out, gradient descent has likely converged.
(Alternative) Automatic convergence test: declare convergence when the loss decreases by less than a small threshold $\epsilon$ in one iteration.
Identify problems with gradient descent:
Check the learning curve of the loss function. If it fluctuates or goes up, there is a bug or the learning rate $\alpha$ is too large.
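The checks above can be sketched on a toy problem. This is a minimal illustration, assuming the one-dimensional loss $(w - 3)^2$; the function name `gradient_descent`, the threshold `epsilon`, and the starting point are all hypothetical choices.

```python
def gradient_descent(alpha, w0=0.0, epsilon=1e-9, max_iters=10_000):
    """Minimize the toy loss (w - 3)^2; return the final w and the loss history."""
    loss = lambda w: (w - 3.0) ** 2
    grad = lambda w: 2.0 * (w - 3.0)
    w = w0
    history = [loss(w)]
    for _ in range(max_iters):
        w -= alpha * grad(w)
        history.append(loss(w))
        if history[-1] > history[-2]:
            break  # loss went up: a bug, or alpha is too large
        if history[-2] - history[-1] < epsilon:
            break  # automatic convergence test: improvement below epsilon
    return w, history

w_ok, curve_ok = gradient_descent(alpha=0.1)    # loss shrinks every iteration
w_bad, curve_bad = gradient_descent(alpha=1.1)  # loss grows on the first step
print(w_ok, len(curve_ok), curve_bad[:2])
```

Plotting `curve_ok` gives the learning curve described above: a steep early drop followed by a flat tail.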
How to speed up gradient descent:
Feature scaling:
Max normalization: $\frac{x}{\text{max}}$
Mean normalization: $\frac{x - \mu}{\text{max} - \text{min}}$
Z-score normalization: $\frac{x - \mu}{\sigma}$
Find a range for the learning rate $\alpha$ by trying different values step by step, each roughly 3x the previous one:
first 0.001, then 0.003
next 0.01, then 0.03
next 0.1, then 0.3
next 1 ...
Choose the largest learning rate for which the loss still decreases steadily, or one slightly smaller than that.
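The three scaling formulas listed above translate directly into code. This is a small sketch using only the standard library; the function names and the sample feature values are illustrative, and the z-score version assumes the population standard deviation.

```python
import statistics

def max_normalize(xs):
    """x / max"""
    largest = max(xs)
    return [x / largest for x in xs]

def mean_normalize(xs):
    """(x - mu) / (max - min)"""
    mu = statistics.fmean(xs)
    spread = max(xs) - min(xs)
    return [(x - mu) / spread for x in xs]

def z_score_normalize(xs):
    """(x - mu) / sigma, with sigma the population standard deviation"""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

house_sizes = [300.0, 1200.0, 2000.0]  # hypothetical feature values
scaled = z_score_normalize(house_sizes)
print(scaled)
```

After scaling, features with very different units end up on comparable ranges, which usually lets gradient descent tolerate a larger $\alpha$.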
What is p-value? What is confidence interval? Explain them to a product manager or non-technical person.
Answer:
How do you understand the "Power" of a statistical test?
If a distribution is right-skewed, what's the relationship between median, mode, and mean?
When do you use T-test instead of Z-test? List some differences between these two.
Dice problem-1: How will you test if a coin is fair or not? How will you design the process (sometimes you are asked to implement it in code)? What test would you use?
Dice problem-2: How to simulate a fair coin with one unfair coin?
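One well-known answer is the von Neumann trick: flip the unfair coin twice, keep the first result if the two flips differ, and retry otherwise. The sketch below assumes independent flips with a fixed bias; `fair_flip`, `biased`, and the 0.8 bias are names and values made up for the example.

```python
import random

def fair_flip(biased_flip):
    """Return an unbiased bit from a biased coin by discarding HH and TT pairs."""
    while True:
        first, second = biased_flip(), biased_flip()
        if first != second:
            # P(heads, tails) == P(tails, heads) == p * (1 - p), so both are equally likely
            return first

rng = random.Random(0)
biased = lambda: rng.random() < 0.8  # hypothetical coin that lands heads 80% of the time
flips = [fair_flip(biased) for _ in range(20_000)]
print(sum(flips) / len(flips))  # close to 0.5
```

The price is extra flips: the more biased the coin, the more pairs are discarded before one is accepted.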
Three-door (Monty Hall) question.
Bayes question: Tom takes a cancer test and the test is advertised as being 99% accurate: if you have cancer you will test positive 99% of the time, and if you don't have cancer, you will test negative 99% of the time. If 1% of all people have cancer and Tom tests positive, what is the probability that Tom has the disease? (A classic cancer-screening question; once you can solve this one, the rest should be no problem.)
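The question is a direct application of Bayes' theorem, sketched below. The function name `posterior` and its parameter names are illustrative; the numbers are the ones given in the question.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_positive = sensitivity * prior                 # P(positive and disease)
    false_positive = (1 - specificity) * (1 - prior)    # P(positive and no disease)
    return true_positive / (true_positive + false_positive)

p_tom = posterior(prior=0.01, sensitivity=0.99, specificity=0.99)
print(p_tom)  # about 0.5
```

With a 1% base rate, true positives (0.99 x 0.01) and false positives (0.01 x 0.99) are equally common, so a positive result only raises the probability to about one half.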
How do you calculate the sample size for an A/B test?
Fix the significance level α and the statistical power 1−β; common choices are 0.05 and 0.8.
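Once α and the power are fixed, one common approximation for comparing two conversion rates is the normal-approximation formula sketched below. It assumes a two-sided test, equal group sizes, and a baseline rate and target rate you supply yourself; the function name and the 10% to 12% example are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_baseline -> p_treatment."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for power = 1 - beta
    variance = p_baseline * (1 - p_baseline) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_baseline
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

n_needed = sample_size_per_group(0.10, 0.12)
print(n_needed)  # a few thousand users per group
```

The required size scales with the inverse square of the effect, so halving the detectable lift roughly quadruples the traffic needed.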
Suppose that after running an A/B test you find that the target metric (e.g., click-through rate) goes up while another metric (e.g., clicks) goes down. How would you make a decision?
Now assume the A/B test result is not significant (e.g., p-value ≈ 20%). How will you communicate this to the product manager? If the product manager still decides to launch the new feature despite the 20% p-value, what recommendations and warnings would you give?
Given visitors and conversions, how do you compute significance?
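A standard choice for this question is the pooled two-proportion z-test, sketched here. It assumes independent users and samples large enough for the normal approximation; the function name and the example counts are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                 # rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))   # standard error of p_b - p_a
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 100 of 1000 visitors convert in A, 130 of 1000 in B.
z_stat, p_val = two_proportion_z_test(100, 1000, 130, 1000)
print(z_stat, p_val)
```

Here the p-value falls below 0.05, so the difference would be called significant at the 5% level.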
What are type I and type II errors?
Pick three points at random on a circle. What is the probability that they form an acute triangle?
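The closed-form answer is 1/4, and a Monte Carlo simulation is a quick sanity check. The sketch relies on the inscribed angle theorem: each angle of the triangle is half the arc opposite it, so the triangle is acute exactly when every arc between neighbouring points is shorter than a semicircle. Function names and the trial count are illustrative.

```python
import math
import random

def is_acute(angles):
    """angles: positions of three points on the circle, in radians."""
    a = sorted(angles)
    arcs = [a[1] - a[0], a[2] - a[1], 2 * math.pi - (a[2] - a[0])]
    return max(arcs) < math.pi  # every inscribed angle is below 90 degrees

def estimate_acute_probability(trials, seed=0):
    rng = random.Random(seed)
    hits = sum(
        is_acute([rng.uniform(0, 2 * math.pi) for _ in range(3)])
        for _ in range(trials)
    )
    return hits / trials

print(estimate_acute_probability(100_000))  # close to 0.25
```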
rejection sampling
Frequentists vs. Bayesians
One is called the frequentist interpretation. In this view, probabilities represent long run frequencies of events. For example, the statement that a coin lands heads with probability 0.5 means that, if we flip the coin many times, we expect it to land heads about half the time.
The other interpretation is called the Bayesian interpretation of probability. In this view, probability is used to quantify our uncertainty about something; hence it is fundamentally related to information rather than repeated trials. In the Bayesian view, the above statement means we believe the coin is equally likely to land heads or tails on the next toss.
One big advantage of the Bayesian interpretation is that it can be used to model our uncertainty about events that do not have long term frequencies. For example, we might want to compute the probability that the polar ice cap will melt by 2020 CE. This event will happen zero or one times, but cannot happen repeatedly. Nevertheless, we ought to be able to quantify our uncertainty about this event. To give another machine learning oriented example, we might have observed a “blip” on our radar screen, and want to compute the probability distribution over the location of the corresponding target (be it a bird, plane, or missile). In all these cases, the idea of repeated trials does not make sense, but the Bayesian interpretation is valid and indeed quite natural. We shall therefore adopt the Bayesian interpretation in this book. Fortunately, the basic rules of probability theory are the same, no matter which interpretation is adopted.
A practical guide to quantitative finance interviews