Math Foundations

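1. Probability and Statistics

Some interviews test statistics and math directly. Even when they do not, backing up your points with math during the ML rounds is very helpful.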

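  • Central limit theorem

    • The central limit theorem: take a population with any distribution (with finite variance). Draw n random observations from it, and repeat this m times. Compute the mean of each of the m samples. The distribution of these sample means approaches a normal distribution as n grows.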
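The sampling procedure described above can be checked with a small simulation. This is a minimal sketch: the exponential population, the values n = 50 and m = 2000, and the helper name `sample_means` are illustrative assumptions, not part of the original notes.

```python
import random
import statistics

def sample_means(n, m, seed=0):
    """Draw m samples of size n from an Exponential(1) population
    (a clearly non-normal, right-skewed distribution) and return the m means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(m)]

means = sample_means(n=50, m=2000)
center = statistics.fmean(means)   # theory: population mean = 1
spread = statistics.stdev(means)   # theory: sigma / sqrt(n) = 1 / sqrt(50)

# For a normal distribution about 68% of values lie within one standard deviation.
within_one_sd = sum(abs(x - center) <= spread for x in means) / len(means)
print(center, spread, within_one_sd)
```

If the sample means are roughly normal, `within_one_sd` should land near 0.68 even though the underlying population is skewed.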

  • Hypothesis testing

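    • Use a sample to infer whether the population has a certain property.

    • Loosely similar to maximum likelihood reasoning: after stating a hypothesis, derive the distribution it implies and compute the probability of observing the data under that distribution.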

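  • z-test

    • The main hypothesis tests for comparing means are the z-test and the t-test. The z-test is meant for full-population data or large samples, while the t-test suits small samples.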

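  • t-test

    • The t-test is more widely applicable than the z-test. The z-test requires the population standard deviation, which is usually unknown in practice, so the t-test is the usual choice. When the sample size is large enough, the t distribution approaches the standard normal distribution and the t-test becomes almost identical to the z-test, so the z-test can be seen as the limiting case of the t-test.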
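The difference between the two statistics, and the way the t-test converges to the z-test, can be illustrated by simulation. This is a hedged sketch: the function names, the N(0, 1) data, and the sample sizes 5 and 200 are assumptions chosen for illustration. It counts how often |t| exceeds the normal cutoff 1.96 when the null hypothesis is true.

```python
import math
import random
import statistics

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic: needs the known population standard deviation."""
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

def t_statistic(sample, mu0):
    """One-sample t statistic: estimates the standard deviation from the sample."""
    return (statistics.fmean(sample) - mu0) / (
        statistics.stdev(sample) / math.sqrt(len(sample)))

def exceed_rate(n, trials=5000, cutoff=1.96, seed=0):
    """Fraction of simulated samples with |t| > cutoff when H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / trials

small_n_rate = exceed_rate(n=5)     # heavier tails: well above 5%
large_n_rate = exceed_rate(n=200)   # close to the normal 5%
print(small_n_rate, large_n_rate)
```

With n = 5 the normal cutoff rejects far more often than 5%, which is why small samples need the t distribution's wider critical values. With n = 200 the two tests are nearly interchangeable.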

  • P-value

    • The probability, assuming the null hypothesis H0 is true, of observing evidence at least as extreme as the current evidence.
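As a concrete instance of this definition, the sketch below computes an exact two-sided p-value for a coin-flip experiment under H0 "the coin is fair". The function name and the example of 60 heads in 100 flips are illustrative assumptions. "At least as extreme" is taken here to mean outcomes no more probable than the observed one, which is one common convention.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p0=0.5):
    """Exact two-sided binomial p-value under H0: P(heads) = p0.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count (a small tolerance absorbs float noise).
    """
    pmf = [comb(flips, k) * p0 ** k * (1 - p0) ** (flips - k)
           for k in range(flips + 1)]
    observed = pmf[heads]
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

p_value = binomial_two_sided_p(heads=60, flips=100)
print(p_value)
```

A p-value slightly above 0.05 here would mean that 60 heads in 100 flips is not quite enough evidence to reject fairness at the 5% level.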

  • confidence interval

  • correlation matrix

  • VIF

  • R2/ adjusted R2

  • ANOVA

  • Monte Carlo

  • Independent and identically distributed (IID)

    • A key assumption in machine learning.

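2. Matrices

Eigenvalues and eigenvectors

Trace

  • The sum of the elements on the main diagonal.

  • The trace of a matrix equals the sum of its eigenvalues.

  • The trace of a covariance matrix is the sum of the variances of the individual variables.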
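The trace properties listed above can be verified on a small example. This is a minimal sketch using only the standard library: the toy data `x` and `y` and the helper names are assumptions, and the eigenvalue formula applies only to symmetric 2x2 matrices.

```python
import math
import statistics

def trace(matrix):
    """Sum of the main-diagonal elements of a square matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] from its characteristic polynomial."""
    mid = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid - radius, mid + radius

def covariance(u, v):
    """Sample covariance with the usual n - 1 denominator."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Toy data: two variables observed five times (made up for illustration).
x = [2.0, 4.0, 4.0, 5.0, 7.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

cov = [[covariance(x, x), covariance(x, y)],
       [covariance(y, x), covariance(y, y)]]
lam1, lam2 = eigenvalues_2x2_symmetric(cov[0][0], cov[0][1], cov[1][1])

print(trace(cov), lam1 + lam2,
      statistics.variance(x) + statistics.variance(y))
```

All three printed numbers should agree up to floating-point error: the trace, the sum of the eigenvalues, and the sum of the per-variable sample variances.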

3. Calculus

In machine learning, calculus is used mainly for optimization.
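A minimal illustration of that link is gradient descent, which uses the derivative to decide which way to move. The objective f(x) = (x - 3)^2, the learning rate, and the step count below are assumptions picked for the sketch, not something the notes prescribe.

```python
def f(x):
    """Toy objective with its minimum at x = 3."""
    return (x - 3.0) ** 2

def grad_f(x):
    """Derivative of f: d/dx (x - 3)^2 = 2 (x - 3)."""
    return 2.0 * (x - 3.0)

def gradient_descent(x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

x_min = gradient_descent(x0=0.0)
print(x_min, f(x_min))
```

Each step shrinks the distance to the minimum by a constant factor (0.8 with this learning rate), so the iterate converges to x = 3.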

4. Q&A

  • How do you determine the sample size for an A/B test?

  • What is p-value? What is confidence interval? Explain them to a product manager or non-technical person.

  • How do you understand the "Power" of a statistical test?

  • If a distribution is right-skewed, what's the relationship between median, mode, and mean?

  • When do you use T-test instead of Z-test? List some differences between these two.

  • Coin problem-1: How would you test whether a coin is fair? How would you design the process (sometimes you are asked to implement it in code)? What test would you use?

  • Coin problem-2: How do you simulate a fair coin with one unfair coin?

  • The three-door (Monty Hall) question.

  • Bayes question: Tom takes a cancer test and the test is advertised as being 99% accurate: if you have cancer you will test positive 99% of the time, and if you don't have cancer, you will test negative 99% of the time. If 1% of all people have cancer and Tom tests positive, what is the probability that Tom has the disease? (A classic cancer-screening question; once you can solve this one, the others will not be a problem.)

  • How do you calculate the sample size for an A/B test?

  • If, after running an A/B test, you find that the desired metric (e.g., click-through rate) goes up while another metric (e.g., clicks) goes down, how would you make a decision?

  • Now assume your A/B test result is not significant (e.g., p-value ≈ 20%). How would you communicate this to the product manager? If the product manager still decides to launch the new feature despite the 20% p-value, what suggestions and warnings would you give?

  • Given counts of visitors and conversions, how do you compute significance?

  • What are Type I and Type II errors?

  • Three points are chosen at random on a circle. What is the probability that they form an acute triangle?

  • rejection sampling
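For the cancer-screening question in the list above, Bayes' rule can be worked through numerically. This is a small sketch: the function and parameter names are illustrative, and "99% accurate" is read as both sensitivity and specificity being 0.99, as the question states.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule.

    prior:       P(disease)
    sensitivity: P(positive | disease)
    specificity: P(negative | no disease)
    """
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

p_cancer_given_positive = posterior_positive(
    prior=0.01, sensitivity=0.99, specificity=0.99)
print(p_cancer_given_positive)  # approximately 0.5
```

The answer is about 50%, not 99%: because only 1% of people have the disease, true positives (0.99 × 0.01) are exactly as common as false positives (0.01 × 0.99).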
