mle-interview
  • 面试指南
  • 数据结构与算法
    • 列表
      • 912. Sort an Array
      • 215. Kth Largest Element
      • 977. Squares of a Sorted Array
      • 605. Can Place Flowers
      • 59. Spiral Matrix II
      • 179. Largest Number
      • 31. Next Permutation
    • 二分查找
      • 704. Binary Search
      • 69. Sqrt(x)
      • 278. First Bad Version
      • 34. Find First and Last Position of Element in Sorted Array
      • 33. Search in Rotated Sorted Array
      • 81. Search in Rotated Sorted Array II
      • 162. Find Peak Element
      • 4. Median of Two Sorted Arrays
      • 1095. Find in Mountain Array
      • 240. Search a 2D Matrix II
      • 540. Single Element in a Sorted Array
      • 528. Random Pick with Weight
      • 1300. Sum of Mutated Array Closest to Target
      • 410. Split Array Largest Sum
      • 1044. Longest Duplicate Substring
      • *644. Maximum Average Subarray II
      • *1060. Missing Element in Sorted Array
      • *1062. Longest Repeating Substring
      • *1891. Cutting Ribbons
    • 双指针
      • 26. Remove Duplicate Numbers in Array
      • 283. Move Zeroes
      • 75. Sort Colors
      • 88. Merge Sorted Arrays
      • 167. Two Sum II - Input array is sorted
      • 11. Container With Most Water
      • 42. Trapping Rain Water
      • 15. 3Sum
      • 16. 3Sum Closest
      • 18. 4Sum
      • 454. 4Sum II
      • 409. Longest Palindrome
      • 125. Valid Palindrome
      • 647. Palindromic Substrings
      • 209. Minimum Size Subarray Sum
      • 5. Longest Palindromic Substring
      • 395. Longest Substring with At Least K Repeating Characters
      • 424. Longest Repeating Character Replacement
      • 76. Minimum Window Substring
      • 3. Longest Substring Without Repeating Characters
      • 1004. Max Consecutive Ones III
      • 1658. Minimum Operations to Reduce X to Zero
      • *277. Find the Celebrity
      • *340. Longest Substring with At Most K Distinct Characters
    • 链表
      • 203. Remove Linked List Elements
      • 19. Remove Nth Node From End of List
      • 876. Middle of the Linked List
      • 206. Reverse Linked List
      • 92. Reverse Linked List II
      • 24. Swap Nodes in Pairs
      • 707. Design Linked List
      • 148. Sort List
      • 160. Intersection of Two Linked Lists
      • 141. Linked List Cycle
      • 142. Linked List Cycle II
      • 328. Odd Even Linked List
    • 哈希表
      • 706. Design HashMap
      • 1. Two Sum
      • 146. LRU Cache
      • 128. Longest Consecutive Sequence
      • 73. Set Matrix Zeroes
      • 380. Insert Delete GetRandom O(1)
      • 49. Group Anagrams
      • 350. Intersection of Two Arrays II
      • 299. Bulls and Cows
      • *348. Design Tic-Tac-Toe
    • 字符串
      • 242. Valid Anagram
      • 151. Reverse Words in a String
      • 205. Isomorphic Strings
      • 647. Palindromic Substrings
      • 696. Count Binary Substrings
      • 28. Find the Index of the First Occurrence in a String
      • *186. Reverse Words in a String II
    • 栈与队列
      • 225. Implement Stack using Queues
      • 54. Spiral Matrix
      • 155. Min Stack
      • 232. Implement Queue using Stacks
      • 150. Evaluate Reverse Polish Notation
      • 224. Basic Calculator
      • 20. Valid Parentheses
      • 1472. Design Browser History
      • 1209. Remove All Adjacent Duplicates in String II
      • 1249. Minimum Remove to Make Valid Parentheses
      • *281. Zigzag Iterator
      • *1429. First Unique Number
      • *346. Moving Average from Data Stream
    • 优先队列/堆
      • 692. Top K Frequent Words
      • 347. Top K Frequent Elements
      • 973. K Closest Points
      • 23. Merge K Sorted Lists
      • 264. Ugly Number II
      • 378. Kth Smallest Element in a Sorted Matrix
      • 295. Find Median from Data Stream
      • 767. Reorganize String
      • 1438. Longest Continuous Subarray With Absolute Diff Less Than or Equal to Limit
      • 895. Maximum Frequency Stack
      • 1705. Maximum Number of Eaten Apples
      • *1086. High Five
    • 深度优先DFS
      • 二叉树
      • 543. Diameter of Binary Tree
      • 101. Symmetric Tree
      • 124. Binary Tree Maximum Path Sum
      • 226. Invert Binary Tree
      • 104. Maximum Depth of Binary Tree
      • 951. Flip Equivalent Binary Trees
      • 236. Lowest Common Ancestor of a Binary Tree
      • 987. Vertical Order Traversal of a Binary Tree
      • 572. Subtree of Another Tree
      • 863. All Nodes Distance K in Binary Tree
      • 1110. Delete Nodes And Return Forest
      • 230. Kth Smallest element in a BST
      • 98. Validate Binary Search Tree
      • 235. Lowest Common Ancestor of a Binary Search Tree
      • 669. Trim a Binary Search Tree
      • 700. Search in a Binary Search Tree
      • 108. Convert Sorted Array to Binary Search Tree
      • 450. Delete Node in a BST
      • 938. Range Sum of BST
      • *270. Closest Binary Search Tree Value
      • *333. Largest BST Subtree
      • *285. Inorder Successor in BST
      • *1485. Clone Binary Tree With Random Pointer
      • 回溯
      • 39. Combination Sum
      • 78. Subsets
      • 46. Permutation
      • 77. Combinations
      • 17. Letter Combinations of a Phone Number
      • 51. N-Queens
      • 93. Restore IP Addresses
      • 22. Generate Parentheses
      • 856. Score of Parentheses
      • 301. Remove Invalid Parentheses
      • 37. Sodoku Solver
      • 图DFS
      • 126. Word Ladder II
      • 212. Word Search II
      • 79. Word Search
      • 399. Evaluate Division
      • 1376. Time Needed to Inform All Employees
      • 131. Palindrome Partitioning
      • 491. Non-decreasing Subsequences
      • 698. Partition to K Equal Sum Subsets
      • 526. Beautiful Arrangement
      • 139. Word Break
      • 377. Combination Sum IV
      • 472. Concatenated Words
      • 403. Frog Jump
      • 329. Longest Increasing Path in a Matrix
      • 797. All Paths From Source to Target
      • 695. Max Area of Island
      • 341. Flatten Nested List Iterator
      • 394. Decode String
      • *291. Word Pattern II
      • *694. Number of Distinct Islands
      • *1274. Number of Ships in a Rectangle
      • *1087. Brace Expansion
    • 广度优先BFS
      • 102. Binary Tree Level Order Traversal
      • 103. Binary Tree Zigzag Level Order Traversal
      • 297. Serialize and Deserialize Binary Tree
      • 310. Minimum Height Trees
      • 127. Word Ladder
      • 934. Shortest Bridge
      • 200. Number of Islands
      • 133. Clone Graph
      • 130. Surrounded Regions
      • 752. Open the Lock
      • 815. Bus Routes
      • 1091. Shortest Path in Binary Matrix
      • 542. 01 Matrix
      • 1293. Shortest Path in a Grid with Obstacles Elimination
      • 417. Pacific Atlantic Water Flow
      • 207. Course Schedule
      • 210. Course Schedule II
      • 787. Cheapest Flights Within K Stops
      • 444. Sequence Reconstruction
      • 994. Rotting Oranges
      • 785. Is Graph Bipartite?
      • *366. Find Leaves of Binary Tree
      • *314. Binary Tree Vertical Order Traversal
      • *269. Alien Dictionary
      • *323. Connected Component in Undirected Graph
      • *490. The Maze
    • 动态规划
      • 70. Climbing Stairs
      • 72. Edit Distance
      • 377. Combination Sum IV
      • 1335. Minimum Difficulty of a Job Schedule
      • 97. Interleaving String
      • 472. Concatenated Words
      • 403. Frog Jump
      • 674. Longest Continuous Increasing Subsequence
      • 62. Unique Paths
      • 64. Minimum Path Sum
      • 368. Largest Divisible Subset
      • 300. Longest Increasing Subsequence
      • 354. Russian Doll Envelopes
      • 121. Best Time to Buy and Sell Stock
      • 132. Palindrome Partitioning II
      • 312. Burst Balloons
      • 1143. Longest Common Subsequence
      • 718. Maximum Length of Repeated Subarray
      • 174. Dungeon Game
      • 115. Distinct Subsequences
      • 91. Decode Ways
      • 639. Decode Ways II
      • 712. Minimum ASCII Delete Sum for Two Strings
      • 221. Maximal Square
      • 1277. Count Square Submatrices with All Ones
      • 198. House Robber
      • 213. House Robber II
      • 1235. Maximum Profit in Job Scheduling
      • 740. Delete and Earn
      • 87. Scramble String
      • 1140. Stone Game II
      • 322. Coin Change
      • 518. Coin Change II
      • 1048. Longest String Chain
      • 44. Wildcard Matching
      • 10. Regular Expression Matching
      • 32. Longest Valid Parentheses
      • 1043. Partition Array for Maximum Sum
      • *256. Paint House
      • 926. Flip String to Monotone Increasing
      • *1062. Longest Repeating Substring
      • *1216. Valid Palindrome III
    • 贪心
      • 56. Merge Intervals
      • 621. Task Scheduler
      • 135. Candy
      • 376. Wiggle Subsequence
      • 55. Jump Game
      • 134. Gas Station
      • 1005. Maximize Sum Of Array After K Negations
      • 406. Queue Reconstruction by Height
      • 452. Minimum Number of Arrows to Burst Balloons
      • 738. Monotone Increasing Digits
    • 单调栈
      • 739. Daily Temperatures
      • 503. Next Greater Element II
      • 901. Online Stock Span
      • 85. Maximum Rectangle
      • 84. Largest Rectangle in Histogram
      • 907. Sum of Subarray Minimums
      • 239. Sliding Window Maximum
    • 前缀和
      • 53. Maximum Subarray
      • 523. Continuous Subarray Sum
      • 304. Range Sum Query 2D - Immutable
      • 1423. Maximum Points You Can Obtain from Cards
      • 1031. Maximum Sum of Two Non-Overlapping Subarrays
    • 并查集
      • 684. Redundant Connection
      • 721. Accounts Merge
      • 547. Number of Provinces
      • 737. Sentence Similarity II
      • *305. Number of Islands II
    • 字典树trie
      • 208. Implement Trie
      • 211. Design Add and Search Words Data Structure
      • 1268. Search Suggestions System
      • *1166. Design File System
      • *642. Design Search Autocomplete System
    • 扫描线sweep line
      • 253. Meeting Room II
      • 1094. Car Pooling
      • 218. The Skyline Problem
      • *759. Employee Free Time
    • tree map
      • 729. My Calendar I
      • 981. Time Based Key-Value Store
      • 846. Hand of Straights
      • 480. Sliding Window Median
      • 318. Count of Smaller Numbers After Self
    • 数学类
      • 50. Pow(x, n)
      • *311. Sparse Matrix Multiplication
      • 382. Linked List Random Node
      • 398. Random Pick Index
      • 29. Divide Two Integers
    • 设计类
      • 1603. Design Parking System
      • 355. Design Twitter
      • 1396. Design Underground System
      • *359. Logger Rate Limiter
      • *353. Design Snake Game
      • *379. Design Phone Directory
      • *588. Design In-Memory File System
      • *1244. Design A Leaderboard
    • SQL
  • 机器学习
    • 数学基础
    • 评价指标
    • 线性回归
    • 逻辑回归
    • 树模型
    • 深度学习
    • 支持向量机
    • KNN
    • 无监督学习
    • k-means
    • 强化学习 RL
    • 自然语言处理 NLP
    • 大语言模型 LLM
    • 机器视觉 CV
    • 多模态 MM
    • 分布式机器学习
    • 推荐系统
    • 异常检测与风控
    • 模型解释性
    • 多任务学习
    • MLops
    • 特征工程
    • 在线学习
    • 硬件 cuda/triton
    • 产品case分析
    • 项目deep dive
    • 机器学习代码汇总
  • 系统设计
    • 面向对象设计
      • 电梯设计
      • 停车场设计
      • Unix文件系统设计
    • 系统设计
      • 设计社交网站Twitter
      • 设计视频网站Youtube
      • 短网址系统
      • 爬虫系统
      • 任务调度系统
      • 日志系统
      • 分布式缓存
      • 广告点击聚合系统
      • webhook
    • 机器学习系统设计
      • 推荐系统
      • 搜索引擎
      • Youtube视频推荐
      • Twitter推荐
      • 广告点击预测
      • 新闻推送推荐
      • POI推荐
      • Youtube视频搜索
      • 有害内容检测
      • 大模型RAG
      • 大模型Agent
      • 信贷风控
      • 朋友推荐
      • 去重复性/版权检测
      • 情感分析
      • 目标检测
      • 问答系统
      • 知识图谱问答
  • 行为面试
    • 领导力法则
    • 问答举例
  • 案例分享
    • 准备工作
    • 面试小抄
    • 面试之后
Powered by GitBook
On this page
  • 1. 概率统计
  • 中心极限定理 / Central Limit Theorem
  • 假设检验 / Hypothesis Testing
  • P值 / P-value
  • z检验
  • t检验/t-test
  • F-test
  • 2. 矩阵论
  • 特征值与特征向量 / Eigenvalue & Eigenvector
  • 迹 / Trace
  • 3. 微积分
  • Jacobian Matrix
  • Hessian Matrix
  • 4. 问答
  • Reference
  1. 机器学习

数学基础

Previous机器学习Next评价指标

Last updated 10 days ago

部分面试会直接考察统计与数学知识。即使不直接考察,在ML环节用数学佐证自己的观点也很有裨益

1. 概率统计

中心极限定理 / Central Limit Theorem

给定一个任意分布的总体,每次从总体中随机抽取n个样本,一共抽m次。然后把这m组抽样分别求平均值,这些平均值的分布接近正态分布。

其表示的是样本均值分布的特征。

假设检验 /

  • 通过样本来推测总体是否具备某种性质

  • 和最大似然类似?做出某个假设之后,依据其分布计算出,给出在这个分布下观察到这个现象的概率

P值 / P-value

  • 在假设原假设H0正确时,出现当前证据或更强的证据的概率

  • the p-value represents the probability of obtaining a test value, which is as extreme as the one which had been observed originally. The underlying condition is that the null hypothesis is true.

  • 均值对比的假设检验方法主要有Z检验和T检验,Z检验面向总体数据和大样本数据,而T检验适用于小规模抽样样本

  • t检验比z检验的普适性更强,z检验要求知道总体标准差,但实际研究中无法获知总体标准差,一般都会用t检验。且当样本量足够大的时候,数据接近正态分布,t检验几乎成为了z检验,z检验是t检验的一个特例

Bayes theorem

VIF

ANOVA

simpson paradox

蒙特卡洛

2. 矩阵论

特征值与特征向量 / Eigenvalue & Eigenvector

  • 矩阵理解为不同坐标系下的变换。几何上看,特征向量是矩阵变换后方向不变的向量,而特征值则是该向量在变换中被拉伸或压缩的比例。

迹 / Trace

  • 主对角线上的元素之和

  • 矩阵的迹与特征值之和有关

  • 协方差矩阵的迹是样本方差的和

3. 微积分

机器学习中使用的微积分主要用于优化和反向传播

Jacobian Matrix

First-Order Derivatives, for vector-valued functions (multiple outputs) with respect to multiple inputs.

Hessian Matrix

Second-Order Derivatives, for scalar-valued functions (one output) with respect to multiple inputs — second-order derivatives.

4. 问答

  • What is p-value? What is confidence interval? Explain them to a product manager or non-technical person.

  • How do you understand the "Power" of a statistical test?

  • If a distribution is right-skewed, what's the relationship between medium, mode, and mean?

  • When do you use T-test instead of Z-test? List some differences between these two.

  • Dice problem-1: How will you test if a coin is fair or not? How will you design the process(有时会要求编程实现)? what test would you use?

  • Dice problem-2: How to simulate a fair coin with one unfair coin?

  • 3 door questions.

  • Bayes Questions:Tom takes a cancer test and the test is advertised as being 99% accurate: if you have cancer you will test positive 99% of the time, and if you don't have cancer, you will test negative 99% of the time. If 1% of all people have cancer and Tom tests positive, what is the prob that Tom has the disease? (非常经典的cancer screen的题,做会这一道,其他都没问题了)

  • How do you calculate the sample size for an A/B testing?

    • 确定显著性水平 α 和统计功效 1−β,常见选择是0.05和0.8

  • If after running an A/B testing you find the fact that the desired metric(i.e, Click Through Rate) is going up while another metric is decreasing(i.e., Clicks). How would you make a decision?

  • Now assuming you have an A/B testing result reflecting your test result is kind of negative (i.e, p-value ~= 20%). How will you communicate with the product manager? If given the above 20% p-value, the product manager still decides to launch this new feature, how would you claim your suggestions and alerts?

  • 给定visitors and conversations,怎么计算significance

  • 什么是type I/II error

  • 圆周上任取三个点,能组成锐角三角形的概率是多大?

  • rejection sampling

  • Frequentists vs. Bayesians

    • One is called the frequentist interpretation. In this view, probabilities represent long run frequencies of events. For example, the above statement means that, if we flip the coin many times, we expect it to land heads about half the time.

    • The other interpretation is called the Bayesian interpretation of probability. In this view, probability is used to quantify our uncertainty about something; hence it is fundamentally related to information rather than repeated trials. In the Bayesian view, the above statement means we believe the coin is equally likely to land heads or tails on the next toss

    • One big advantage of the Bayesian interpretation is that it can be used to model our uncertainty about events that do not have long term frequencies. For example, we might want to compute the probability that the polar ice cap will melt by 2020 CE. This event will happen zero or one times, but cannot happen repeatedly. Nevertheless, we ought to be able to quantify our uncertainty about this event. To give another machine learning oriented example, we might have observed a “blip” on our radar screen, and want to compute the probability distribution over the location of the corresponding target (be it a bird, plane, or missile). In all these cases, the idea of repeated trials does not make sense, but the Bayesian interpretation is valid and indeed quite natural. We shall therefore adopt the Bayesian interpretation in this book. Fortunately, the basic rules of probability theory are the same, no matter which interpretation is adopted.

Reference

  • A practical guide to quantitative finance interviews

Hypothesis Testing
z检验
t检验/t-test
F-test
假设现有一枚均匀硬币,现要投掷硬币,直到其两次出现正面,求投掷的期望次数
https://brilliant.org/
https://github.com/kenjihiranabe/The-Art-of-Linear-Algebra
https://github.com/kojino/120-Data-Science-Interview-Questions
Statistics Interview Question and Answers
概率论——大数定律与中心极限定理
线性代数|机器学习 18.065 by Gilbert Strang
矩阵求导术
如何提供一个可信的AB测试解决方案
假设检验——这一篇文章就够了
马上去念CS PhD了,想补点数学课,请问有什么建议? - CKLSniper的回答 - 知乎
为什么分母从n变成n-1之后,就从【有偏估计】变成了【无偏估计】? - 包遵信的回答 - 知乎
如何通俗地理解协方差和相关系数? - 马同学的文章 - 知乎
P-Value
https://github.com/sinclam2/fifty-challenging-problems-in-probability