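Machine Learning System Design

The core of a machine learning system is training a model to solve a business task such as prediction, classification, or ranking. A design question usually covers two sides:

  • Modeling design: the optimization objective, features, data, model architecture, evaluation criteria, and so on

  • System design: focuses on serving the model online, including the feature store, ANN (approximate nearest neighbor) retrieval, ETL pipelines, MLOps, and so on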

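1. Interview process

  • Mindset and demeanor: present your abilities with confidence and ease; good communication is valued in every interview

  • Communication: draw the block diagram on the whiteboard while telling the interviewer which parts you are going to cover. Before moving on from each part, confirm once more: "Is there anywhere that you feel I missed?"

  • Layered thinking: until one layer of the topic has been explained clearly, do not get pulled into the details of any single part. As the problem is walked through, the data and the details will become clear

  • Depth and breadth: when you reach a specific part, especially one you know well, take the initiative and actively show the breadth and depth of your knowledge

  • Trade-offs: do not make subjective assumptions about the requirements or the scenario. For an unfamiliar scenario, always ask about the details from start to finish first. Explaining trade-offs clearly is an important demonstration of ability. Trade-offs can be framed from the business side (for example prediction accuracy, long-tail prediction accuracy, cold-start performance) and from the technical side (scale, latency)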

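2. Answer framework

  • Clarify requirements (Requirement)

    • Always confirm both functional and non-functional requirements; skipping this is a clear negative signal. The questions a candidate asks often reveal their level

    • Scenario, functionality, goals (engagement or revenue, project goal, project metrics), constraints

    • Scale of the system: what data exists for users and items, and at what order of magnitude

  • Machine learning task (ML Task)

    • Explain how the requirements translate into a machine learning problem (for example, framing recommendation as a binary classification model, and why)

  • Data

    • Identify data: training + label, testing + ground truth

    • Obtaining labels: collected from user interactions, manual annotation, manual annotation assisted by unsupervised methods, data augmentation

    • Positive and negative labels for classification tasks

    • Whether the data that could be used as features is actually logged

    • Data discussion: bias, class imbalance, label quality

    • GDPR/privacy: data masking, data encryption

    • What to do when the train/test data distribution differs from production, and what to do when the data distribution changes over time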

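  • Features (Feature)

    • User, item, cross, and context features

    • Sparse and dense features

    • Every ML team has its own embedding set. How to reuse other teams' embedding sets, how to pre-train and fine-tune, and how to combine features are all very important

    • How to A/B test a feature? Route different traffic to each variant

  • Model

    • Always start from a simple baseline

    • For every design choice, compare the pros and cons of the alternatives, as you would when writing a design doc

    • Model selection: take system constraints into account, such as prediction latency and memory. Discuss how to reasonably give up some model quality in exchange for gains on those constraints

    • In most scenarios, additional fallback strategies are needed outside the model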

  • Evaluation

    • Offline and online

    • A/B testing

    • Model evaluation metrics, for example: clicks, conversions, whether ads are involved? Is the target GMV or converted orders?

  • Deployment

    • Server or device

    • All users or a subset of users

    • Statically, dynamically (server or device), or model streaming

  • Serving

    • Batch prediction or online prediction

  • Monitoring

    • Monitor latency, QPS, precision, recall, and so on

    • Grafana, Prometheus

  • Maintenance

    • Retrain strategy

    • Full retraining + incremental training
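
To make the "labels collected from interactions" and "positive and negative labels" points concrete, here is a minimal Python sketch. It assumes impressions and clicks are available as simple `(user_id, item_id)` pairs; the function name `build_labels` and the parameter `neg_keep_rate` are hypothetical, and a real pipeline would typically also handle attribution windows, deduplication, and position bias.

```python
import random


def build_labels(impressions, clicks, neg_keep_rate=1.0, seed=0):
    """Turn interaction logs into (user_id, item_id, label) training rows.

    impressions: iterable of (user_id, item_id) pairs that were shown.
    clicks: set of (user_id, item_id) pairs that were clicked.
    neg_keep_rate: fraction of negatives to keep (assumed knob for
        downsampling when negatives heavily outnumber positives).
    """
    rng = random.Random(seed)
    rows = []
    for user_id, item_id in impressions:
        if (user_id, item_id) in clicks:
            # Shown and clicked: positive label.
            rows.append((user_id, item_id, 1))
        elif rng.random() < neg_keep_rate:
            # Shown but not clicked: negative label, possibly downsampled.
            rows.append((user_id, item_id, 0))
    return rows


impressions = [("u1", "a"), ("u1", "b"), ("u2", "a")]
clicks = {("u1", "b")}
rows = build_labels(impressions, clicks)
```

If negatives are downsampled, the predicted probabilities are no longer calibrated to the true click rate, which is worth raising as a trade-off in the interview.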

3. Interview examples

  • design a monitoring system to measure ML models in real time, including features, score distribution, QPS

  • abusive user detection

  • predict app install

Business objectives

  • improve engagement on a feed

  • reduce customer churn

  • return items from search engine query

  • cold-start / position bias / diversity

  • multi-task

4. Common questions

  • how to scale

    • Scaling general SW system (distributed servers, load balancer, sharding, replication, caching)

    • Train data / KB partitioning

    • Distributed ML

    • Data parallelism (for training)

    • Model parallelism (for training, inference)

    • Distributed training

      • Asynchronous SGD

      • Synchronous SGD

    • Data parallel DT, RPC based DT

    • Scaling data collection

    • machine translation for 1000 languages

      • NLLB
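
As an illustration of synchronous SGD with data parallelism, the following is a toy single-process sketch in plain Python. The workers are simulated by a loop, the model is an assumed one-parameter linear regression `y = w * x`, and the function names are hypothetical. In a real system each shard gradient would be computed on a separate device and averaged with an all-reduce.

```python
def shard_gradient(w, shard):
    """Gradient of mean squared error for y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def sync_sgd_step(w, shards, lr):
    """One synchronous step: every worker reports before the update."""
    grads = [shard_gradient(w, shard) for shard in shards]  # parallel in practice
    avg_grad = sum(grads) / len(grads)  # stands in for an all-reduce average
    return w - lr * avg_grad


# Toy data generated from y = 3x, split into two equal-sized shards.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0::2], data[1::2]]

w = 0.0
for _ in range(200):
    w = sync_sgd_step(w, shards, lr=0.01)
```

With equal-sized shards the averaged gradient matches the full-batch gradient, so the synchronous variant behaves like single-machine training with a larger batch. Asynchronous SGD drops the wait for all workers and applies possibly stale gradients instead.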

  • Auto ML (soft: HP tuning, hard: arch search (NAS))

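  • Inconsistency between online and offline

  • Which storage to use for different kinds of data

  • How to design the data pipeline

  • deploy

    • Load balancing and autoscaling

    • How to optimize latency

    • How to deploy across this many servers, how to push a new model version, and how to make sure QPS does not degrade during the update

  • serving

    • Model serving is a typical low-latency, high-QPS workload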

    • Online A/B testing

      • Based on online metrics we would select a significance level 𝛼 and power threshold 1 – 𝛽

      • Calculate the required sample size per variation. It depends on 𝛼, 𝛽, and the MDE (Minimum Detectable Effect): the target minimum relative increase over the baseline that the test should be able to detect

      • Randomly assign users into control and treatment groups (discuss with the interviewer whether we will split the candidates on the user level or the request level)

      • Measure and analyze results using the appropriate test. Also, we should ensure that the model does not have any biases.

    • If we are serving batch features, they have to be computed offline and served in real time, so we need daily/weekly jobs to generate this data.

    • If we are serving real-time features, they need to be fetched or derived at request time, and we need to be aware of scalability and latency issues (load balancing). We may need a feature store to look up features at serve time, and possibly some caching, depending on the use case.

    • Where to run inference: if we run the model on the user's phone or computer, it uses their memory and battery but latency is low. If we host the model on our own service, latency and privacy concerns increase, but the user's device is relieved of the memory and battery cost.

    • How often to retrain the model: some models need to be retrained every day, some every week, and others monthly or yearly. Always discuss the pros and cons of the retraining regime you choose.
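
The online A/B testing steps above can be sketched in Python as follows. This is a rough illustration, not a definitive recipe: it assumes a conversion-rate metric, a two-sided two-proportion test under the normal approximation, and user-level splitting by hashing. The function names, the experiment name `exp_demo`, and the default values are all hypothetical.

```python
import hashlib
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1.0 + relative_mde)  # rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def assign_group(user_id, experiment="exp_demo", treatment_share=0.5):
    """Deterministic user-level split: the same user always gets the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


n_small_effect = sample_size_per_variation(0.10, 0.05)
n_large_effect = sample_size_per_variation(0.10, 0.20)
group = assign_group("user_42")
```

A smaller MDE requires a much larger sample, which is why the MDE should be agreed with the interviewer before estimating how long the experiment has to run. Hashing on the user ID gives a user-level split; hashing on a request ID instead would give a request-level split.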

  • Monitoring Performance

    • Latency (P99 latency every X minutes)

    • Biases and misuses of your model

    • Performance Drop

    • Data Drift

    • concept drift: spam detection

    • CPU load

    • Memory Usage
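
For the data drift item, one option is the Population Stability Index (PSI), which compares the binned distribution of a feature or score in production against a reference sample such as the training data. The sketch below is a minimal plain-Python version; the equal-width binning, the `eps` smoothing for empty bins, and the function name are assumptions, and the alerting threshold would have to be chosen per use case.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps the logarithm finite when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]

psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
```

PSI only captures changes in the input or score distribution. Concept drift, as in the spam detection example, changes the relationship between inputs and labels and usually has to be detected through delayed ground-truth metrics instead.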
