mle-interview
  • 面试指南
  • 数据结构与算法
    • 列表
      • 912. Sort an Array
      • 215. Kth Largest Element
      • 977. Squares of a Sorted Array
      • 605. Can Place Flowers
      • 59. Spiral Matrix II
      • 179. Largest Number
      • 31. Next Permutation
    • 二分查找
      • 704. Binary Search
      • 69. Sqrt(x)
      • 278. First Bad Version
      • 34. Find First and Last Position of Element in Sorted Array
      • 33. Search in Rotated Sorted Array
      • 81. Search in Rotated Sorted Array II
      • 162. Find Peak Element
      • 4. Median of Two Sorted Arrays
      • 1095. Find in Mountain Array
      • 240. Search a 2D Matrix II
      • 540. Single Element in a Sorted Array
      • 528. Random Pick with Weight
      • 1300. Sum of Mutated Array Closest to Target
      • 410. Split Array Largest Sum
      • 1044. Longest Duplicate Substring
      • *644. Maximum Average Subarray II
      • *1060. Missing Element in Sorted Array
      • *1062. Longest Repeating Substring
      • *1891. Cutting Ribbons
    • 双指针
      • 26. Remove Duplicate Numbers in Array
      • 283. Move Zeroes
      • 75. Sort Colors
      • 88. Merge Sorted Arrays
      • 167. Two Sum II - Input array is sorted
      • 11. Container With Most Water
      • 42. Trapping Rain Water
      • 15. 3Sum
      • 16. 3Sum Closest
      • 18. 4Sum
      • 454. 4Sum II
      • 409. Longest Palindrome
      • 125. Valid Palindrome
      • 647. Palindromic Substrings
      • 209. Minimum Size Subarray Sum
      • 5. Longest Palindromic Substring
      • 395. Longest Substring with At Least K Repeating Characters
      • 424. Longest Repeating Character Replacement
      • 76. Minimum Window Substring
      • 3. Longest Substring Without Repeating Characters
      • 1004. Max Consecutive Ones III
      • 1658. Minimum Operations to Reduce X to Zero
      • *277. Find the Celebrity
      • *340. Longest Substring with At Most K Distinct Characters
    • 链表
      • 203. Remove Linked List Elements
      • 19. Remove Nth Node From End of List
      • 876. Middle of the Linked List
      • 206. Reverse Linked List
      • 92. Reverse Linked List II
      • 24. Swap Nodes in Pairs
      • 707. Design Linked List
      • 148. Sort List
      • 160. Intersection of Two Linked Lists
      • 141. Linked List Cycle
      • 142. Linked List Cycle II
      • 328. Odd Even Linked List
    • 哈希表
      • 706. Design HashMap
      • 1. Two Sum
      • 146. LRU Cache
      • 128. Longest Consecutive Sequence
      • 73. Set Matrix Zeroes
      • 380. Insert Delete GetRandom O(1)
      • 49. Group Anagrams
      • 350. Intersection of Two Arrays II
      • 299. Bulls and Cows
      • *348. Design Tic-Tac-Toe
    • 字符串
      • 242. Valid Anagram
      • 151. Reverse Words in a String
      • 205. Isomorphic Strings
      • 647. Palindromic Substrings
      • 696. Count Binary Substrings
      • 28. Find the Index of the First Occurrence in a String
      • *186. Reverse Words in a String II
    • 栈与队列
      • 225. Implement Stack using Queues
      • 54. Spiral Matrix
      • 155. Min Stack
      • 232. Implement Queue using Stacks
      • 150. Evaluate Reverse Polish Notation
      • 224. Basic Calculator
      • 20. Valid Parentheses
      • 1472. Design Browser History
      • 1209. Remove All Adjacent Duplicates in String II
      • 1249. Minimum Remove to Make Valid Parentheses
      • *281. Zigzag Iterator
      • *1429. First Unique Number
      • *346. Moving Average from Data Stream
    • 优先队列/堆
      • 692. Top K Frequent Words
      • 347. Top K Frequent Elements
      • 973. K Closest Points
      • 23. Merge K Sorted Lists
      • 264. Ugly Number II
      • 378. Kth Smallest Element in a Sorted Matrix
      • 295. Find Median from Data Stream
      • 767. Reorganize String
      • 1438. Longest Continuous Subarray With Absolute Diff Less Than or Equal to Limit
      • 895. Maximum Frequency Stack
      • 1705. Maximum Number of Eaten Apples
      • *1086. High Five
    • 深度优先DFS
      • 二叉树
      • 543. Diameter of Binary Tree
      • 101. Symmetric Tree
      • 124. Binary Tree Maximum Path Sum
      • 226. Invert Binary Tree
      • 104. Maximum Depth of Binary Tree
      • 951. Flip Equivalent Binary Trees
      • 236. Lowest Common Ancestor of a Binary Tree
      • 987. Vertical Order Traversal of a Binary Tree
      • 572. Subtree of Another Tree
      • 863. All Nodes Distance K in Binary Tree
      • 1110. Delete Nodes And Return Forest
      • 230. Kth Smallest element in a BST
      • 98. Validate Binary Search Tree
      • 235. Lowest Common Ancestor of a Binary Search Tree
      • 669. Trim a Binary Search Tree
      • 700. Search in a Binary Search Tree
      • 108. Convert Sorted Array to Binary Search Tree
      • 450. Delete Node in a BST
      • 938. Range Sum of BST
      • *270. Closest Binary Search Tree Value
      • *333. Largest BST Subtree
      • *285. Inorder Successor in BST
      • *1485. Clone Binary Tree With Random Pointer
      • 回溯
      • 39. Combination Sum
      • 78. Subsets
      • 46. Permutation
      • 77. Combinations
      • 17. Letter Combinations of a Phone Number
      • 51. N-Queens
      • 93. Restore IP Addresses
      • 22. Generate Parentheses
      • 856. Score of Parentheses
      • 301. Remove Invalid Parentheses
      • 37. Sodoku Solver
      • 图DFS
      • 126. Word Ladder II
      • 212. Word Search II
      • 79. Word Search
      • 399. Evaluate Division
      • 1376. Time Needed to Inform All Employees
      • 131. Palindrome Partitioning
      • 491. Non-decreasing Subsequences
      • 698. Partition to K Equal Sum Subsets
      • 526. Beautiful Arrangement
      • 139. Word Break
      • 377. Combination Sum IV
      • 472. Concatenated Words
      • 403. Frog Jump
      • 329. Longest Increasing Path in a Matrix
      • 797. All Paths From Source to Target
      • 695. Max Area of Island
      • 341. Flatten Nested List Iterator
      • 394. Decode String
      • *291. Word Pattern II
      • *694. Number of Distinct Islands
      • *1274. Number of Ships in a Rectangle
      • *1087. Brace Expansion
    • 广度优先BFS
      • 102. Binary Tree Level Order Traversal
      • 103. Binary Tree Zigzag Level Order Traversal
      • 297. Serialize and Deserialize Binary Tree
      • 310. Minimum Height Trees
      • 127. Word Ladder
      • 934. Shortest Bridge
      • 200. Number of Islands
      • 133. Clone Graph
      • 130. Surrounded Regions
      • 752. Open the Lock
      • 815. Bus Routes
      • 1091. Shortest Path in Binary Matrix
      • 542. 01 Matrix
      • 1293. Shortest Path in a Grid with Obstacles Elimination
      • 417. Pacific Atlantic Water Flow
      • 207. Course Schedule
      • 210. Course Schedule II
      • 787. Cheapest Flights Within K Stops
      • 444. Sequence Reconstruction
      • 994. Rotting Oranges
      • 785. Is Graph Bipartite?
      • *366. Find Leaves of Binary Tree
      • *314. Binary Tree Vertical Order Traversal
      • *269. Alien Dictionary
      • *323. Connected Component in Undirected Graph
      • *490. The Maze
    • 动态规划
      • 70. Climbing Stairs
      • 72. Edit Distance
      • 377. Combination Sum IV
      • 1335. Minimum Difficulty of a Job Schedule
      • 97. Interleaving String
      • 472. Concatenated Words
      • 403. Frog Jump
      • 674. Longest Continuous Increasing Subsequence
      • 62. Unique Paths
      • 64. Minimum Path Sum
      • 368. Largest Divisible Subset
      • 300. Longest Increasing Subsequence
      • 354. Russian Doll Envelopes
      • 121. Best Time to Buy and Sell Stock
      • 132. Palindrome Partitioning II
      • 312. Burst Balloons
      • 1143. Longest Common Subsequence
      • 718. Maximum Length of Repeated Subarray
      • 174. Dungeon Game
      • 115. Distinct Subsequences
      • 91. Decode Ways
      • 639. Decode Ways II
      • 712. Minimum ASCII Delete Sum for Two Strings
      • 221. Maximal Square
      • 1277. Count Square Submatrices with All Ones
      • 198. House Robber
      • 213. House Robber II
      • 1235. Maximum Profit in Job Scheduling
      • 740. Delete and Earn
      • 87. Scramble String
      • 1140. Stone Game II
      • 322. Coin Change
      • 518. Coin Change II
      • 1048. Longest String Chain
      • 44. Wildcard Matching
      • 10. Regular Expression Matching
      • 32. Longest Valid Parentheses
      • 1043. Partition Array for Maximum Sum
      • *256. Paint House
      • 926. Flip String to Monotone Increasing
      • *1062. Longest Repeating Substring
      • *1216. Valid Palindrome III
    • 贪心
      • 56. Merge Intervals
      • 621. Task Scheduler
      • 135. Candy
      • 376. Wiggle Subsequence
      • 55. Jump Game
      • 134. Gas Station
      • 1005. Maximize Sum Of Array After K Negations
      • 406. Queue Reconstruction by Height
      • 452. Minimum Number of Arrows to Burst Balloons
      • 738. Monotone Increasing Digits
    • 单调栈
      • 739. Daily Temperatures
      • 503. Next Greater Element II
      • 901. Online Stock Span
      • 85. Maximum Rectangle
      • 84. Largest Rectangle in Histogram
      • 907. Sum of Subarray Minimums
      • 239. Sliding Window Maximum
    • 前缀和
      • 53. Maximum Subarray
      • 523. Continuous Subarray Sum
      • 304. Range Sum Query 2D - Immutable
      • 1423. Maximum Points You Can Obtain from Cards
      • 1031. Maximum Sum of Two Non-Overlapping Subarrays
    • 并查集
      • 684. Redundant Connection
      • 721. Accounts Merge
      • 547. Number of Provinces
      • 737. Sentence Similarity II
      • *305. Number of Islands II
    • 字典树trie
      • 208. Implement Trie
      • 211. Design Add and Search Words Data Structure
      • 1268. Search Suggestions System
      • *1166. Design File System
      • *642. Design Search Autocomplete System
    • 扫描线sweep line
      • 253. Meeting Room II
      • 1094. Car Pooling
      • 218. The Skyline Problem
      • *759. Employee Free Time
    • tree map
      • 729. My Calendar I
      • 981. Time Based Key-Value Store
      • 846. Hand of Straights
      • 480. Sliding Window Median
      • 318. Count of Smaller Numbers After Self
    • 数学类
      • 50. Pow(x, n)
      • *311. Sparse Matrix Multiplication
      • 382. Linked List Random Node
      • 398. Random Pick Index
      • 29. Divide Two Integers
    • 设计类
      • 1603. Design Parking System
      • 355. Design Twitter
      • 1396. Design Underground System
      • *359. Logger Rate Limiter
      • *353. Design Snake Game
      • *379. Design Phone Directory
      • *588. Design In-Memory File System
      • *1244. Design A Leaderboard
    • SQL
  • 机器学习
    • 数学基础
    • 评价指标
    • 线性回归
    • 逻辑回归
    • 树模型
    • 深度学习
    • 支持向量机
    • KNN
    • 无监督学习
    • k-means
    • 强化学习 RL
    • 自然语言处理 NLP
    • 大语言模型 LLM
    • 机器视觉 CV
    • 多模态 MM
    • 分布式机器学习
    • 推荐系统
    • 异常检测与风控
    • 模型解释性
    • 多任务学习
    • MLops
    • 特征工程
    • 在线学习
    • 硬件 cuda/triton
    • 产品case分析
    • 项目deep dive
    • 机器学习代码汇总
  • 系统设计
    • 面向对象设计
      • 电梯设计
      • 停车场设计
      • Unix文件系统设计
    • 系统设计
      • 设计社交网站Twitter
      • 设计视频网站Youtube
      • 短网址系统
      • 爬虫系统
      • 任务调度系统
      • 日志系统
      • 分布式缓存
      • 广告点击聚合系统
      • webhook
    • 机器学习系统设计
      • 推荐系统
      • 搜索引擎
      • Youtube视频推荐
      • Twitter推荐
      • 广告点击预测
      • 新闻推送推荐
      • POI推荐
      • Youtube视频搜索
      • 有害内容检测
      • 大模型RAG
      • 大模型Agent
      • 信贷风控
      • 朋友推荐
      • 去重复性/版权检测
      • 情感分析
      • 目标检测
      • 问答系统
      • 知识图谱问答
  • 行为面试
    • 领导力法则
    • 问答举例
  • 案例分享
    • 准备工作
    • 面试小抄
    • 面试之后
Powered by GitBook
On this page
  • 1. 最小经验损失函数
  • 对数损失函数 / Logistic loss function
  • 2. 最大似然法 Maximum likelihood estimation
  • 3. 最大熵法
  • 4. 最大后验法
  • 5. 优化
  • 5.1 在线学习 FTRL
  • 6. naive bayes 朴素贝叶斯
  • 7. 问答
  • 8. 代码
  • 参考
  1. 机器学习

逻辑回归

1. 最小经验损失函数

在二分类中,y_hat的含义是预测类别为1的概率为y_hat,相应的为0的概率为(1-y_hat)

最小化cross entropy等价于最大化log softmax probability

y=11+e−(wTx+b)y = \frac{1}{1+e^{-(w^{T} x + b)}}y=1+e−(wTx+b)1​

sigmoid function

g(z)=11+e−z    0<g(z)<1g(z) = \frac{1}{1+e^{-z}} \space\space\space\space 0 < g(z) <1g(z)=1+e−z1​    0<g(z)<1

sigmoid函数的导数为: sigmoid(x) * (1 - sigmoid(x))

对数损失函数 / Logistic loss function

Log loss is convex (注意负号)

L(f(x),y)={−log⁡(f(x))if y=1−log⁡(1−f(x))if y=0\mathcal{L}(f(x), y) = \begin{cases} -\log(f(x)) & \text{if } y = 1 \\ -\log(1 - f(x)) & \text{if } y = 0 \end{cases}L(f(x),y)={−log(f(x))−log(1−f(x))​if y=1if y=0​

Simplified loss function

L(f(x),y)=−[y⋅log⁡(f(x))+(1−y)⋅log⁡(1−f(x))]\mathcal{L}(f(x), y) = - \left[ y \cdot \log(f(x)) + (1 - y) \cdot \log(1 - f(x)) \right]L(f(x),y)=−[y⋅log(f(x))+(1−y)⋅log(1−f(x))]

Simplified cost function

J(w,b)=1m∑i=1mL(f(x(i)),y(i))=−1m∑i=1m[y(i)log⁡(f(x(i)))+(1−y(i))log⁡(1−f(x(i)))]\begin{align*} J(w, b) &= \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(f(x^{(i)}), y^{(i)}) \\ &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(f(x^{(i)})) + \left(1 - y^{(i)}\right) \log\left(1 - f(x^{(i)})\right) \right] \end{align*}J(w,b)​=m1​i=1∑m​L(f(x(i)),y(i))=−m1​i=1∑m​[y(i)log(f(x(i)))+(1−y(i))log(1−f(x(i)))]​

Codes

# log loss
log_loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# cross entropy
cross_entropy = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

softmax loss

  • 多分类损失函数

  • PyTorch: log_softmax + NLLLoss = CrossEntropyLoss

KL散度 (Kullback-Leibler Divergence)

  • 衡量两个分布的差异

  • 交叉熵就是 KL 散度加信息熵,而信息熵是一个常数

DKL(p∣∣q)=∑x∈Xp(x)log⁡p(x)q(x)=−∑x∈Xp(x)[log⁡q(x)−log⁡p(x)]D_{KL}(p||q)=\sum_{x\in X} p(x)\log\frac{p(x)}{q(x)}=-\sum_{x\in X} p(x)[\log q(x) - \log p(x)]DKL​(p∣∣q)=x∈X∑​p(x)logq(x)p(x)​=−x∈X∑​p(x)[logq(x)−logp(x)]
  1. 计算损失函数对y_hat的微分 (损失函数的grad)

  2. 计算y_hat对sigmoid函数的微分 (传播到激活函数)

  3. 计算sigmoid对权重w的微分 (传播到层)

2. 最大似然法 Maximum likelihood estimation

最大似然与最小交叉熵损失函数等价,可以从最大似然推导出逻辑回归使用的交叉熵损失函数。

L(w)=∏[p(xi)]yi[1−p(xi)]1−yiL(w)=\prod[p(x_{i})]^{y_{i}}[1-p(x_{i})]^{1-y_{i}}L(w)=∏[p(xi​)]yi​[1−p(xi​)]1−yi​
L(w)=∏i=1mpy⋅(1−p)1−yL(w)=\prod_{i=1}^{m}{p^{y}\cdot(1-p)^{1-y}}L(w)=i=1∏m​py⋅(1−p)1−y

3. 最大熵法

最大熵原理:对于概率模型,在所有可能分布的概率模型中,熵最大的模型是最好的模型。熵用来形容一个系统的无序程度

H(X)=−∑x∈Xp(xi)log⁡p(xi)H(X) = -\sum_{x \in X}p(x_i) \log p(x_i)H(X)=−x∈X∑​p(xi​)logp(xi​)

4. 最大后验法

5. 优化

5.1 在线学习 FTRL

  • 论文: https://dl.acm.org/doi/pdf/10.1145/2487575.2488200

  • 问题: Lasso引入L1正则项使模型的训练结果具有稀疏性,稀疏不仅有变量选择的功能,同时大大减少线上预测运算量。在线学习场景,利用SGD进行权重参数(W)的更新,每次只使用一个样本,权重参数的更新具有很大的随机性,无法将权重参数准确地更新为0

  • 用ftrl优化可以得到稀疏权重,从而降低serving的复杂度

6. naive bayes 朴素贝叶斯

  • 朴素贝叶斯的条件概率 P(X|Y=c) 服从高斯分布时,它计算出来的 P(Y=1|X) 形式跟逻辑回归一样

  • Pros

    • Real time predictions: It is very fast and can be used in real time

    • Scalable with Large datasets

    • Insensitive to irrelevant features

    • Multi class prediction is effectively done in Naive Bayes

    • Good performance with high dimensional data(no. of features is large)

  • Cons

    • Independence of features does not hold

    • Bad estimator: Probability outputs from predict_proba are not to be taken too seriously

    • Training data should represent population well

7. 问答

  • pros:

    • interpretable and explainable method

    • less prone to overfitting when using regulation

    • applicable for multi-class predictions

    • Robustness to Feature Noise

  • cons:

    • assuming linearity between input and output (Linear Separability)

    • can overfit with small, high dimensional data

    • High reliance on proper presentation of data

    • Multicollinearity impact on interpretation

  • 正负样本不平衡

  • how do you deal with a categorical variable with high cardinality

  • 并行

    • 逻辑回归并行化主要是对目标函数梯度计算的并行化。逻辑回归可以认为是极简的MLP。

  • 为什么逻辑回归不用mse做损失函数

    • follow up: 为什么GBDT做分类也用回归树呢?损失函数的选择

  • Logistic regression和naive bayes的区别

    • LR数据较多时优于NB。NB假设前提是个体独立,无法处理词组

  • Logistic regression和SVM的区别

    • 任务不同: 分类,SVM可以解决回归问题

    • loss不同: BCE与hinge loss

    • 输出不同: LR概率输出,SVM score

  • LR中连续特征为什么要做离散化

    • 数据角度:离散化的特征对异常数据有很强的鲁棒性;离散化特征利于进行特征交叉。

    • 模型角度:当数据增加/减少时,利于模型快速迭代;离散化相当于为模型引入非线性表达;离散化特征简化了模型输入,降低过拟合风险;LR中离散化特征很容易根据权重找出bad case。

    • 计算角度:稀疏向量内积计算速度快。(在计算稀疏矩阵内积时,可以根据当前值是否为0来直接输出0值,这相对于乘法计算是快很多的。)

  • 什么是最大似然估计,其假设是什么?

  • 一个分类器,有的 token 在 vocabulary 里面没出现导致概率是0怎么办

    • softmax,概率取对数再求指数

  • Multiclass versus multilabel classification

    • multilabel: 每个label都当作一个二分类任务, BCE loss

8. 代码

import numpy as np
from scipy.stats import norm
np.random.seed(1)

class LogisticRegression(object):
    def __init__(self):
        self.w = None
        self.b = None
        self.epsilon = 1e-8
        self.activate=self.sigmoid

    def fit(self, x, y):
        batch_size, feature_size = x.shape
        self.w = np.random.random((feature_size, 1))
        self.b = np.random.random(1)
        self._gradient_descent(x, y)

    def predict(self, x):
        prob = self.activate(np.dot(x, self.w) + self.b)
        return prob

    def _gradient_descent(self, x, y, learning_rate=10e-4, epoch=100):
        for i in range(epoch):
            y_pred = self.activate(np.dot(x, self.w) + self.b)
            loss = np.mean(-y * np.log(y_pred + self.epsilon) - (1 - y) * np.log(1 - y_pred + self.epsilon))
            w_gradient = np.dot(x.T, (y_pred - y))
            b_gradient = np.mean(y_pred - y)
            self.w = self.w - learning_rate * w_gradient
            self.b = self.b - learning_rate * b_gradient
            print('step:', i, 'Loss:', loss)

    def sigmoid(self, input):
        return 1 / (1 + np.exp(-input))

    def __str__(self):
        return 'weights\t:%s\n bias\t:%f\n' % (self.w, self.b)

def generate_data():
    x0 = np.hstack((norm.rvs(2, size=40, scale=2), norm.rvs(8, size=40, scale=3)))
    x1 = np.hstack((norm.rvs(4, size=40, scale=2), norm.rvs(5, size=40, scale=2)))
    x = np.transpose(np.vstack((x0, x1)))
    y = np.vstack((np.zeros((40, 1)), np.ones((40, 1))))
    return x,y

if __name__ == "__main__":
    x,y = generate_data()
    lr = LogisticRegression()
    lr.fit(x,y)
    y_hat = lr.predict(x)

参考

Previous线性回归Next树模型

Last updated 11 days ago

逻辑回归中,形式上看起来与线性回归类似:

假设样本服从

与相同

损失函数对模型参数的梯度计算结果
伯努利分布/二项分布 Bernoulli distribution
线性回归
在线最优化求解(Online Optimization)-冯扬
欠采样(undersampling)和过采样(oversampling)会对模型带来怎样的影响?
本质是分布不同,最大似然可以看出,参数正态推导出MSE, 参数二项推导出cross entropy;公式推导发现容易导致梯度消失,很大的话梯度有一项会趋于0;陷入局部最优
用平方差后 损失函数是非凸的(non-convex),很难找到全局最优解
【机器学习】逻辑回归
Cross-entropy
逻辑回归为什么用Sigmoid? - 雨林丶的文章 - 知乎
softmax,sigmoid函数在使用上的区别是什么? - 初识CV的回答 - 知乎
softmax derivative