Support Vector Machines

Margin maximization yields the optimal separating hyperplane. The method formalizes this as a convex quadratic programming problem; equivalently, it can be cast as a regularized hinge-loss minimization problem.

1. Optimization Objective and Margin

\begin{align*} \hat y=\text{sign}(w^Tx+b) \end{align*}

Goal: maximize the distance from the support vectors to the hyperplane, where the support vectors are the correctly classified samples closest to the decision boundary.

\begin{align*} &\min \limits_{w,b}\frac{1}{2}\left\|w\right\|^2 \\ &\text{s.t. } y_i(w^Tx_i+b)\geq 1 \end{align*}
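Why this objective: the geometric distance from a sample to the hyperplane is \frac{|w^Tx+b|}{\left\|w\right\|}. Rescaling (w, b) so that the support vectors satisfy y_i(w^Tx_i+b)=1, the margin becomes \frac{1}{\left\|w\right\|}, so maximizing the margin is equivalent to minimizing \frac{1}{2}\left\|w\right\|^2 under the constraint above.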

In the later stages of training, the hinge loss naturally weights hard samples more heavily, since easy samples contribute zero loss; this property is also instructive for deep learning.
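A minimal sketch of this behavior (assuming the margin convention m = y·f(x)): the hinge loss is exactly zero for easy samples with m ≥ 1, so the gradient signal comes only from hard or misclassified samples, unlike the logistic loss, which is positive for every sample.

```python
import numpy as np

def hinge_loss(m):
    """Hinge loss on margins m = y * f(x): exactly zero once m >= 1."""
    return np.maximum(0.0, 1.0 - m)

def logistic_loss(m):
    """Logistic loss on the same margins: positive for every sample."""
    return np.log1p(np.exp(-m))

m = np.array([-1.0, 0.0, 0.5, 1.0, 3.0])  # from misclassified to very easy
print(hinge_loss(m))     # [2.  1.  0.5 0.  0. ]  -- easy samples drop out
print(logistic_loss(m))  # ~[1.31 0.69 0.47 0.31 0.05] -- never exactly zero
```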

KKT conditions
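For the hard-margin problem, with multipliers \alpha_i (introduced via the Lagrangian in the next section), the KKT conditions are stationarity, primal and dual feasibility, and complementary slackness; the last condition is why only the support vectors end up with \alpha_i > 0:

\begin{align*} &w=\sum_{i=1}^{m}\alpha_iy_ix_i, \quad \sum_{i=1}^{m}\alpha_iy_i=0 \\ &y_i(w^Tx_i+b)\geq 1, \quad \alpha_i\geq 0 \\ &\alpha_i\left(y_i(w^Tx_i+b)-1\right)=0 \end{align*}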

2. The Dual Problem

  • The constrained objective attains its optimum where the gradient of the objective function is parallel to the normal vector of the constraint surface. The method of Lagrange multipliers converts the constrained optimization problem into an unconstrained one.

\begin{align*} L(w,b,\alpha)=\frac{1}{2}\left\|w\right\|^2+\sum_{i=1}^{m}\alpha_i\left(1-y_i(w^Tx_i+b)\right) \end{align*}
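Setting the derivatives of L with respect to w and b to zero and substituting back eliminates w and b, yielding the dual problem, which touches the data only through inner products x_i^Tx_j (this is what makes the kernel trick in the next section possible):

\begin{align*} &\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_jx_i^Tx_j \\ &\text{s.t. } \sum_{i=1}^{m}\alpha_iy_i=0, \quad \alpha_i\geq 0 \end{align*}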

3. The Kernel Trick

  • Inner product: a measure of the relationship between two samples.

  • Polynomial kernel: corresponds to crossing features, as one would do manually for linear regression.

  • Gaussian (RBF) kernel: maps samples into an infinite-dimensional space. Intuitively, pick several landmarks and compute each point's similarity to them, where each landmark is widened into a Gaussian probability density. Via its Taylor expansion, the RBF kernel is a linear combination of polynomial kernels of all orders.
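A small sketch of the RBF kernel as a similarity measure: compute the Gram matrix by hand and check it against scikit-learn's rbf_kernel (the data and the bandwidth gamma here are arbitrary assumptions):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 samples, 3 features
gamma = 0.5                   # assumed bandwidth: k(x, z) = exp(-gamma * ||x - z||^2)

# Pairwise squared Euclidean distances, then the Gaussian similarity.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

assert np.allclose(K_manual, rbf_kernel(X, gamma=gamma))
print(K_manual.round(3))      # entries near 1 mean "close to this landmark"
```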

4. Soft Margin

Introduce slack variables: they allow some samples to violate the margin, but at a cost. A new hyperparameter C controls the penalty strength. The penalty behaves like an ordinary loss function: samples classified correctly beyond the margin incur zero penalty, and the penalty grows once a sample violates it.
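The soft-margin primal, with slack variables \xi_i and penalty hyperparameter C:

\begin{align*} &\min \limits_{w,b,\xi}\ \frac{1}{2}\left\|w\right\|^2+C\sum_{i=1}^{m}\xi_i \\ &\text{s.t. } y_i(w^Tx_i+b)\geq 1-\xi_i, \quad \xi_i\geq 0 \end{align*}

Eliminating \xi_i recovers the regularized hinge-loss form mentioned at the top: \min_{w,b}\ \frac{1}{2}\left\|w\right\|^2+C\sum_i\max\left(0,\ 1-y_i(w^Tx_i+b)\right).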

5. Q&A

  • What is the difference between logistic regression and SVM?

    • Loss type: logistic loss for LR, hinge loss for SVM.

    • LR is a parametric model (it assumes a Bernoulli distribution); SVM with an RBF kernel is non-parametric.

    • For SVM, only the support vectors influence the model, whereas every sample influences the LR model (see the sketch after this list).

    • SVM performs structural risk minimization (an L2-norm term is built in); LR performs empirical risk minimization.

    • SVM commonly uses kernel functions to handle nonlinear problems; LR typically does not.

  • Pros

    • Performs well in high-dimensional spaces.

    • One of the best-performing algorithms when classes are separable.

    • Outliers have less impact, since the decision boundary depends only on the support vectors.

    • Well suited to extreme-case binary classification.

  • Cons

    • Slow: training on large datasets takes a long time (kernel SVM training scales poorly with the number of samples).

    • Poor performance with overlapping classes.

    • Selecting appropriate hyperparameters (such as C and gamma) is important.

    • Selecting the appropriate kernel function can be tricky.
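A sketch illustrating the support-vector point from the Q&A above: after fitting, the SVM decision boundary is determined entirely by the samples stored in support_vectors_, while every training sample contributes to the logistic regression fit (the dataset and parameters are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=0)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
lr = LogisticRegression().fit(X, y)

# Only a subset of the 200 samples defines the SVM decision boundary.
print("support vectors:", svm.support_vectors_.shape[0], "of", len(X))
print("SVM weights:", svm.coef_, "LR weights:", lr.coef_)
```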

6. Code
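A minimal end-to-end sketch with scikit-learn: a soft-margin RBF SVM with feature standardization, since the RBF kernel is sensitive to feature scale (C=1.0 and gamma="scale" are assumed defaults, normally tuned with cross-validation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale features first: the RBF kernel depends on Euclidean distances.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```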
