Machine Learning Code Collection

1. Objectives & Evaluation

Loss functions

import numpy as np

class MSE:
    def loss(self, y_true, y_pred):
        return 0.5 * np.power(y_true - y_pred, 2)

    def gradient(self, y_true, y_pred):
        # gradient of the loss w.r.t. y_pred (the output of the network's last layer)
        return -1 * (y_true - y_pred)


class CrossEntropyLoss:
    """
    Label formats in PyTorch: nn.CrossEntropyLoss() takes 1-D class indices,
    while BCE takes one-hot (multi-dimensional) targets.
    Note: despite the name, `logits` below holds probabilities (post-softmax), not raw scores.
    Derivatives used: d/dx ln(x) = 1/x, d/dx exp(x) = exp(x).
    """
    def loss(self, labels, logits, epsilon=1e-12):
        """
        labels = np.array([[0, 1], [1, 0]])
        logits = np.array([[0.1, 0.9], [0.8, 0.2]])
        """
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return -np.mean(np.sum(labels * np.log(logits), axis=1))

    def gradient(self, labels, logits, epsilon=1e-12):
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return -labels / logits


class CrossEntropyLoss2:
    def loss(self, labels, logits, epsilon=1e-12):
        """ 类似线形回归的shape
        labels = np.array([1, 0])
        logits = np.array([0.9, 0.2])
        """
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return np.mean(- labels * np.log(logits) - (1 - labels) * np.log(1 - logits))

    def gradient(self, labels, logits, epsilon=1e-12):
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return - (labels / logits) + (1 - labels) / (1 - logits)
  • focal loss
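The bullet above only names focal loss; here is a minimal binary focal-loss sketch in the same style as the classes above. The alpha/gamma defaults follow the original paper's common choices and are an assumption, not something fixed by this document:

```python
import numpy as np

class FocalLoss:
    """Binary focal loss: down-weights easy examples via (1 - p_t)^gamma
    so training focuses on hard ones. Inputs are probabilities, not raw logits."""
    def __init__(self, alpha=0.25, gamma=2.0):
        self.alpha = alpha   # class-balance weight for the positive class
        self.gamma = gamma   # focusing parameter; gamma = 0 reduces to weighted BCE

    def loss(self, labels, probs, epsilon=1e-12):
        probs = np.clip(probs, epsilon, 1. - epsilon)
        # p_t = model's probability assigned to the true class
        p_t = labels * probs + (1 - labels) * (1 - probs)
        alpha_t = labels * self.alpha + (1 - labels) * (1 - self.alpha)
        return np.mean(-alpha_t * np.power(1 - p_t, self.gamma) * np.log(p_t))
```

With gamma = 0 and alpha = 0.5 this recovers 0.5 * the binary cross-entropy above, which is a quick sanity check.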

Metrics

  • AUC
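The AUC entry can be implemented from scratch with the rank-sum (Mann-Whitney U) identity: AUC is the probability that a random positive scores above a random negative, counting ties as 1/2. A minimal sketch with average ranks for ties:

```python
import numpy as np

def auc(labels, scores):
    """ROC AUC via the rank-sum formula, without building the ROC curve."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    # assign average ranks to runs of tied scores
    i, rank = 0, 1
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (rank + rank + (j - i)) / 2.0
        rank += j - i + 1
        i = j + 1
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # subtract the minimum possible positive rank sum, normalize by pair count
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```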

cross validation
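For the cross-validation entry, a minimal k-fold index generator (the function name and seeded-shuffle default are illustrative choices, not prescribed by this document):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation:
    shuffle once, split into k folds, hold each fold out in turn."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx
```

Usage: loop over the pairs, fit on `train_idx`, score on `val_idx`, and average the k validation scores.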

2. Statistical Learning Models

Linear regression: native

Linear regression: numpy

Logistic regression: native

Logistic regression: numpy

Decision tree classification

Decision tree regression

XGBoost regression

XGBoost classification

K-means

KNN

PCA

3. Deep Learning Models

MLP-numpy

MLP-torch

CNN-numpy

  • option 1: native

  • option 2: convert the convolution into one large matrix multiplication to speed up training
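Option 2 above is commonly implemented with im2col; a minimal sketch for a single-channel input, stride 1, no padding (function names are illustrative):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh-by-kw patch of a 2-D input into one row of a matrix,
    so the convolution becomes a single matrix product instead of nested loops."""
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_im2col(x, kernel):
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    # one big matmul replaces the per-output-pixel Python loop
    return (im2col(x, kh, kw) @ kernel.ravel()).reshape(out_h, out_w)
```

The patch extraction still loops in Python here, but the expensive multiply-accumulate work moves into one BLAS-backed `@`, which is the point of the trick.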

CNN-torch

LSTM-numpy

LSTM-torch

Attention-numpy

Attention-torch

Dropout

BatchNorm

Activation

4. Domain: NLP

n-gram

tfidf

geeksforgeeks

word2vec

Bayes text classifier

kv-cache

bert-summary

tokenizer: greedy BPE

positional encoding

beam search

top_k LLM token decoding

prefix cache

5. Domain: CV

6. Pipeline

XGBoost

torch

PySpark

7. Feature Engineering

Preprocessing: transformations

Numerical features: pandas

Numerical features: SQL

Categorical features: pandas

Categorical features: SQL
