Machine Learning Code Collection
1. Objectives and Evaluation
Loss functions
import numpy as np


class MSE:
    def loss(self, y_true, y_pred):
        return 0.5 * np.power(y_true - y_pred, 2)

    def gradient(self, y_true, y_pred):
        # Gradient of the loss with respect to y_pred (the output of the network's last layer)
        return -1 * (y_true - y_pred)


class CrossEntropyLoss:
    """
    Label formats in PyTorch: nn.CrossEntropyLoss() takes a 1-D array of class indices;
    BCE takes multi-dimensional one-hot labels.
    The derivative of ln(x) is 1/x; the derivative of exp(x) is exp(x).
    """
    def loss(self, labels, logits, epsilon=1e-12):
        """
        labels = np.array([[0, 1], [1, 0]])
        logits = np.array([[0.1, 0.9], [0.8, 0.2]])
        """
        # Despite the name, `logits` holds probabilities (e.g. softmax outputs), not raw scores
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return -np.mean(np.sum(labels * np.log(logits), axis=1))

    def gradient(self, labels, logits, epsilon=1e-12):
        # Per-sample gradient with respect to the probabilities (not divided by the batch size)
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return -labels / logits


class CrossEntropyLoss2:
    def loss(self, labels, logits, epsilon=1e-12):
        """ Same 1-D shape as in linear regression
        labels = np.array([1, 0])
        logits = np.array([0.9, 0.2])
        """
        # `logits` holds the predicted probability of the positive class
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return np.mean(- labels * np.log(logits) - (1 - labels) * np.log(1 - logits))

    def gradient(self, labels, logits, epsilon=1e-12):
        # Per-sample gradient with respect to the probabilities (not divided by the batch size)
        logits = np.clip(logits, epsilon, 1. - epsilon)
        return - (labels / logits) + (1 - labels) / (1 - logits)

focal loss
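No focal loss code appears here, so the following is only a minimal sketch of the binary form. It assumes the same conventions as `CrossEntropyLoss2` (1-D labels, predictions already given as probabilities). The function name `binary_focal_loss` and the defaults `gamma=2.0`, `alpha=0.25` are illustrative choices, not taken from the original.

```python
import numpy as np


def binary_focal_loss(labels, probs, gamma=2.0, alpha=0.25, epsilon=1e-12):
    """Binary focal loss: mean of -alpha_t * (1 - p_t)^gamma * log(p_t).

    labels: 1-D array of 0/1 targets
    probs:  1-D array of predicted probabilities for the positive class
    """
    labels = np.asarray(labels, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), epsilon, 1.0 - epsilon)
    # p_t is the probability assigned to the true class of each sample
    p_t = labels * probs + (1.0 - labels) * (1.0 - probs)
    # alpha_t weights positives by alpha and negatives by (1 - alpha)
    alpha_t = labels * alpha + (1.0 - labels) * (1.0 - alpha)
    # (1 - p_t)^gamma down-weights samples that are already classified well
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))


labels = np.array([1, 0])
probs = np.array([0.9, 0.2])
print(binary_focal_loss(labels, probs))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the binary cross-entropy, which gives a quick sanity check.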
Metrics
AUC
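One way to compute AUC, sketched under the assumption of binary 0/1 labels: it is the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. The helper name `roc_auc` is hypothetical, and the pairwise comparison costs O(P×N) memory, so it suits small inputs only.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # diff[i, j] > 0 means positive i is ranked above negative j
    diff = pos[:, None] - neg[None, :]
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs are ordered correctly -> 0.75
```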
cross validation
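2. Statistical Learning Models
Linear regression: native
Linear regression: numpy
Logistic regression: native
Logistic regression: numpy
Decision tree classification
Decision tree regression
XGBoost regression
XGBoost classification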
K-means
KNN
PCA
3. Deep Learning Models
MLP-numpy
MLP-torch
CNN-numpy
option1: native
option2: convert the convolution into one large matrix multiplication to speed up training
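The two options can be sketched as follows. This assumes "one large matrix operation" refers to the common im2col trick: every receptive field is unrolled into a row, so the whole convolution becomes a single matrix product. The function names, the (N, C, H, W) layout, and the absence of padding are assumptions made for illustration.

```python
import numpy as np


def conv2d_naive(x, weight, stride=1):
    """option1: loop over output positions. x: (N, C, H, W), weight: (F, C, kh, kw)."""
    n, c, h, w = x.shape
    f, _, kh, kw = weight.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((n, f, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            hs, ws = i * stride, j * stride
            patch = x[:, :, hs:hs + kh, ws:ws + kw]  # (N, C, kh, kw)
            out[:, :, i, j] = np.tensordot(patch, weight, axes=([1, 2, 3], [1, 2, 3]))
    return out


def im2col(x, kh, kw, stride=1):
    """Unroll every receptive field into a row of shape (C * kh * kw,)."""
    n, c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((n, out_h, out_w, c, kh, kw), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            hs, ws = i * stride, j * stride
            cols[:, i, j] = x[:, :, hs:hs + kh, ws:ws + kw]
    return cols.reshape(n * out_h * out_w, c * kh * kw), out_h, out_w


def conv2d_im2col(x, weight, stride=1):
    """option2: a single matrix product over the unrolled patches."""
    n = x.shape[0]
    f, c, kh, kw = weight.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    out = cols @ weight.reshape(f, -1).T  # (N * out_h * out_w, F)
    return out.reshape(n, out_h, out_w, f).transpose(0, 3, 1, 2)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 6, 6))
weight = rng.normal(size=(4, 3, 3, 3))
print(np.allclose(conv2d_naive(x, weight), conv2d_im2col(x, weight)))
```

The unrolling still loops over output positions here; the speed-up comes from replacing the per-position contraction with one dense matrix multiplication, at the cost of extra memory for the unrolled patches.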
CNN-torch
LSTM-numpy
LSTM-torch
Attention-numpy
Attention-torch
Dropout
BatchNorm
Activation
4. Domain: NLP
n-gram
tfidf
word2vec
Bayes text classifier
kv-cache
bert-summary
tokenizer: greedy BPE
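A rough sketch of greedy BPE training, assuming "greedy" means repeatedly merging the single most frequent adjacent symbol pair. The helper names and the `</w>` end-of-word marker are illustrative conventions, not taken from the original.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(word_freqs, num_merges):
    """Greedily learn up to `num_merges` merge rules from a word-frequency dict."""
    # Start from characters, with an end-of-word marker on every word
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # greedy choice: most frequent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


merges, vocab = train_bpe({"abab": 3, "abc": 2}, num_merges=1)
print(merges)  # [('a', 'b')]
```

Encoding new text would replay the learned `merges` in order on each word.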
positional encoding
beam search
top_k LLM token decoding
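A minimal sketch of top-k sampling for a single decoding step, assuming `logits` is a 1-D array of unnormalized scores over the vocabulary. The function name `top_k_sample` and the `temperature` and `rng` parameters are illustrative.

```python
import numpy as np


def top_k_sample(logits, k, temperature=1.0, rng=None):
    """Sample a token id from the k highest-scoring entries of `logits`."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float) / temperature
    k = min(k, logits.size)
    # Indices of the k largest logits (in no particular order)
    top_idx = np.argpartition(logits, -k)[-k:]
    top_logits = logits[top_idx]
    # Softmax restricted to the top-k entries; subtract the max for numerical stability
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(rng.choice(top_idx, p=probs))


logits = np.array([2.0, 0.5, -1.0, 3.0, 0.0])
print(top_k_sample(logits, k=2, rng=np.random.default_rng(0)))  # prints 0 or 3
```

With `k=1` this reduces to greedy (argmax) decoding.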
prefix cache
5. Domain: CV
6. pipeline
XGBoost
torch
PySpark
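7. Feature Engineering
Preprocessing: transformations
Numerical features: pandas
Numerical features: SQL
Categorical features: pandas
Categorical features: SQL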
Reference