Feature Engineering
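Use intuition to design new features, drawing on business understanding, common patterns, and EDA
I personally like to reason from an ER diagram, similar to the ones drawn in OOD (object-oriented design)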
numerical feature
categorical feature
sparse feature
dense feature
time feature
sequence feature
graph feature
Feature generation
By type: e.g. numerical and categorical features
By domain: e.g. in recommendation, the common user, item, context, and cross features
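The two angles above can be sketched in a few lines. This is only an illustration: the field names (`price`, `city`, `category`, `hour`), the bucket width, and both helper functions are made up for the example, not taken from any particular pipeline.

```python
import math


def numeric_features(price: float) -> dict:
    """Type angle: derive features from one numerical column (hypothetical 'price')."""
    return {
        # log1p compresses a long-tailed value range
        "price_log1p": math.log1p(price),
        # fixed-width buckets of 50, capped at bucket 3 (assumed width)
        "price_bucket": min(int(price // 50), 3),
    }


def cross_features(user: dict, item: dict, context: dict) -> dict:
    """Domain angle: cross user, item, and context fields into categorical features."""
    return {
        "user_city_x_item_category": f"{user['city']}|{item['category']}",
        "item_category_x_hour": f"{item['category']}|{context['hour']}",
    }


user = {"city": "shanghai"}
item = {"category": "books", "price": 120.0}
context = {"hour": 21}

features = {**numeric_features(item["price"]), **cross_features(user, item, context)}
```

The crossed values are plain strings, so they can go through the same encoding as any other categorical feature.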
Feature selection
correlation filtering
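One common reading of correlation filtering is to drop a feature when it is almost perfectly correlated with a feature already kept. A minimal sketch of that idea follows, assuming Pearson correlation, a greedy pass in column order, and a threshold of 0.95; the column names and values are invented for the example.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_filter(columns: dict, threshold: float = 0.95) -> list:
    """Greedily keep a column only if |corr| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # near-duplicate of height_cm
    "age": [30.0, 22.0, 41.0, 27.0, 35.0],
}
selected = correlation_filter(columns)
```

Because the pass is greedy, the result depends on column order: the first of two redundant columns is the one that survives.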
Feature store
After data passes through the feature engineering pipeline it enters the feature store, which is responsible for storing and serving features
Ingestion: Batch (ETL, Spark) and Stream (Flink, Kafka)
Store: Offline (parquet, BigQuery) and Online (Redis)
Service: offline and online access
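The ingestion / store / service split can be illustrated with a toy in-memory version. This is a sketch only: `ToyFeatureStore` and its methods are hypothetical names, plain Python containers stand in for the offline store (parquet, BigQuery) and the online store (Redis), and the "as of" lookup for offline reads is an assumption about how training data would be fetched.

```python
class ToyFeatureStore:
    """In-memory stand-in: offline keeps full history, online keeps the latest value."""

    def __init__(self):
        self.offline = {}  # entity_id -> list of (timestamp, features)
        self.online = {}   # entity_id -> (timestamp, features)

    def ingest(self, entity_id: str, ts: int, features: dict) -> None:
        """Ingestion: append to offline history and refresh the online value."""
        self.offline.setdefault(entity_id, []).append((ts, dict(features)))
        current = self.online.get(entity_id)
        if current is None or ts >= current[0]:
            self.online[entity_id] = (ts, dict(features))

    def get_online(self, entity_id: str):
        """Online service: latest features for low-latency inference."""
        entry = self.online.get(entity_id)
        return entry[1] if entry else None

    def get_offline(self, entity_id: str, as_of: int):
        """Offline service: most recent features at or before `as_of` (for training)."""
        rows = [r for r in self.offline.get(entity_id, []) if r[0] <= as_of]
        return max(rows, key=lambda r: r[0])[1] if rows else None


store = ToyFeatureStore()
store.ingest("user_1", ts=1, features={"clicks_7d": 3})
store.ingest("user_1", ts=2, features={"clicks_7d": 5})
```

Reading offline features as of a timestamp keeps training rows from seeing values that were only written later.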

Q&A
high cardinality
feature backfill: backfill offline first → experiment → promote only if useful.
feature lifecycle: draft → backfill → validate → promote → deprecate
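The stages above can be written down as a small state machine. This is a sketch under assumptions: the `Stage` enum and `advance` function are hypothetical, and sending a feature that fails validation straight to deprecate is one possible reading of "promote only if useful".

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    BACKFILL = "backfill"
    VALIDATE = "validate"
    PROMOTE = "promote"
    DEPRECATE = "deprecate"


# Happy-path transitions, in the order listed in the notes.
NEXT_STAGE = {
    Stage.DRAFT: Stage.BACKFILL,
    Stage.BACKFILL: Stage.VALIDATE,
    Stage.VALIDATE: Stage.PROMOTE,
    Stage.PROMOTE: Stage.DEPRECATE,
}


def advance(stage: Stage, useful: bool = True) -> Stage:
    """Move a feature to its next stage; `useful` only matters at VALIDATE."""
    if stage is Stage.DEPRECATE:
        raise ValueError("a deprecated feature has no next stage")
    if stage is Stage.VALIDATE and not useful:
        # Assumption: a feature that does not help in experiments is retired.
        return Stage.DEPRECATE
    return NEXT_STAGE[stage]


stage = Stage.DRAFT
history = [stage]
while stage is not Stage.DEPRECATE:
    stage = advance(stage)
    history.append(stage)
```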
References
Close reading
Further reading
Code