有害内容检测

harmful content/weapon ads, copyright检测

内容风控: 多模态场景, evolve rapidly

1. requirements

场景/功能

product
item
- What types of harmful content are we aiming to detect? (e.g., hate speech, explicit images, cyberbullying)?
- What are the potential sources of harmful content? (e.g., social media, user-generated content), detect bad actors?
- Are there specific legal or ethical considerations for content moderation

目标

more accuracy or recall

约束

What is the expected volume of content to be analyzed daily?
Are there human annotators available for labeling?
Is there a feature for users to report harmful content? (click, text, etc).
Is explainability important here?
是否有必要检测出harmful的具体类型? (violence, nudity, self-harm, hate speech) -> multi-task
latency requirement

2. ML task & pipeline

先介绍几个大的方向, pros, cons
Multimodal input (text, image, video, speech, etc)
Multi-Label/Multi-Task classification

3. data collection

如何收集数据和label是比较关键的。先确认有没有annotated data，没有的话则先根据用户反馈(silver labels)，再根据人工审核(golden labels). 或者利用大模型进行合成或判断

少量标注数据，大量未标注数据
platform: 人工审核
user feedback: report，dislike，comment, surfaced by user complaints
interaction: anomaly

4. feature

item
- text
- image/video
- audio
user
- author
interaction
- engagement
- report
context

5. model

多模态(few-shot, 或微调)
大模型
多任务
- 可以提升任务之间的信息共享，二是在线预估时相比多个单任务模型也能节省性能

6. evaluation

offline

F1 score, PR-AUC, ROC-AUC
为什么我们选择auc-pr而不是auc-roc。因为auc-pr可以更好的处理imbalance data, harmful content本身就是少数

online

AB test evaluated by online metric.
prevalence (percentage of harmful posts didn't prevent over all posts), harmful impressions, percentage of valid (reversed) appeals, proactive rate (ratio of system detected over system + user detected)
很多场景，不是知道所有错误信息的(spam/fraud detection)。比较简单的方法，就是不做true recall. sample一些data做个partial recall, 还可以看自己业务设置其他类似recall的matrix

7. deployment & prediction service

有些环节offline, 有些online, serving就是online部分
Harmful content detection service
Demoting service (prob of harm with low confidence)
violation service (prob of harm with high confidence)

8. monitoring & maintenance

怎么部署这个系统，后续如何维护
How to roll out new model? AB test: 10% as holdout, others as factorial experiment design. 同一个factor内的level互斥, 不同factor之间正交. significant improvement 就可以推全, 留一小部分做reverse AB.

Reference

PreviousYoutube视频搜索 Next大模型RAG

Last updated 21 days ago