有害内容检测

harmful content/weapon ads, copyright检测

内容风控: 多模态场景, evolve rapidly

1. requirements

场景/功能

  • product

  • item

    • What types of harmful content are we aiming to detect? (e.g., hate speech, explicit images, cyberbullying)?

    • What are the potential sources of harmful content? (e.g., social media, user-generated content), detect bad actors?

    • Are there specific legal or ethical considerations for content moderation

目标

  • more accuracy or recall

约束

  • What is the expected volume of content to be analyzed daily?

  • Are there human annotators available for labeling?

  • Is there a feature for users to report harmful content? (click, text, etc).

  • Is explainability important here?

  • 是否有必要检测出harmful的具体类型? (violence, nudity, self-harm, hate speech) -> multi-task

  • latency requirement

2. ML task & pipeline

  • 先介绍几个大的方向, pros, cons

  • Multimodal input (text, image, video, speech, etc)

  • Multi-Label/Multi-Task classification

3. data collection

本题中,如何收集数据和label是比较关键的。可以先确认有没有annotated data,没有的话则先根据用户反馈(silver labels),再根据人工审核(golden labels). 或者利用大模型进行合成或判断

  • 少量标注数据,大量未标注数据

  • platform: 人工审核

  • user feedback: report,dislike,comment, surfaced by user complaints

  • interaction: anomaly

4. feature

  • item

    • text

    • image/video

    • audio

  • user

    • author

  • interaction

    • engagement

    • report

  • context

5. model

  • 多模态(few-shot, 或微调)

  • 大模型

  • 多任务

    • 可以提升任务之间的信息共享,二是在线预估时相比多个单任务模型也能节省性能

6. evaluation

  • offline

    • F1 score, PR-AUC, ROC-AUC

    • 为什么我们选择auc-pr而不是auc-roc。因为auc-pr可以更好的处理imbalance data, harmful content本身就是少数

  • online

    • AB test evaluated by online metric.

    • prevalence (percentage of harmful posts didn't prevent over all posts), harmful impressions, percentage of valid (reversed) appeals, proactive rate (ratio of system detected over system + user detected)

    • 很多场景,不是知道所有错误信息的(spam/fraud detection)。比较简单的方法,就是不做true recall. sample一些data做个partial recall, 还可以看自己业务设置其他类似recall的matrix

7. deployment & prediction service

  • 有些环节offline, 有些online, serving就是online部分

  • Harmful content detection service

  • Demoting service (prob of harm with low confidence)

  • violation service (prob of harm with high confidence)

8. monitoring & maintenance

  • 怎么部署这个系统,后续如何维护

  • How to roll out new model? AB test: 10% as holdout, others as factorial experiment design. 同一个factor内的level互斥, 不同factor之间正交. significant improvement 就可以推全, 留一小部分做reverse AB.

Reference

Last updated