去重复性/版权检测

1. requirements

  • 机器学习系统中处理相似商品推荐时,去重复(deduplication)可以确保用户获得多样化且相关的商品选项

  • user上传的视频是否和a large media collection里的视频有重复

2. ML task & pipeline

  • 基本属性来检测重复(ID, 相似度)

  • 局部敏感哈希, video做hashing,用bloom filter

  • 做embedding,放vector database,找nearest neighbor

3. data collection

4. feature

  • NN

5. model

6. evaluation

7. deploy & serving

8. monitoring & maintenance

Reference

Last updated