YouTube Video Search

The core of search is relevance.

1. requirements

Product & scenarios

  • What are the specific use cases and scenarios where it will be applied? General search or vertical-domain search?

  • Do we need to consider personalization? Not required.

  • Content to search over: the video itself plus its title/description

Objective

  • What is the primary (business) objective of the search system?

Constraints

  • Is there any data available? In what format?

  • What are the system requirements (such as response time, accuracy, scalability, and integration with existing systems or platforms)?

  • What is the expected scale of the system in terms of data and user interactions?

  • How many languages need to be supported?

2. ML task & pipeline

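Task: use historical interactions to recommend the items (videos) the user is likely to interact with.

High-level design: the query is converted into an embedding. A video can be converted into a single overall embedding or into multiple per-modality embeddings. The encoders are fine-tuned with contrastive learning, and at inference time the nearest neighbors of the query embedding are retrieved.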
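
The inference step of the design above (nearest-neighbor lookup in embedding space) can be sketched as follows. This is a minimal illustration, not the production approach: the function names, the dict layout of `video_embs`, and the toy 3-dimensional embeddings are all assumptions made for the example, and the brute-force scan stands in for whatever approximate nearest-neighbor index a real system would use.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k_videos(query_emb, video_embs, k=2):
    """Return the k video ids with the highest cosine similarity to the query.

    video_embs: dict mapping a video id to its embedding (hypothetical layout).
    This is a brute-force scan over all videos.
    """
    q = l2_normalize(query_emb)
    scored = []
    for video_id, emb in video_embs.items():
        v = l2_normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, v)), video_id))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [video_id for _, video_id in scored[:k]]

# Toy embeddings, made up for illustration only.
video_embs = {
    "video_1": [0.9, 0.1, 0.0],
    "video_2": [0.0, 1.0, 0.0],
    "video_3": [0.7, 0.7, 0.0],
}
print(top_k_videos([1.0, 0.0, 0.0], video_embs, k=2))  # ['video_1', 'video_3']
```

If a video is represented by several per-modality embeddings instead of one, the same lookup could be run per modality and the scores combined; how to combine them is a design choice these notes leave open.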

3. data collection

Query      Video
Query 1    Video 1
Query 2    Video 2

4. feature

5. model

text

video

loss
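
The notes name contrastive learning as the fine-tuning approach but do not spell out a loss. One common choice, assumed here purely for illustration, is an in-batch softmax cross-entropy (InfoNCE-style) loss over a query-video similarity matrix, where the matching video for query i sits on the diagonal. The function name and the `temperature` value are hypothetical.

```python
import math

def contrastive_loss(sim, temperature=0.1):
    """In-batch softmax cross-entropy (InfoNCE-style) over a similarity matrix.

    sim[i][j] is the similarity between query i and video j. The matching
    video for query i is assumed to be at index i (the diagonal); every
    other video in the batch acts as a negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive pair
    return total / len(sim)

aligned = [[1.0, 0.0], [0.0, 1.0]]  # each query is closest to its own video
swapped = [[0.0, 1.0], [1.0, 0.0]]  # each query is closest to the wrong video
print(contrastive_loss(aligned) < contrastive_loss(swapped))  # True
```

The loss is low when each query is most similar to its own video and high when it is more similar to another video in the batch, which is the behavior contrastive fine-tuning relies on.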

6. evaluation

  • Offline

    • Precision@k, mAP, Recall@k, MRR

    • We choose MRR (the mean, over queries, of the reciprocal rank of the first relevant result) because our eval data consists of <video, text> pairs, so each text query has exactly one relevant video

  • Online(A/B test)

    • CTR. Problem: it does not track relevance, and clickbait inflates it

    • Video completion rate. Problem: a partially watched video might still be found relevant by the user

    • total watch time

    • We choose total watch time: it is a good indicator of relevance
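
The offline metric chosen above, MRR, can be sketched as follows for eval data in which each text query has a single relevant video. The function name, the dict layout, and the toy rankings are assumptions made for this example.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Compute MRR over a set of queries.

    ranked_results: dict mapping a query to its ranked list of video ids.
    relevant: dict mapping a query to its single relevant video id.
    A query whose relevant video is missing from the list contributes 0.
    """
    total = 0.0
    for query, ranking in ranked_results.items():
        target = relevant[query]
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)  # ranks are 1-based
    return total / len(ranked_results)

# Toy rankings, made up for illustration only.
ranked = {
    "query_1": ["video_1", "video_2", "video_3"],  # relevant video at rank 1
    "query_2": ["video_3", "video_2", "video_1"],  # relevant video at rank 2
}
relevant = {"query_1": "video_1", "query_2": "video_2"}
print(mean_reciprocal_rank(ranked, relevant))  # 0.75
```

Here the reciprocal ranks are 1/1 and 1/2, so the mean is 0.75.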

7. deployment & prediction service

  • A/B testing

  • Scaling

8. monitoring & maintenance

9. Optimization and Q&A

reference

Close reading

Further reading
