Responsibilities
1. Build models for multi-modal content understanding, content recognition, and content mining for the live broadcast business, improving content supply and growth on the live broadcast side.
2. Optimize and iterate on computer vision, audio, and text large models in live broadcast scenarios, including live broadcast room screen recognition and detection, text semantic understanding and summarization, and large-model intelligent assistants.
3. Explore cutting-edge technologies such as computer vision, multi-modality, and LLMs, and own the design, development, and optimization of algorithm models.
Qualifications
1. In-depth research in at least one area of computer vision, NLP, multi-modality, or deep learning, including but not limited to image and video understanding, detection, segmentation, action recognition, multi-modality, RAG, and few-shot learning.
2. Familiarity with training and deploying models in one or more frameworks such as PyTorch or TensorFlow, and an understanding of mixed-precision training, distributed training, TensorRT deployment, etc.
3. Strong model development and tuning skills; project experience in video content understanding or multi-modal retrieval is preferred, and awards in competitions such as Kaggle, COCO, ActivityNet, ICPC, or NOI/IOI are a plus.
4. Excellent comprehension, communication, and teamwork skills; proactive and enthusiastic.