Responsibilities
1. Responsible for the preference alignment algorithms of CapCut's multi-modal large model, applying SFT, RLHF, post-training, and related techniques to model domain knowledge for video creation.
2. Improve the model's instruction-following capabilities and the safety capabilities of the video creation Agent's large model, and strengthen the pre-trained model's abilities in video creation to build an industry-leading intelligent agent for video creation.
Qualifications
1. Master's degree or above in mathematics, computer science, control science, software engineering, artificial intelligence, or a related discipline.
2. Familiar with the fundamentals of large models, with basic knowledge of training or inference for large language models.
3. Familiar with LLM training or fine-tuning methods (e.g., hands-on SFT/RLHF experience), or familiar with reinforcement learning (RL) concepts with an in-depth understanding of PPO-related algorithms.
4. Solid Python or C++ programming skills, with an understanding of large model training and inference stacks such as PyTorch, TensorFlow, DeepSpeed, Megatron, and vLLM.
5. Passionate about technology, following new research and papers on large models, and interested in large model applications.
6. Able to solve problems independently, with good technical communication and collaboration skills and a willingness to drive problems to resolution within the project.

Bonus points:
1. Participated in RLHF work on large model projects.
2. In-depth understanding of alignment algorithm engineering practice, with the ability to optimize the training efficiency of RLHF frameworks.
3. Understanding of multi-modal large model alignment and of step-supervised learning.