Responsibilities
1. Participate in the design and implementation of machine learning and deep learning architectures for Douyin and international short-video search scenarios.
2. Optimize the performance and cost of deep learning models used in search, including text relevance models, multimodal models, Wide&Deep models, and LLMs.
3. Improve model training efficiency, accelerate inference, and optimize GPU throughput.
4. Design and develop high-performance operator libraries, research the latest GPU computing technologies and optimization methods, make full use of GPU parallel computing capabilities, and support building and deploying a variety of models on different hardware.
Qualifications
1. Solid computer science fundamentals, with an in-depth understanding of data structures, algorithms, and operating systems.
2. Strong curiosity, the ability to learn quickly, and good communication skills.
3. Solid programming skills and good programming habits.
Bonus points:
1. Familiarity with mainstream deep learning frameworks (PyTorch, TensorFlow).
2. Familiarity with mainstream high-performance parallel programming technologies, with experience in GPU programming (CUDA or Triton).
3. Familiarity with large-model inference frameworks such as vLLM and TRT-LLM, and with the principles of concurrent large-model inference.
4. Familiarity with the use and principles of mainstream distributed training frameworks such as FSDP, DeepSpeed, and Megatron.