AI Training Performance Optimization Engineer - ByteDance Research

0-2 years
9 days ago
Job Description

Responsibilities

ByteDance Research focuses on cutting-edge artificial intelligence research, covering natural language processing, computer vision, machine learning, and reinforcement learning. It is also committed to putting research results into practice, providing technical support and services for the company's existing products and businesses.

1. Improve training stability, ease of use, performance, and scale-up for LLMs and diffusion models.
2. Use profiling to analyze training bottlenecks, and improve training performance through distributed strategy tuning, operator optimization, and other methods.
3. Research and introduce training optimization technology for ByteDance Research.
4. Work closely with the algorithm department on joint optimization of algorithms and systems.

Qualifications

1. Bachelor's degree or above in computer science, electronics, automation, software engineering, or a related major; experience in AI engineering optimization is preferred.
2. Familiar with training performance optimization for LLMs or diffusion models in any scenario.
3. Familiar with the use and principles of mainstream distributed training frameworks such as PyTorch, FSDP, DeepSpeed, and Megatron; able to optimize for business scenarios, and able to follow and apply the latest industry developments.
4. Proficient in GPU high-performance computing optimization, with extensive experience in CUDA-based GPU performance optimization, an in-depth understanding of computer architecture, and familiarity with parallel computing optimization, memory access optimization, low-bit computing, etc.
5. Understand the basic principles of deep learning algorithms, be familiar with basic neural network architectures and how each operator is computed, and be able to analyze at least one deep learning training framework and its model files.

Skills

GPU
LLM
parallel computing optimization
Diffusion Model
memory access optimization
deep learning algorithms
DeepSpeed
low-bit computing
FSDP
Megatron
About
Job Source: jobs.bytedance.com

ByteDance is a technology company operating a range of content platforms that inform, educate, entertain and inspire people across languages, cultures, and geographies.
Dedicated to building global platforms of creation and interaction, ByteDance now has a portfolio of applications available in over 150 markets and 75 languages, including TikTok, Helo, Vigo Video, Douyin, and Huoshan.
