Responsibilities
Team introduction: AML is ByteDance's machine learning platform. It provides training and inference systems for recommendation, advertising, CV, speech, and NLP to Douyin, Toutiao, Xigua Video, and other businesses. The team supplies machine learning computing power to the company's business departments and researches general and innovative algorithms for the problems those businesses face. Some core machine learning and recommendation system capabilities are also offered to external enterprise customers through Volcano Engine. In addition, AML conducts cutting-edge research in fields such as AI for Science and scientific computing.

1. Research and develop ByteDance's machine learning training/inference framework, which serves products across the company.
2. Participate in the abstraction, design, optimization, and implementation of the underlying components of the machine learning training/inference framework.
3. Work closely with the company's algorithm department to jointly optimize algorithms and systems for key projects.
Qualifications
1. Proficient in C/C++ and Python in a Linux environment.
2. Experience with at least one machine learning framework (TensorFlow, PyTorch, MXNet, or a self-developed framework).
3. Background knowledge and experience in at least one of the following: GPU programming, compilers, high-performance networking, HPC.
4. Able to solve problems independently, with good team spirit.
5. Strong sense of responsibility, good learning ability, communication skills, and self-motivation.
6. Good documentation habits: writes and updates work processes and technical documents promptly as required.

Bonus points:
1. In-depth study of the underlying architecture and mechanisms of at least one machine learning framework (TensorFlow, PyTorch, MXNet, or a self-developed framework).
2. Familiarity with at least one classic deep learning model and its application scenarios, such as ResNet50 or BERT, or an understanding of GANs, reinforcement learning, graph neural networks, AutoML, etc.
3. Master's or doctoral-level research background in computer systems (including distributed systems, parallel computing, programming languages and compilers, networks, storage, etc.).
4. Experience in hardware/software co-design.
5. Ability to use mathematical tools to analyze optimization algorithms in deep learning training.