1. Design model deployment solutions and optimize inference performance.
2. Analyze model bottlenecks so that the model runs continuously and efficiently.
3. Write the relevant code, iteratively improve the model, and track how new ideas perform in practice.
4. Follow research progress on new technologies and applications, and actively take part in development and application work in related fields.
Job Requirements:
1. Familiar with training or inference acceleration frameworks such as Megatron, DeepSpeed, and vLLM; multi-node, multi-GPU training experience is preferred.
2. Familiar with CUDA or Triton programming; experience accelerating operators is preferred.
3. Experience with model quantization and model compression is preferred.
4. Experience designing large-model deployment solutions at a large company or a large-model startup is preferred, together with more than 4 years of experience in model inference optimization.
Reminder:
You can submit your resume only once, so please select the position you are most interested in. After your application is submitted successfully, your choice will be used as a reference when assigning you to a department.