Responsibilities
1. Participate in the design and development of multi-card interconnect solutions for AI chips, and define an efficient, stable multi-card interconnect architecture based on the company's product needs.
2. Participate in developing and optimizing multi-card interconnect software drivers so that multiple cards work together, improving overall system performance and stability.
3. Participate in verifying and debugging the multi-card interconnect system, and promptly identify and resolve problems that arise during testing.
4. Participate in the design of the AI chip profiling solution, and take responsibility for verifying the profiling module and developing the related drivers and tools.
5. Participate in the design and development of the AI chip task scheduler driver solution, and take responsibility for developing the task scheduling firmware.
6. Track the latest industry technology trends, and provide forward-looking recommendations and technical groundwork for the company's AI chip multi-card interconnect technology.
Qualifications
1. Bachelor's degree or above in computer science, automation, or a related major.
2. Proficient in at least one programming language, such as C or C++, and proficient in Linux system programming.
3. Experience in Linux kernel driver development, with a solid command of the driver development process on Linux.
4. At least 2 years of working experience with AI chips, GPGPU chips, or related fields.
5. Familiar with the architecture and working principles of AI chips or GPGPUs, with an in-depth understanding of multi-card interconnect technologies, including but not limited to high-speed interconnect interfaces and protocols such as PCIe, NVLink, and RoCEv2; familiar with the CUDA Runtime API, CUDA Driver API, NCCL, CUPTI, and the CUDA Toolkit software stack.
6. Good teamwork and communication skills, and able to work closely with cross-department teams (such as hardware and algorithm teams) to move projects forward; strong problem-solving and learning abilities, able to resolve technical problems quickly and to keep learning new technologies.
Bonus points
1. Understand LLM network model structures and be familiar with model deployment, analysis, and optimization.
2. Good at cross-team communication and collaboration, with experience in project or team management.