WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.
Together, we advance your career.
Key Responsibilities
Customer Engagement & Technical Debug Support
- Serve as the primary technical interface for customers on GPU server bring-up, stability, and debug issues.
- Support customers during system integration, validation, and production ramp, acting as the first line of escalation.
Pre-Sales & Post-Sales Support
- Support POC, EVT/DVT/PVT, and early customer deployments from a system debug perspective.
- Review customer system architecture and provide debug readiness, risk assessment, and best-practice guidance.
Server Bring-Up & Issue Debugging - EPYC
- Diagnose and resolve server-level issues including boot failures, OS bring-up, GPU/NIC detection, PCIe issues, and system hangs.
- Perform HW/SW co-debug across BIOS/UEFI, BMC, firmware, drivers, OS, and GPU stacks.
- Analyze logs, dumps, and traces (BIOS, BMC, OS, GPU, NIC) to isolate root causes.
- Work closely with ODMs, component vendors, and internal engineering teams to drive issue closure.
GPU & Platform Debug - Instinct/Pensando
- Debug GPU server issues related to power, thermals, PCIe, interconnects, and multi-GPU configurations.
- Validate GPU functionality under stress, burn-in, and long-run stability conditions.
- Support RMA analysis and failure reproduction when required.
Performance Validation & Stability
- Assist with system-level performance validation and identify platform bottlenecks.
- Support customer concerns related to system stability, reliability, and scalability in multi-GPU servers.
Documentation & Knowledge Sharing
- Create debug guides, checklists, and best-practice documents for server bring-up and issue triage.
- Provide technical training to customers and internal teams on server debug methodology and tools.
Qualifications
- Bachelor's or Master's degree in a related field.
- 5+ years of experience in server platform debug, GPU systems, or data center hardware support.
- Strong understanding of x86 server architecture, GPU platforms, PCIe, memory, power, and thermals.
- Hands-on experience with Linux OS, system logs, firmware, and driver-level debugging.
- Experience working with ODMs/OEMs and cross-functional engineering teams.
- Strong communication skills for customer-facing debug and escalation management.
Preferred Skills
- Experience debugging GPU servers or AI/HPC platforms in customer environments.
- Familiarity with BIOS/UEFI, BMC (OpenBMC), firmware update flows, and server validation stages.
- Understanding of networking (NICs, RDMA, Ethernet/InfiniBand) in GPU servers.
- Ability to work independently, manage multiple customer issues, and drive problems to closure.
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's Responsible AI Policy is available here.
This posting is for an existing vacancy.