AMD Maquinaria

FAE / Field Support Manager

  • Posted 3 hours ago

Job Description

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

Key Responsibilities

Customer Engagement & Technical Debug Support

  • Serve as the primary technical interface for customers on GPU server bring‑up, stability, and debug issues.
  • Support customers during system integration, validation, and production ramp, acting as the first line of escalation.

Pre‑Sales & Post‑Sales Support

  • Support POC, EVT/DVT/PVT, and early customer deployments from a system debug perspective.
  • Review customer system architecture and provide debug readiness, risk assessment, and best‑practice guidance.

Server Bring‑Up & Issue Debugging - EPYC

  • Diagnose and resolve server‑level issues including boot failures, OS bring‑up, GPU/NIC detection, PCIe issues, and system hangs.
  • Perform HW/SW co‑debug across BIOS/UEFI, BMC, firmware, drivers, OS, and GPU stacks.
  • Analyze logs, dumps, and traces (BIOS, BMC, OS, GPU, NIC) to isolate root causes.
  • Work closely with ODMs, component vendors, and internal engineering teams to drive issue closure.

GPU & Platform Debug - Instinct/Pensando

  • Debug GPU server issues related to power, thermals, PCIe, interconnects, and multi‑GPU configurations.
  • Validate GPU functionality under stress, burn‑in, and long‑run stability conditions.
  • Support RMA analysis and failure reproduction when required.

Performance Validation & Stability

  • Assist with system‑level performance validation and identify platform bottlenecks.
  • Support customer concerns related to system stability, reliability, and scalability in multi‑GPU servers.

Documentation & Knowledge Sharing

  • Create debug guides, checklists, and best‑practice documents for server bring‑up and issue triage.
  • Provide technical training to customers and internal teams on server debug methodology and tools.

Qualifications

  • Bachelor's or Master's degree in a related field.
  • 5+ years of experience in server platform debug, GPU systems, or data center hardware support.
  • Strong understanding of x86 server architecture, GPU platforms, PCIe, memory, power, and thermals.
  • Hands‑on experience with Linux OS, system logs, firmware, and driver‑level debugging.
  • Experience working with ODMs/OEMs and cross‑functional engineering teams.
  • Strong communication skills for customer‑facing debug and escalation management.

Preferred Skills

  • Experience debugging GPU servers or AI/HPC platforms in customer environments.
  • Familiarity with BIOS/UEFI, BMC (OpenBMC), firmware update flows, and server validation stages.
  • Understanding of networking (NICs, RDMA, Ethernet/InfiniBand) in GPU servers.
  • Ability to work independently, manage multiple customer issues, and drive problems to closure.


Job ID: 145496961
