
GMI Cloud

Forward Deployment Engineer (Inference & RL POC)

  • Posted 16 hours ago

Job Description

Location: Bay Area (frequent customer interaction)

Team: Inference & Reinforcement Learning Platform

About the Role

We're looking for a Forward Deployment Engineer (FDE) to work directly with customers and partners to design, deploy, and validate inference and reinforcement learning (RL) proof-of-concepts on GMI's GPU infrastructure.

This is a high-impact, hybrid engineering role that sits at the intersection of platform engineering, applied ML, and customer success. You'll be embedded with customers during early-stage deployments, turning research ideas, datasets, and business requirements into working, performant systems on real GPU clusters.

If you enjoy being close to users, debugging real systems, and shipping results fast (not just writing docs), this role is for you.

What You'll Do

Own customer POCs end-to-end

  • Deploy and optimize LLM inference, RL training, and post-training workflows on GMI clusters
  • Translate customer requirements into concrete system designs and experiments

Forward-deploy with customers

  • Work hands-on with research teams, startups, and enterprise customers
  • Debug performance, stability, and correctness issues in real environments

Inference deployment

  • Stand up and tune inference stacks (e.g., vLLM / SGLang / Ray Serve-style architectures)
  • Optimize latency, throughput, GPU utilization, and cost efficiency

RL & post-training POCs

  • Support RLHF / RFT / SFT workflows using customer-provided datasets
  • Integrate SDKs, training APIs, and cluster resources to shorten idea-to-experiment cycles

Performance & reliability

  • Diagnose GPU, networking, and distributed system bottlenecks
  • Run benchmarks, profiling, and stress tests on multi-GPU / multi-node setups

Feedback loop to product

  • Feed real-world customer learnings back into GMI's platform, SDKs, and APIs
  • Help shape reference architectures, cookbooks, and best practices

What We're Looking For

Core Requirements

  • Strong software engineering background (Python required; Go / Rust a plus)
  • Hands-on experience with ML inference or training systems
  • Familiarity with distributed systems and GPUs (multi-GPU, multi-node)
  • Comfort working directly with customers and ambiguous requirements
  • Ability to debug end-to-end systems (code, infra, networking, performance)

Nice to Have

  • Experience with:
      • LLM inference frameworks (vLLM, SGLang, Ray Serve, Triton, etc.)
      • RL or post-training workflows (RLHF, RFT, SFT)
      • PyTorch, DeepSpeed, Megatron-LM, or similar
      • Kubernetes-based ML platforms
      • GPU performance profiling and optimization
  • Prior experience as:
      • Forward Deployed Engineer
      • Solutions Engineer
      • ML Platform Engineer
      • Applied Research Engineer

What Makes This Role Special

  • You're close to real users and real GPUs, not abstract roadmaps
  • You'll work on cutting-edge inference and RL workloads, not toy demos
  • You'll influence product direction through direct customer feedback
  • Fast iteration, high ownership, and visible impact

Who Thrives Here

  • Engineers who like shipping over theorizing
  • People who enjoy being the last-mile problem solver
  • Builders who want exposure to both deep systems and applied ML
  • Those excited by early-stage POCs that turn into real production systems


Job ID: 143890655