

This job is no longer accepting applications
About the Role
We are seeking a Software Engineer to own the lifecycle of production-grade software systems that support AI workflows. This role will drive the evolution of our microservices architecture, using containerization and Kubernetes to build stable, scalable distributed systems across our Linux-based environments.
This is a hands-on role for engineers who enjoy working close to production systems and want to grow their expertise in cloud-native and AI-driven platforms.
Key Responsibilities
Design, build, and maintain standalone, production-grade software systems that serve as the foundation for AI workflows.
Lead the deployment and management of services using Kubernetes, ensuring high availability and seamless scaling of containerized workloads.
Own the end-to-end lifecycle of our DevOps infrastructure, automating CI/CD pipelines so that deployments are repeatable, reliable, and secure.
Participate in the monitoring, debugging, and iterative improvement of production systems.
Partner closely with AI scientists to translate complex model requirements into robust, scalable engineering solutions.
Author and maintain clear, comprehensive documentation for system architectures, deployment workflows, and operational runbooks to ensure knowledge sharing and system maintainability.
Required Qualifications
Strong software engineering background with a focus on writing clean, maintainable, and well-tested code
Hands-on experience managing and scaling Kubernetes in production (Pods, Networking, Services) and containerizing applications with Docker
Experience working with distributed systems or cloud-native architectures
Experience with cloud platforms (GCP or AWS)
Proficiency with Linux environments and command-line troubleshooting
Proficiency in Python for building backend systems and infrastructure tools
Preferred Qualifications
Experience supporting AI/ML training or inference workflows in production
Proficiency with CI/CD pipelines and Infrastructure as Code
Familiarity with monitoring, logging, and alerting systems
Experience in a startup or fast-growing environment
Job ID: 139502481