Google welcomes people with disabilities.
Minimum qualifications:
- Bachelor's degree in Electrical Engineering, Mechanical Engineering, Reliability Engineering, Materials Science, or a related technical discipline, or equivalent practical experience.
- 10 years of experience in manufacturing.
- 8 years of experience in people management.
Preferred qualifications:
- Experience with large-scale data center infrastructure, high-density compute/server topologies, or power/cooling sub-systems.
- Demonstrated experience in performing risk mitigation during early design phases using predictive modeling or reliability simulations before design lockdown.
- Experience designing and executing accelerated life testing (ALT, HALT) and manufacturing detection profiles tailored to data center environmental profiles.
- Deep expertise in structured problem-solving methodologies (e.g., 8D, FMEA, FTA) and physical failure analysis for complex electronic assemblies or server-grade hardware.
- Strong background in data analysis tools (e.g., JMP, SQL, Python/R) for life-data analysis, Weibull modeling, and predicting fleet-wide failure rates.
About The Job
Be part of a team that pushes boundaries, developing custom silicon solutions that power the future of Google's direct-to-consumer products. You'll contribute to the innovation behind products loved by millions worldwide. Your expertise will shape the next generation of hardware experiences, delivering unparalleled performance, efficiency, and integration.
In this role, you will lead the team responsible for building reliability into our products from early architecture through global deployment. You will shift our focus from reactive troubleshooting to scalable strategy, partnering with Design teams and APAC manufacturers to define specifications and mitigate hardware risks before they hit production. Ultimately, you will own the technical strategy for NPI reliability frameworks, drive systemic root-cause failure analysis, and oversee the health of our active global fleet to ensure our infrastructure remains highly resilient.
The AI and Infrastructure team is redefining what's possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving team behind Google's groundbreaking innovations, empowering the development of our AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
Responsibilities
- Coach, mentor, and scale a Reliability Engineering team across planning, validation, and fleet failure analysis, optimizing resource allocation to navigate evolving data center complexities at a fast pace.
- Oversee manufacturing stability to ensure intrinsic product reliability across all verticals at APAC contract manufacturer locations, proactively identifying workflow opportunities to better support dynamic business needs.
- Drive Design for Reliability (DfR) methodologies and DFMEAs from the initial concept phase, formalizing a lessons-learned pipeline to directly shape design rules for next-generation ML hardware.
- Lead high-priority investigations for complex, intermittent field reliability failures, guiding internal teams, OEMs, and external laboratories through advanced failure analysis techniques to validate conclusions and enforce strict remediation standards.
- Utilize statistical tools, physics-of-failure models, and internal reliability data to predict product life performance, feed back application stress, enable early detection, and define comprehensive end-of-life strategies.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.