At ViewSonic Technologies, we are passionate about building software that solves real-world problems. Our Site Reliability Engineers (SREs) play a critical role in delivering robust systems with high availability, scalability, and performance, enabling customers to achieve their missions effectively. As we expand customer deployments and scale our platforms, we are seeking a Senior SRE Engineer to partner with product and DevOps teams. This role will focus on designing resilient cloud infrastructure, implementing efficient monitoring, and driving high availability across our software ecosystem.
Key Responsibilities:
-Infrastructure & Systems: Build and manage cloud-based infrastructure and distributed applications, ensuring scalability, resilience, and performance.
-Production Reliability: Monitor availability and system health, taking a holistic approach to maintaining uptime.
-Performance Optimization: Measure, analyze, and improve system performance to anticipate customer needs and enable continuous improvement.
-Reliability Engineering: Enhance quality, reliability, and time-to-market of software solutions through automation, preventive actions, and best practices.
-Operational Excellence: Provide primary operational support and engineering for multiple large-scale distributed systems.
-Incident Management: Lead and participate in incident response, ensuring timely resolution, effective communication, and solid postmortems.
-On-Call Participation: Contribute to a shared on-call rotation, including nights and weekends, while fostering a culture of automation and documentation to reduce disruptions.
-Collaboration & Mentorship: Partner with cross-functional teams (Dev, R&D, Product, Security) and mentor junior engineers on reliability, automation, and cloud practices.
Qualifications & Experience
-Bachelor's degree (or equivalent) in Computer Science, Engineering, or related discipline.
-8+ years of experience in the software industry with a minimum of 4 years in a dedicated SRE role.
-AWS Certified in one or more of the following: SysOps Administrator, DevOps Engineer, Solutions Architect.
Technical Expertise
-Strong ability to design and build cloud infrastructure on AWS ensuring high availability, scalability, resilience, reliability, and performance.
-Proficient in Infrastructure as Code (IaC) using Terraform to automate infrastructure deployment.
-Strong programming skills in one or more languages: Python, Java, Ruby, JavaScript (structured and OOP).
-Hands-on experience with containerization and orchestration technologies: Docker, Kubernetes, Yarn, ECS, EKS.
-Deep knowledge of monitoring and observability tools: Prometheus, Grafana, or equivalent.
-Skilled in troubleshooting performance bottlenecks, conducting root cause analysis (RCA), and driving system improvements.
Operational Responsibilities:
-Participate in and lead on-call rotations to ensure production uptime, resolve incidents quickly, and minimize downtime.
-Drive a supportive on-call culture focused on automation, runbooks, documentation, and continuous improvement.
-Lead incident response, postmortems, and preventive action plans to strengthen system reliability.
-Proactively identify risks, design fault-tolerant solutions, and implement self-healing, automated recovery mechanisms.
Leadership & Collaboration
-Partner closely with product engineering teams to embed SRE best practices into the software development lifecycle.
-Provide technical leadership in design reviews, architecture discussions, and production readiness assessments.
-Mentor and coach engineers to grow their reliability, automation, and cloud skills.
-Excellent communication and collaboration skills to influence cross-functional teams and stakeholders.
Date Posted: 22/09/2025
Job ID: 126870031