Gamania Digital Entertainment

Site Reliability Engineer

  • Posted a day ago

Job Description

<Responsibilities>

  1. Maintain and optimize infrastructure.
  2. Monitor services and resolve any problems as quickly as possible.
  3. Monitor resource usage metrics and overall system status, and optimize accordingly.
  4. Maintain system stability and deal with emergencies.
  5. Prevent system failures and service interruptions.
  6. Work with other teams to continuously improve system architecture and service quality.
  7. Design and build end-to-end processes for system maintenance, deployment, and upgrades.

<Required Skills>

  • Experience implementing monitoring services and log collection systems
  • Familiar with Linux
  • Understands containers and Kubernetes management and scheduling
  • Experience implementing RDBMS and NoSQL cluster services
  • Experience in infrastructure access control and information security management
  • Familiar with IaC tools, such as Terraform
  • Familiar with container technologies, such as Docker, containerd, and Podman
  • Familiar with common monitoring systems for Kubernetes, such as Prometheus

<Preferred Skills>

  • Experience deploying at least one of the following cloud services: Azure, GCP, AWS
  • Understands DevOps concepts
  • Familiar with networking fundamentals, such as HTTP/HTTPS, TCP/IP, DNS, and CDN
  • Familiar with setting up and tuning the configuration of web servers, such as NGINX and Apache
  • Experience with CI tools (e.g., GitLab CI, Jenkins, GitHub Actions) for deploying, setting up, and maintaining services
  • Familiar with integrating and operating different monitoring systems (Zabbix, Cacti, Nagios, SmokePing, etc.), as well as tools for all types of logs and system-generated data (the ELK stack of Elasticsearch, Logstash, and Kibana; Splunk; New Relic; Prometheus)
  • Familiar with distributed tracing platforms (e.g., OpenTelemetry, Jaeger, Tempo) and network architecture design

More Info

Job ID: 147785563