Manager, Site Reliability Engineering


Engineering & Technology

IT & Telecoms Confidential
3 weeks ago

Job Summary

As the SRE Manager, you will lead and manage our SRE team, working closely with cross-functional teams to establish and enhance our reliability engineering practices. You will be responsible for driving the continuous improvement of our systems' reliability, scalability, and efficiency, while also ensuring prompt incident response and effective problem resolution. In addition, you will play a key role in setting and achieving service level objectives (SLOs) and driving the adoption of best practices for monitoring, alerting, and automation. The Manager of SRE is a hands-on technical role and requires a thorough understanding of all components of a modern web application stack, including front-end, backend, database, networking, and systems-level knowledge.

  • Minimum Qualification: Degree
  • Experience Level: Mid level
  • Experience Length: 2 years

Job Description/Requirements


  • Collaborate with development, operations, and product teams to optimize the reliability, scalability, and performance of our systems
  • Define and monitor service level objectives (SLOs) to ensure the availability and performance of our services
  • Implement effective incident management and problem resolution processes, ensuring minimal impact on customers
  • Develop and maintain monitoring and alerting systems to proactively identify and mitigate potential issues
  • Drive automation efforts to streamline deployments, infrastructure provisioning, and operational tasks
  • Perform post-incident reviews to identify root causes, implement preventive measures, and share lessons learned
  • Stay up to date with industry trends and emerging technologies, and assess their potential impact on our SRE practices


  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • 2-6 years of experience in Site Reliability Engineering or a related role, with demonstrated experience in leading and managing teams
  • Strong knowledge of SRE and DevOps principles, practices, and methodologies
  • Proficiency in scripting and automation using Python, Node.js, or other languages
  • Experience with cloud platforms (AWS, Azure, GCP) and infrastructure-as-code (IaC) tools like Terraform
  • Expertise in monitoring and observability tools (e.g., Prometheus, Datadog, New Relic, ELK stack)
  • Expertise with containerization technologies (Docker, Kubernetes)
  • Familiarity with incident response and post-incident analysis processes
  • Strong analytical and problem-solving skills
  • Excellent communication and leadership skills
