Senior Director, Reliability Tools and Practices

GEICO

Admin & Office

IT & Telecoms

Salary: Confidential
1 month ago

Job Summary

GEICO is seeking an experienced and visionary technical Director / Senior Director of Reliability Tools and Practices Engineering within the Site Reliability Engineering (SRE) organization. You will play a critical role in ensuring the reliability, availability, and performance of the company’s systems and services by defining, developing, and delivering reliability tooling and platforms to GEICO's engineering organization. You will lead a team responsible for developing, implementing, and maintaining the tools, practices, and processes that enable the organization to achieve world-class reliability and operational excellence. This role combines technical expertise, leadership, and strategic thinking to drive continuous improvement in reliability and scalability.

  • Minimum Qualification: Degree
  • Experience Level: Senior level
  • Experience Length: 5 years

Job Description/Requirements

Responsibilities:

Team Leadership:

  • Build and lead a high-performing team of reliability engineers and tooling specialists.
  • Provide mentorship, guidance, and professional development opportunities.
Vendor Relations:

  • Manage relationships with third-party tooling vendors, negotiate contracts, and stay informed about emerging trends and innovations in the field.


Compliance and Security:

  • Ensure that all reliability tools and practices adhere to security and compliance standards, and drive efforts to continuously enhance security.


Qualifications:

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience. Advanced degree preferred.
  • Proven experience leading full-stack software development teams, preferably in the Site Reliability Engineering, Engineering Productivity, Observability, or Workflow Automation domains, using agile software development methodologies and DevOps practices.
  • Strong expertise in defining, developing, and managing reliability tools and automation solutions as a product owner.
  • Knowledge of chaos engineering principles and tools.
  • Experience with continuous integration and continuous delivery (CI/CD) pipelines.
  • Proficiency in monitoring, alerting, and incident response tools such as Prometheus, Grafana, ELK, PagerDuty, etc.
  • Experience with cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g., Kubernetes).
  • Familiarity with industry best practices in reliability engineering, including SLOs, error budgets, and incident management.
  • Expertise in incident management processes, including creating incident response playbooks, incident triaging strategies, and post-incident analysis to drive continuous improvement in system reliability and availability.