Job Summary
As an SRE, you will work with our application DevOps engineers to maintain and scale our ever-growing number of cloud-hosted services. You will serve as front-line support, triaging issues to the platform, the applications, or the underlying infrastructure. You will partner with multiple teams within and outside the Application Infrastructure team, including a second SRE team that supervises the GPU cloud infrastructure; this role focuses on monitoring the application stack. You will also be involved in onboarding customers to our services and managing the customer lifecycle.
- Minimum Qualification: Degree
- Experience Level: Mid level
- Experience Length: 5 years
Job Description/Requirements
What You Will Be Doing:
- Build and integrate new software, tools, and analytics that improve the availability, scalability, latency, and efficiency of our cloud products and services
- Handle upgrades and automated rollbacks across all clusters
- Maintain Service Level Agreements (SLAs) against measurable benchmarks, working hand in hand with developers of new services to define SLIs and design stable, secure services
- Help guide the Change Advisory Board and RCCA processes
- Work with engineering, DevOps, and product area leads across the GPU cloud services stack to guide product engineering in building fast, reliable, and durable production systems
- Drive process changes to improve reliability and performance of our cloud services
- Debug production issues across services and levels of the stack
- Improve operational processes
What We Need to See:
- Bachelor's degree in Computer Science or a related field, or equivalent experience
- 5+ years of experience in system design, complexity analysis, and software design on Unix/Linux systems, including troubleshooting performance and application issues
- 5+ years of experience authoring and debugging software written in C++ and Python
- Hands-on experience with Kubernetes-based cloud environments
- Multi-cloud experience
- Experience working with partners across multiple teams
- Experience operating production systems