3 weeks ago

Job Summary

In addition to our DevOps Team, we are building a Site Reliability Team whose purpose is to focus on site reliability and security. The role will also involve deployment, configuration, and monitoring, as well as the availability, latency, change management, emergency response, and capacity management of services in production.

  • Minimum Qualification: Degree
  • Experience Level: Entry level
  • Experience Length: 3 years

Job Description/Requirements

Responsibilities:

  • Work with a team of DevOps/SRE and DBA professionals
  • Improve the existing infrastructure and processes currently deployed, and streamline the processes for deploying to new countries in the future
  • Holistically improve all aspects of our current infrastructure, including reducing costs, streamlining environment provisioning, lowering response times, and incorporating the latest techniques and technologies
  • Monitor and maintain the existing cloud infrastructure via autoscaling, automated alerts, and OpsWorks and Grafana dashboards
  • Take ownership and responsibility for our cloud operation activities
  • Liaise with external security agencies for annual audits as well as perform our own internal security sweeps
  • Aid in reconfiguring existing architecture to allow for rapid deployments to new countries
  • Mentor less experienced team members


Requirements:

  • 3+ years SRE experience
  • Experience independently leading the planning and deployment of a project
  • Experience with cloud platforms, especially AWS, including solid knowledge of how to use cloud resources to meet demand from other teams and production
  • Familiarity with one programming or scripting language (Python, Java, ...)
  • Experience managing multiple Kubernetes clusters in production (virtualization, orchestration, scalability, security, and high availability), with tools such as Helm, Rancher, and ArgoCD
  • Solid knowledge of networking protocols and cybersecurity, especially the TCP/IP stack and HTTP
  • A strong understanding of caching, including CDN and HTTP caching (Cloudflare, AWS CloudFront)
  • Experience with cloud-native monitoring solutions for large distributed systems based on the observability model (traces, metrics, logs), with tools such as Prometheus, Jaeger, Loki, ELK, and Grafana
  • Excellent troubleshooting skills, including Linux OS issue diagnosis and OS parameter optimization

2023 Jobberman