Senior Site Reliability Engineer
EPAM
Canada

Experience : 5 Yrs | Full Time

Description :

Implement SRE practices
Identify, define, and maintain SLIs and SLOs for teams, as well as metrics such as MTTR, Lead Time for Changes, Deployment Frequency, and Change Failure Rate
Work with application teams to set up observability and telemetry
Define what it means for a service to be available and develop, monitor, and alert on SLIs/SLOs
Define, track, and enforce error budgets
Review code instrumentation with development teams and ensure the necessary dashboards are created to monitor SLIs, SLOs, and SLAs
Establish, test, and tune alerting for varying tiers of applications
Document and maintain runbooks and procedures, automating as much as possible
Plan and execute periodic Disaster Recovery exercises including both tabletop and simulated failures (fault injection)
Perform periodic load and scalability testing to establish baselines, detect drift, and inform capacity planning
Design and implement peak readiness reviews for anticipated high-volume times
Participate in quarterly business and operational reviews, aligning on roadmaps, development velocity, efficiency, growth trends, etc.

Requirements :

5+ years of SRE or Systems Engineering experience
Experience with any SRE tool (Grafana, Dynatrace, or Splunk preferred)
Experience with distributed tracing
Experience establishing hooks in CI/CD pipelines in lower environments to catch SRE violations
Soft Skills:
- Ability to work independently and as part of a team
- Strong analytical and problem-solving mindset combined with experience troubleshooting under pressure
- Strategic thinking, complex problem-solving, and analytical capabilities
- Strong organizational and interpersonal skills, with experience developing and instilling a culture of operational maturity
- Ability to adjust quickly to new technologies

Last date to apply : 30-11-2023
