Senior Site Reliability Engineer (Remote - Europe)
Jobgether offers ALL remote jobs globally. We match you to roles where you're most likely to succeed and provide feedback on every application to help you learn. No more guesswork, application black holes, or recruiter ghosting in your job search.
We are looking for a Senior Site Reliability Engineer for one of our clients. The role is fully remote and based in Europe.
As a Senior SRE, you will design, maintain, and optimize reliable and scalable systems. Your responsibilities include tracking performance metrics, automating operational work to improve system reliability, and ensuring best practices for incident management. You will apply your expertise in cloud services, container orchestration, and system performance to make the infrastructure more efficient and robust, and you will collaborate closely with engineering teams to build high-availability systems. This role is ideal for someone passionate about maintaining resilient systems that ensure seamless operations at scale.
Accountabilities :
- Develop and maintain reliable, scalable, and efficient systems
- Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure system performance
- Conduct blameless post-incident reviews, identify root causes, and implement preventive measures
- Automate operational tasks and incident responses, and optimize system performance
- Collaborate with engineering teams to design for reliability, scalability, and maintainability
- Continuously evaluate and improve system performance, capacity, and cost efficiency
- Participate in on-call rotations, troubleshooting and resolving critical issues
Requirements :
- Bachelor's degree in Computer Engineering or a related field
- 5+ years of experience as a Site Reliability Engineer or in a similar role
- 3+ years of experience with AWS services and container orchestration tools
- 2+ years of Kubernetes experience
- Strong knowledge of observability tools (monitoring, logging, tracing)
- Hands-on experience with Terraform for infrastructure as code
- Proficiency in at least one programming language (e.g., Python, Go, Java)
- Experience with incident management, postmortem analysis, and risk mitigation
- Familiarity with messaging systems such as SNS and SQS, and with CI/CD tools
- Fluent in English with strong communication skills
Benefits :
- Fully remote role with flexible working locations
- Competitive salary and performance incentives
- Health insurance coverage
- Annual wellness and learning credits for professional growth
- Work-from-anywhere stipend
- Annual company retreat to an exciting destination
- Inclusive, diverse, and collaborative work environment
Additional Details :
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Information Technology
- Industries: Non-profit Organizations, Education