Site Reliability Engineer at XM
Site Reliability Engineers (SRE) - Multiple Openings
The Role:
You will join a team working with Observability, Escalations, Post-mortems, Correction of Errors, and other practices that will contribute to the company's goal of cloud resiliency. You will be responsible for driving processes around reliability, best practices, cultural change, and enforcement of these practices.
The main responsibilities of the position include:
- Honor and practice the Resiliency pillar of the Well-Architected Framework in all tasks and responsibilities
- Conduct Chaos Engineering experiments and relevant exercises to improve resiliency and fault-tolerance
- Research workloads for migrating to the cloud with minimal disruption and impact
- Monitor cloud migration projects to ensure seamless transitions
- Design, consult, re-platform, and re-factor the observability of current cloud infrastructure
- Coordinate with other IT departments and teams regarding observability for both individual and organizational needs
- Regularly assess cloud deployments for compliance with the company’s standards and best practices
- Investigate and correct areas where observability is lagging
- Stay up to date and provide training on new and current technologies, services, tools, methodologies, and practices
- Occasionally participate in service capacity planning, software performance analysis, and system tuning
- Mentor colleagues in technical skills and knowledge
- Analyze, oversee, and remediate the company’s resiliency
- Participate in 24/7 on-call support based on a rotation schedule
Main requirements:
- BSc/MSc degree in Computer Science or related field
- 5+ years of cloud services experience, with at least 3 years on AWS cloud
- 3+ years of experience in SRE or a similar role
- Experience with monitoring, APM, logging, and notification tools
- Familiarity with incident, problem and change management procedures and practices
- Advanced knowledge of SRE practices and methods
- Understanding and practice of Service Levels
- Strong troubleshooting skills and the ability to mentor others
- Extensive experience with Kubernetes and related technologies, services, and ecosystem
- Advanced knowledge of CI/CD, Infrastructure as Code (IaC) concepts and tools, especially HCL Terraform and AWS CloudFormation
- Experience with versioning tools like Git
- Strong organizational and documentation skills
- Exceptional time management and research abilities
- Advanced Linux, networking, and scripting skills
The following will be considered an advantage:
- Experience with platforms like Kafka (MSK)
- Experience with RDBMSs, particularly Postgres and MySQL
- Knowledge of scripting languages such as Python or Go
Benefit from:
- Attractive remuneration package and perks
- Intellectually stimulating work environment
- Continuous personal development and international training opportunities
The Hiring Experience: What Awaits You
- Show Your Skills: Online Technical Challenge
- Let’s Connect: Intro Chat with Talent Acquisition
- Deep Dive: First Interview with Your Future Team
- Final Connection: Final Interview
All applications will be treated with strict confidentiality!
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology