The purpose of the Site Reliability Engineer (SRE) role is to enhance and maintain the high availability and reliability of systems and applications, ensuring they effectively support business operations and contribute to a positive user experience. This role sits at the crossroads of software engineering and operations, adopting practices from both disciplines to create robust and efficient systems. Responsibilities include:
- Enhancing and maintaining the availability and reliability of systems and applications.
- Proactively managing incidents to minimize downtime.
- Optimizing system performance and ensuring scalability.
- Implementing automation to increase operational efficiency.
- Collaborating with security teams to strengthen system protection.
- Maintaining detailed documentation to facilitate knowledge sharing.
- Working with development teams to integrate reliability from the design phase.
- Continuously evaluating and optimizing system performance and operational processes.
- Ensuring the technological infrastructure supports business growth and objectives.
What does he/she do? (Tasks):
- Take part in architecture decisions to ensure system resiliency from the outset of software development.
Automation and Orchestration:
- Develop scripts and use tools to automate deployment, infrastructure provisioning, configuration management, and scaling, following CI/CD practices.
- Orchestrate complex workflows across various environments to ensure consistency and reliability.
Continuous Integration and Continuous Deployment (CI/CD):
- Design, implement, and manage CI/CD pipelines to facilitate rapid and reliable code deployments with minimal manual intervention. This may include integrating automated testing to ensure code quality.
Infrastructure as Code (IaC):
- Foster the use of IaC tools and practices to manage infrastructure provisioning and configuration, ensuring environments are reproducible, scalable, and maintainable.
Monitoring, Logging, and Alerting:
- Implement comprehensive monitoring and logging solutions to collect, analyze, and act on performance data and alerts.
- Use observability data to proactively identify and address issues, ensuring high availability and performance.
Performance Optimization:
- Regularly assess system performance to identify bottlenecks and inefficiencies.
- Implement optimizations to improve system response times, resource utilization, and user satisfaction.
Incident Management and Reliability Engineering:
- Participate in on-call rotations, swiftly address and resolve incidents, and lead post-mortem analyses to identify root causes and prevent recurrence.
- Develop resilience and recovery strategies to meet defined Service Level Objectives (SLOs).
Security and Compliance:
- Ensure that all aspects of software development, deployment, and operations adhere to security best practices and compliance requirements.
- Implement security controls, conduct regular audits, and address vulnerabilities promptly.
Quality Assurance (QA):
- Facilitate QA Teams: Provide support to QA teams by setting up environments and deploying necessary tools for quality-related activities.
- Automation Support: Collaborate with QA to automate testing processes and manage risks effectively.
- Non-Functional Testing: Work closely with QA to develop, execute, and evaluate outcomes from non-functional testing.
Responsibilities
- Develop, Scale, and Automate: Design, build, and scale systems using advanced automation techniques. Develop and maintain automation scripts for system deployment and management.
- Incident Management: Lead on-call rotations for specific systems. Conduct detailed post-mortem analyses and develop preventative strategies.
- Performance Metrics: Define and monitor critical reliability metrics independently. Analyze performance data to identify trends and areas for improvement.
- Cross-functional Collaboration: Work closely with development teams to ensure system reliability and performance from the design phase. Advocate for SRE principles across teams.
- Capacity Planning and Management: Lead capacity planning and management efforts, aligning with business needs and objectives. Develop strategies for scalability and performance under varying loads.
- Continuous Improvement: Identify and address inefficiencies in current systems and processes. Champion new technologies for operational excellence.
- Security: Lead initiatives to strengthen system security postures. Conduct vulnerability assessments and remediation efforts.
Mandatory Skills:
- Monitoring, Logging, and Observability: Desired: advanced skills in comprehensive monitoring, logging, and observability strategies.
- Automation: Recommended: advanced knowledge of Python and Bash for complex automation.
- Configuration as Code: Recommended: advanced skills in Ansible for sophisticated configuration management.
- Containerization and Orchestration: Intermediate knowledge of Docker and basic knowledge of Kubernetes.
- Databases: Recommended: advanced knowledge of managing databases, with a focus on relational and non-relational databases.
- Version Control Systems: Desired: advanced proficiency with Git.
Recommended Skills:
- Infrastructure as Code: Recommended: advanced skills in Terraform for sophisticated infrastructure provisioning and management.
- Programming: Recommended: proficiency in Java, with practical experience in Spring Boot.
- Cloud Platform: Recommended: advanced knowledge of cloud platforms.
- Networking and Security: Advanced understanding of networking and security concepts and practices.
- Databases: Recommended: advanced knowledge of managing databases, with a focus on relational and non-relational databases.
- CI/CD: Understanding of and experience with continuous integration and continuous deployment concepts.
Soft Skills
- Communication: Effective verbal and written communication, focusing on clarity and understanding.
- Collaboration: Teamwork, learning from others, and supporting team members.
- Problem-solving: Ability to address problems with supervision and thorough investigation.
- Emotional Intelligence: Self-awareness, regulation, and constructive handling of feedback.
- Adaptability: Willingness to learn new technologies and methodologies.
- Resilience: Learning from mistakes and not being discouraged by challenges.
- Customer-focused Mindset: Basic understanding of user experience.
- Leadership and Time Management: Self-leadership, task management, and productivity.
Seniority level
Not Applicable
Employment type
Full-time
Job function
Information Technology
Manufacturing
Referrals increase your chances of interviewing at Verisure by 2x