Site Reliability Engineer

Verisure, Almería, Spain

More than 30 days ago

Job description

The purpose of the Site Reliability Engineer (SRE) role is to enhance and maintain the high availability and reliability of systems and applications, ensuring they effectively support business operations and contribute to a positive user experience. The role sits at the crossroads of software engineering and operations, adopting practices from both disciplines to create robust and efficient systems. Responsibilities include:

Enhancing and maintaining the availability and reliability of systems and applications.
Proactively managing incidents to minimize downtime.
Optimizing system performance and ensuring scalability.
Implementing automation to increase operational efficiency.
Collaborating with security teams to strengthen system protection.
Maintaining detailed documentation to facilitate knowledge sharing.
Working with development teams to integrate reliability from the design phase.
Continuously evaluating and optimizing system performance and operational processes.
Ensuring the technological infrastructure supports business growth and objectives.

What does the SRE do? (tasks):

Take part in architecture decisions to ensure system resiliency from the outset of software development.

Automation and Orchestration:

Develop scripts and use tools to automate deployment, infrastructure provisioning, configuration management, and scaling, following CI/CD practices.

Orchestrate complex workflows across various environments to ensure consistency and reliability.

Continuous Integration and Continuous Deployment (CI/CD):

Design, implement, and manage CI/CD pipelines to facilitate rapid and reliable code deployments with minimal manual intervention. This may include integrating automated testing to ensure code quality.

Infrastructure as Code (IaC):

Foster the use of IaC tools and practices to manage infrastructure provisioning and configuration, ensuring environments are reproducible, scalable, and maintainable.

Monitoring, Logging, and Alerting:

Implement comprehensive monitoring and logging solutions to collect, analyze, and act on performance data and alerts.

Use observability data to proactively identify and address issues, ensuring high availability and performance.

Performance Optimization:

Regularly assess system performance to identify bottlenecks and inefficiencies.

Implement optimizations to improve system response times, resource utilization, and user satisfaction.

Incident Management and Reliability Engineering:

Participate in on-call rotations, swiftly address and resolve incidents, and lead post-mortem analyses to identify root causes and prevent recurrence.

Develop resilience and recovery strategies to meet defined Service Level Objectives (SLOs).

Security and Compliance:

Ensure that all aspects of software development, deployment, and operations adhere to security best practices and compliance requirements.

Implement security controls, conduct regular audits, and address vulnerabilities promptly.

Quality Assurance (QA):

Facilitate QA Teams: Provide support to QA teams by setting up environments and deploying the tools needed for quality-related activities.

Automation Support: Collaborate with QA to automate testing processes and manage risks effectively.

Non-Functional Testing: Work closely with QA to develop and execute non-functional tests and evaluate their outcomes.

Responsibilities

Develop, Scale, and Automate: Design, build, and scale systems using advanced automation techniques. Develop and maintain automation scripts for system deployment and management.

Incident Management: Lead on-call rotations for specific systems. Conduct detailed post-mortem analyses and develop preventative strategies.

Performance Metrics: Define and monitor critical reliability metrics independently. Analyze performance data to identify trends and areas for improvement.

Cross-functional Collaboration: Work closely with development teams to ensure system reliability and performance from the design phase. Advocate for SRE principles across teams.

Capacity Planning and Management: Lead capacity planning and management efforts, aligning with business needs and objectives. Develop strategies for scalability and performance under varying loads.

Continuous Improvement: Identify and address inefficiencies in current systems and processes. Champion new technologies for operational excellence.

Security: Lead initiatives to strengthen system security postures. Conduct vulnerability assessments and remediation efforts.

Mandatory Skills:

Monitoring, Logging, and Observability: Advanced knowledge of comprehensive monitoring, logging, and observability strategies desired.

Automation: Advanced knowledge of Python and Bash for complex automation recommended.

Configuration as Code: Advanced Ansible skills for sophisticated configuration management recommended.

Containerization and Orchestration: Intermediate knowledge of Docker and basic knowledge of Kubernetes.

Databases: Advanced knowledge of database management recommended, with a focus on relational and non-relational databases.

Version Control Systems: Advanced proficiency with Git desired.

Recommended Skills:

Infrastructure as Code: Advanced Terraform skills for sophisticated infrastructure provisioning and management recommended.

Programming: Proficiency in Java recommended, with practical experience in Spring Boot.

Cloud Platform: Advanced knowledge of cloud platforms recommended.

Networking and Security: Advanced understanding of networking and security concepts and practices.

Databases: Advanced knowledge of database management recommended, with a focus on relational and non-relational databases.

CI/CD: Understanding of and experience with continuous integration and deployment concepts.

Soft Skills

Communication: Effective verbal and written communication, focusing on clarity and understanding.

Collaboration: Teamwork, learning from others, and supporting team members.

Problem-solving: Ability to address problems with supervision and thorough investigation.

Emotional Intelligence: Self-awareness, regulation, and constructive handling of feedback.

Adaptability: Willingness to learn new technologies and methodologies.

Resilience: Learning from mistakes and not being discouraged by challenges.

Customer-focused Mindset: Basic understanding of user experience.

Leadership and Time Management: Self-leadership, task management, and productivity.

Seniority level

Not Applicable

Employment type

Full-time

Job function

Information Technology

Manufacturing
