F743 service manager sre engineer jobs in Barcelona
- Senior Site Reliability Engineer (SRE) (promoted): F. Hoffmann-La Roche Gruppe, Barcelona, Spain
- Site Reliability Engineer (Barcelona, Spain): Antal International, Barcelona, Spain
- DevOps Engineer (promoted): Capitole, Barcelona, Province of Barcelona, Spain
- DevOps SRE: Allianz Technology SE Spain Branch, Barcelona, Spain
- Engineering Manager DevEx - Core Tech (f / m / d) (promoted): Tbwa Chiat / Day Inc, Barcelona, Spain
- Principal DevOps Engineer (promoted): Vitio - Remote Patient Monitoring, Catalonia, Spain
- Senior DevOps Engineer, hybrid (promoted): Jordan martorell s.l., Barcelona, Catalonia, Spain
- Platform Engineer / SRE (promoted): Codurance Ltd, Barcelona, Catalonia, Spain
- System Administrator (SRE) (promoted): Multiplica Talent, Barcelona, Spain
- SRE - Freelance Security & Automation Engineer (Pentesting Focus) (promoted): Mindrift, Barcelona, Catalonia, Spain
- Incident Manager (promoted): IRIUM - Spain, Barcelona, Spain
- Software Engineer - Streaming SRE: New Relic, Inc., Barcelona, Spain
- Senior DevOps / SRE Engineer - relocation to Spain or Germany (promoted): Diabolocom UK Ltd, Barcelona, Catalonia, Spain
- Staff Platform Engineer (promoted): Aston Robinson International, Barcelona, Spain
- SRE (promoted): VIEWNEXT, Catalonia, Spain
- Technical Support Engineer (SRE) (promoted): Nutanix, Barcelona, Catalonia, Spain
- Office Cleaning Staff (Vilanova del Valles) (promoted): AXPE CATALUNYA, Catalonia, Spain
- Site Reliability Engineer (SRE) / DevOps (Hybrid / Remote) (promoted): Techsoulogy, Barcelona, Catalonia, Spain
- SRE Observability Engineer, Barcelona: Grupo Digital, Barcelona, Spain
- Engineering Manager DevEx - Core Tech (f / m / d) (promoted): Free Now, Barcelona, Spain
Senior Site Reliability Engineer (SRE)
F. Hoffmann-La Roche Gruppe, Barcelona, Spain
Roche fosters diversity, equity and inclusion, representing the communities we serve. When dealing with healthcare on a global scale, diversity is an essential ingredient to success. We believe that inclusion is key to understanding people’s varied healthcare needs. Together, we embrace individuality and share a passion for exceptional care. Join Roche, where every voice matters.
The Position
The role requires on-call availability: responding promptly to urgent issues and emergencies outside regular working hours so that critical situations are addressed in a timely and effective manner.
Your Mission
Design and maintain cutting-edge tools, scripts, and frameworks that automate repetitive tasks, streamline software deployment, and manage expansive systems with unparalleled efficiency.
Partner closely with forward-thinking development teams to architect and implement high-performance solutions that elevate system efficiency, optimize resource utilization, and enhance deployment processes for superior uptime and user satisfaction.
Your Core Responsibilities
Reliability Mastery: Proactively monitor and maintain system reliability using tools such as DataDog, VictorOps, ELK, Grafana, and Prometheus. Become a key player in ensuring system stability and performance.
Uptime Guardian: Ensure optimal uptime and performance by swiftly identifying issues and responding to alerts with precision.
Technical Troubleshooter: Apply a basic understanding of architecture and design to dive deep into complex technical issues and troubleshoot, investigate, and resolve them. Collaborate seamlessly with engineering teams to enable timely and effective resolutions.
Service Excellence: Maintain defined SLAs, SLIs, and SLOs, ensuring service levels are consistently met or exceeded.
Automation Innovator: Develop and deploy automation scripts (using Python or other scripting languages) to streamline operations, enhance system efficiencies, and reduce manual tasks.
Cloud Steward: Manage and maintain robust infrastructure across AWS and Azure environments, implementing best practices to ensure peak performance and reliability of cloud-based applications. Drive cost optimization through best practices and continuous vigilance.
Cross-functional Collaborator: Work closely with engineering, DevOps, security, and operations teams to drive continuous improvement and foster a culture of reliability and inclusion.
Incident Responder: Handle requests and incidents through JIRA and ServiceNow, documenting troubleshooting procedures, solutions, and lessons learned to fuel ongoing improvements.
Flexible Scheduling: Work on-call outside of normal working hours and on weekends as scheduled to ensure continuous support.
Team Builder: Actively contribute to the growth and development of the SRE team's capabilities, nurturing a stronger, more inclusive, and resilient team.
Who You Are:
Educational Background: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience. An MBA or PhD is a plus, but not required.
Certifications: Relevant industry certifications (AWS / Azure) to showcase your expertise.
Experience: Approximately 5 years of experience in site reliability engineering, IT operations, DevOps, or related fields, or equivalent skills and experience.
Cloud Expertise: Solid experience with AWS and / or Azure, including setting up, monitoring, and maintaining cloud resources (including knowledge of Kubernetes, EKS, AKS, GKE, etc.). Experience with, or at least a basic understanding of, infrastructure-as-code tools such as Terraform.
Tool Proficiency: Proficiency with monitoring and logging tools such as DataDog, Splunk On-Call, the ELK stack, Grafana, and Prometheus. Knowledge of Loki, Mimir, and Tempo is a plus.
Hands-On Skills: Hands-on experience with JIRA and ServiceNow for tracking incidents, requests, and documentation.
Scripting Knowledge: Proficiency in Python or similar scripting languages for automation purposes.
Incident Response: Understanding of core SRE principles, along with in-depth understanding of incident prioritization, escalation processes, and service level management (SLA / SLO / SLI).
Troubleshooting: Proficient troubleshooting skills, especially in cloud and distributed system environments.
Communication and Teamwork: Excellent communication, teamwork, and documentation skills, with a proactive and self-motivated approach to improving system reliability and operational efficiencies.
Diversity and Inclusion: We value and encourage candidates from diverse backgrounds and experiences, believing that diverse perspectives drive innovation and success.
Language requirements: Excellent spoken and written English.
Why Join Us?
By joining our team, you will be part of a dynamic environment where your contributions will directly impact the resilience and reliability of our services. You will have opportunities for professional growth and the ability to collaborate with industry leaders. Let’s drive the future of IT stability together, ensuring an exceptional experience for our customers.
Ready to make a difference? Apply now to be our next SRE Incident Manager and help us build a more reliable future!