Site Reliability Engineer (SRE) – Kubernetes & Cloud Infrastructure (f/m/d)

IONOS, Barcelona, Catalonia, Spain
Posted 20 days ago
Job description

IONOS is the largest European provider of cloud infrastructure, cloud services, and hosting solutions. We offer you a long-term perspective in one of the most future-proof industries.

Our culture is defined by open structures, flat hierarchies, first-name terms, and a strong team spirit. We firmly believe that work and fun are compatible, and we provide the right environment for both.

Thanks to our continuous growth, we are looking for new colleagues to join us. Become part of IONOS and let’s grow together!

Your Role

As a Site Reliability Engineer (SRE) in the IONOS Applications team, you will be part of the technical backbone of critical platforms such as IONOS and STRATO Webmail, as well as other web services operated on our Kubernetes platform.

You will work alongside experienced colleagues to design new services and products that remain resilient and high-performing even under extreme loads.

Main Responsibilities

  • Contribute to the evolution of product infrastructure, integrating new services and applications into our cloud and Kubernetes environment
  • Ensure the stable and secure operation of our platform
  • Perform in-depth analysis and optimization of distributed and highly scalable environments
  • Drive automation using tools such as Terraform, GitLab CI/CD, and Argo CD, managing infrastructure declaratively and reproducibly
  • Analyze and resolve complex issues in distributed systems, contributing to the continuous improvement of the platform
  • Develop and maintain monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK Stack) to proactively detect bottlenecks and sources of error
  • Participate in on-call rotations, one week every 4 to 5 weeks
  • Collaborate with product development teams to organize joint projects
  • Manage incidents end-to-end: initial analysis, ticket creation, resolution, and follow-up through Problem Management
  • Use up to one day per week for learning and continuous training

Your Profile

  • Several years of experience as an SRE or in similar roles (Linux System Administrator, DevOps Engineer, Platform Engineer, Full Stack Developer)
  • Advanced expertise in Linux, container technologies, and especially Kubernetes
  • Experience with Infrastructure as Code (preferably Terraform), CI/CD pipelines (GitLab CI/CD, GitHub Actions), and Helm charts
  • Proficiency in at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks
  • Experience in operating and troubleshooting high-availability production environments
  • Knowledge of monitoring, alerting, and log analysis for distributed applications (Prometheus, Grafana, Fluentd, ELK, VictoriaMetrics, Icinga)
  • A proactive, solution-oriented, and independent working style, with the ability to systematically analyze and sustainably resolve technical problems
  • Good command of English (spoken and written)