SRE (Site Reliability) Engineer
We are looking for an SRE (Site Reliability) Engineer to join our new Enterprise AI platform team.
This is an exciting opportunity to be part of a high-impact, highly technical group focused on solving some of the most challenging machine learning problems in the Life Sciences and Healthcare industry. You will bring proven experience in AWS cloud environments and a strong track record of designing and deploying large-scale production infrastructure and platforms.
You will play a critical role in shaping how we use technology, machine learning and data to accelerate innovation. This includes designing, building and deploying next-generation data engines and tools at scale.
This is a hybrid role, with the expectation of occasional office visits in Barcelona.
RESPONSIBILITIES
- Develop and maintain the essential infrastructure and platform required to deploy, monitor and manage ML solutions in production, ensuring they are optimized for performance and scalability
- Collaborate closely with data science teams in developing cutting-edge data science and AI / ML environments and workflows on AWS
- Liaise with R&D data scientists to understand their challenges and work with them to help productionize ML pipelines, models and algorithms for innovative science
- Take responsibility for all aspects of software engineering, from design to implementation, QA and maintenance
- Lead technology processes from concept development to completion of project deliverables
- Liaise with other teams to enhance our technology stack and enable the adoption of the latest advances in data processing and AI
REQUIREMENTS
- Significant experience with AWS cloud environments is essential. Knowledge of SageMaker, Athena, S3, EC2, RDS, Glue, Lambda, Step Functions, EKS and ECS is also essential
- Modern DevOps mindset, using leading DevOps tools such as Docker and Git
- Experience with infrastructure as code technology such as Ansible, Terraform and CloudFormation
- Strong software coding skills, with proficiency in Python; exceptional ability in any other language will also be recognized
- Experience managing an enterprise platform and service, handling new client demand and feature requests
- Experience with containers and microservice architectures, e.g., Kubernetes, Docker and serverless approaches
- Experience with Continuous Integration and building continuous delivery pipelines, such as CodePipeline, CodeBuild and CodeDeploy
- GxP experience
- Excellent communication, analytical and problem-solving skills
NICE TO HAVE
- Experience building large-scale data processing pipelines, e.g., Hadoop / Spark and SQL
- Use of Data Science modelling tools, e.g., R, Python and Data Science notebooks (e.g., Jupyter)
- Multi-cloud experience (AWS / Azure / GCP)
- Demonstrable knowledge of building MLOps environments to a production standard
- Experience mentoring, coaching and supporting less experienced colleagues and clients
- Experience with SAFe agile principles and practices
WE OFFER
- Private health insurance
- EPAM Employees Stock Purchase Plan
- 100% paid sick leave
- Referral Program
- Professional certification
- Language courses
- AWS,