The DevOps & ML Ops Engineer will be responsible for developing and maintaining scalable, stable services that deliver machine learning models to end users with guaranteed uptime. The primary focus will be on the infrastructure, deployment, and continuous integration / continuous delivery (CI / CD) processes for our ML services.
Maintain VM environments, manage OS updates, and keep an up-to-date VM inventory.
Troubleshoot system configurations and provide solutions.
Plan, execute, and test disaster recovery.
Monitor and examine all application, performance, event, and system logs to assist in troubleshooting.
Implement and manage the CI / CD pipelines to ensure seamless and efficient deployment of ML models.
Collaborate with data scientists, ML researchers, and language experts to understand the requirements for deploying ML models and provide necessary infrastructure support.
Automate and streamline the build, test, and deployment processes to enhance efficiency and reduce time-to-market.
Monitor and optimize the performance, availability, and scalability of production ML systems.
Develop and maintain robust monitoring, logging, and alerting systems to proactively identify and address issues.
Implement security best practices to protect sensitive data and ensure compliance with relevant regulations.
Stay up-to-date with industry trends and emerging technologies related to ML Ops and DevOps, and propose innovative solutions to improve our ML service delivery.
Strong knowledge of cloud platforms (such as AWS, Azure, or GCP) and local cluster deployments, and experience in deploying and managing ML services on these platforms.
Experience with distributed computing frameworks (e.g., Spark) and big data technologies (e.g., Hadoop, Kafka).
Proficiency in Python, Shell, Ruby, Golang, or C++ and experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation).
Experience with containerization technologies (e.g., Docker) and orchestration frameworks (e.g., Kubernetes).
Experience with CI / CD tools (e.g., Jenkins, GitLab CI / CD) and version control systems.
Solid understanding of networking, security, and system administration concepts.
Bachelor's or higher degree in Computer Science, Engineering, or a related field.
Proven experience as an ML Ops Engineer, DevOps Engineer, or a similar role, with a focus on deploying and maintaining machine learning models in production environments.
Experience with machine learning frameworks and libraries, such as TensorFlow, PyTorch, or scikit-learn.
Familiarity with serverless computing and event-driven architectures.
Experience with logging and monitoring tools (e.g., ELK Stack, Prometheus, Grafana).
Understanding of software development methodologies and agile practices.
Senior Engineer • Barcelona, Province of Barcelona, Spain