Job Title :
- Azure Machine Learning Infrastructure Specialist
About the Role :
This position focuses on designing, building, and maintaining scalable machine learning (ML) infrastructure in production environments. We are looking for a skilled professional with expertise in cloud platforms, distributed computing frameworks, and containerization technologies to join our team.
Key Responsibilities :
- Develop and maintain infrastructure required for deploying and scaling ML services.
- Implement and manage CI/CD pipelines for seamless deployment of ML models.
- Collaborate with data scientists, ML researchers, and language experts to understand requirements for deploying ML models and provide necessary infrastructure support.
- Automate and streamline build, test, and deployment processes to enhance efficiency and reduce time-to-market.
- Monitor and optimize performance, availability, and scalability of production ML systems.
- Implement security best practices to protect sensitive data and ensure compliance with relevant regulations.
Requirements :
- Strong knowledge of cloud platforms (such as AWS, Azure, or GCP) and local cluster deployments, and experience in deploying and managing ML services on these platforms.
- Knowledge of distributed computing frameworks (e.g., Spark) and big data technologies (e.g., Hadoop, Kafka).
- Proficiency in Python, Shell, Ruby, Golang, or C++ and experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation).
- Hands-on experience with containerization technologies (e.g., Docker) and orchestration frameworks (e.g., Kubernetes).
- Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI/CD) and version control systems (e.g., Git).
- Solid understanding of networking, security, and system administration concepts.
- Strong problem-solving and troubleshooting skills, with the ability to quickly analyze and resolve issues in complex ML systems.
- Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
- Bachelor's or higher degree in Computer Science, Engineering, or a related field.
Desired Skills and Experience :
- Experience with machine learning frameworks and libraries, such as TensorFlow, PyTorch, or scikit-learn.
- Familiarity with serverless computing and event-driven architectures.
- Experience with logging and monitoring tools (e.g., ELK Stack, Prometheus, Grafana).