Manager, Data Science
We are seeking an experienced Data Science / Machine Learning Engineering Lead to join our team and drive the development of advanced ML / AI capabilities. You will lead a team of Data Scientists / ML Engineers, focusing on building and deploying cutting-edge machine learning solutions using our modern ML infrastructure including Anthropic, OpenAI, and self-hosted LLMs.
- Team Leadership & Management
- Lead, mentor, and develop a team of Data Scientists, Data Engineers, and ML Engineers
- Conduct regular 1:1s, performance reviews, and career development planning
- Foster a collaborative, innovative team culture focused on continuous learning
- Coordinate work allocation and ensure timely delivery of projects
- Facilitate knowledge sharing and best practices across the team
- Technical Leadership
- Design and implement scalable ML model training pipelines using a modern toolset (e.g., MLflow, Comet, Langfuse, WandB, Trino, dbt, Spark, Flink)
- Lead fine-tuning initiatives for both commercial (Anthropic Claude, OpenAI GPT) and open-source LLMs
- Utilise self-hosted LLM infrastructure (Ray, AIBrix, vLLM) with LoRA / QLoRA for optimal performance and cost efficiency
- Architect and oversee continuous model validation frameworks within our ecosystem
- Develop real-time anomaly detection systems leveraging Flink for streaming data processing
- Build predictive models for system performance, usage patterns, and automation workflow optimization
- Establish ML engineering best practices for model versioning, monitoring, and deployment on Kubernetes
- Create evaluation, validation, and metrics pipelines for models during training and inference
- Strategic Initiatives
- Optimize the balance between commercial APIs (Anthropic, OpenAI) and self-hosted models for different use cases
- Partner with product and engineering teams to identify high-impact ML opportunities
- Define the team's technical roadmap aligned with company objectives
- Drive adoption of state-of-the-art ML techniques and tools
- Contribute to infrastructure decisions for scaling our ML platform
- Operational Excellence
- Implement robust CI/CD pipelines for ML models in Kubernetes environments
- Monitor model performance using MLflow tracking and implement drift detection
- Manage Flink jobs for real-time feature engineering and anomaly detection
- Document processes, architectures, and decision rationale
Requirements
Qualifications / Experience / Technical Skills
- Education & Experience
- Master's or PhD in Computer Science, Machine Learning, Statistics, or related field
- 10+ years of hands-on experience in data science / machine learning
- 5+ years of experience leading technical teams
- Proven track record of deploying ML / LLM models to production at scale
- Technical Skills
- Deep expertise in Python and ML frameworks (PyTorch, TensorFlow)
- Extensive experience with commercial LLM APIs (Anthropic Claude, OpenAI GPT-4)
- Strong proficiency with MLflow for experiment tracking and model management
- Experience with distributed computing using Apache Spark
- Proficiency with Apache Flink for stream processing and real-time ML
- Knowledge of LLM fine-tuning techniques (LoRA, QLoRA, full fine-tuning)
- Expertise in anomaly detection algorithms and time series analysis
- Leadership Skills
- Demonstrated ability to lead and inspire technical teams
- Strong communication skills to translate complex technical concepts to stakeholders
- Experience with agile development methodologies
- Track record of successful cross-functional collaboration
- Ability to balance technical excellence with business pragmatism
Soft Skills / Personal Characteristics
- Experience with AIBrix, vLLM, or similar ML platform solutions
- Experience with AI code generation and anonymisation pipelines
- Knowledge of advanced prompting techniques and prompt engineering
- Experience building RAG (Retrieval Augmented Generation) systems
- Background in building ML platforms or infrastructure
- Familiarity with vector databases (Pinecone, Weaviate, Qdrant)
- Experience with model security and responsible AI practices
- Contributions to open-source ML projects
Python, PyTorch, TensorFlow