What You’ll Do
As a Lead MLOps Engineer, you’ll own the design and evolution of our ML infrastructure, enabling fast, reliable and secure experimentation, deployment and monitoring of AI agents and LLMs in production. You’ll guide a small but high‑impact team of DevOps and ML engineers, ensuring our platform achieves best‑in‑class reliability, scalability and velocity.
- Architect and evolve InteractiveAI’s ML infrastructure from data ingestion to model serving and continuous learning loops
- Design and implement scalable, cloud‑agnostic runtimes (Kubernetes / GPU clusters) across on‑prem, VPC and hybrid deployments
- Build automation for end‑to‑end ML pipelines (data, fine‑tuning, evaluation, deployment)
- Establish gold standards for reproducibility, observability and model governance
- Partner with AI Engineers to optimize training / inference performance and cost
- Build internal tooling to accelerate AI product delivery and reduce time‑to‑deploy
- Implement robust monitoring, logging and alerting frameworks for ML workloads
- Drive adoption of CI / CD best practices for ML and infrastructure code
- Mentor and grow a small team of MLOps engineers, fostering technical excellence and ownership
What We’re Looking For
We’re seeking a hands‑on technical leader who combines deep MLOps expertise with a builder’s mindset: someone who thrives in fast‑moving environments and can scale both systems and teams.
Minimum Requirements
- 5 years of experience in DevOps, MLOps or Infrastructure Engineering roles
- Proven track record deploying and maintaining ML workloads in production
- Strong expertise in containerization and orchestration (Docker, Kubernetes)
- Experience building CI / CD pipelines for ML models and infrastructure
- Proficiency with infrastructure‑as‑code tools (Terraform, Pulumi, CloudFormation)
- Strong coding / scripting skills (Python, Bash or similar)
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK, etc.)
- Experience with at least one major cloud provider (AWS, GCP or Azure)
- Strong understanding of ML lifecycle management (training, evaluation, deployment, monitoring)
Additional Requirements
- Experience with MLflow, Weights & Biases or other model‑tracking systems
- Understanding of fine‑tuning workflows (LoRA, QLoRA, PEFT) and LLM serving
- Exposure to RAG systems, vector databases and large‑model inference optimization
- Experience implementing security and compliance practices (GDPR, ISO 27001, etc.)
- Prior experience leading technical teams or mentoring engineers
- Familiarity with distributed training and GPU cluster management is a plus
What You’ll Get
- Competitive base salary (from 60,000 / yr to 100,000 / yr) + performance bonuses
- Future equity opportunity for high performers
- Health & wellness allowances
- Private health insurance
- Flexible work setup – travel when needed (ideally hybrid in Lisbon or Madrid)
- 25 days of holidays / paid time off (excluding local public holidays)
Who You Are
- Proactive & strategic – you anticipate system and organizational needs, designing scalable and future‑proof solutions.
- Technical leader – you raise the bar for engineering excellence and help others do their best work.
- Accountable & high ownership – you take full responsibility for uptime, performance and delivery.
- Builder mentality – you’re comfortable with ambiguity, moving fast while maintaining reliability.
- Collaborative partner – you communicate clearly, build trust across teams and balance pragmatism with long‑term vision.
Interview Process
We keep our process focused and respectful of your time. Most candidates complete it in 2–3 weeks. Here’s what to expect:
- Intro Call – 30 minutes with our team to align on fit and expectations
- Technical Challenge – a practical MLOps design or automation task
- Technical Interview – deep dive into systems architecture, automation and ML infrastructure
- Leadership & Values Interview – assess alignment with InteractiveAI culture and growth mindset
- Offer – final conversation and offer
About Us
InteractiveAI is a fast‑growing startup on a mission to empower enterprises with fully managed AI agent lifecycles.
We are building the next generation of enterprise‑AI solutions, delivering an end‑to‑end Agentic IDE alongside an extensible ecosystem of agentic resources and solutions.
Our platform allows companies to orchestrate, monitor, evaluate, deploy and improve AI agents, and soon to fine‑tune and own their own models.
We value autonomy, speed and innovation, and we’re building a world‑class team to match. Our squads are lean, focused and execution‑driven.
If you thrive in high‑performance environments and want to be part of a company that rewards transformational outcomes, this is for you.