What You'll Do
As a DevOps / MLOps Engineer, you'll help architect, build, and scale the infrastructure that powers our agentic environment in production. You'll work alongside engineers, data scientists, product managers, and delivery leads to enable continuous infrastructure deployment, robust monitoring, and fast experimentation.
InteractiveAI runs on two high-performance engines: Product Teams that craft and scale our Agentic IDE, and Implementation Squads that ship high-impact, domain-specific AI solutions. Depending on your craft and ambition, you'll join the team where you can create outsized value and have a transparent, performance-based path to growth and rewards.
Design & scale multi-tenant, cloud-agnostic runtimes (Kubernetes / GPU clusters) supporting on-prem, VPC, and hybrid installs.
Automate end-to-end ML pipelines: data ingestion, fine-tuning (LoRA / QLoRA), evaluation, and secure rollout through robust CI/CD.
Partner with product engineers and client performance squads to ship custom agents from sandbox (5 days) to production (4-6 weeks) on tight SLAs.
Automate infrastructure using Terraform, Ansible, or similar tools.
Implement and manage containerized workloads (Docker, Kubernetes, etc.).
Ensure security compliance and data governance standards are met.
Troubleshoot production incidents and proactively improve system reliability.
What We're Looking For
We're seeking someone capable of building and scaling a robust infrastructure for our agentic environment and its ecosystem of solutions, with strong fundamentals, clean execution, and operational maturity.
Minimum Requirements:
3+ years in DevOps, Site Reliability, or Infrastructure Engineering roles.
3+ years deploying and managing AI / ML production workloads on major public clouds (e.g., AWS, GCP, Azure).
Experience deploying resilient, distributed cloud solutions at scale.
Proficiency in containerization and orchestration (Docker, Kubernetes).
Experience building and managing CI/CD pipelines.
Familiarity with infrastructure-as-code tools (Terraform, CloudFormation, Pulumi).
Strong scripting skills (Python, Bash, or similar).
Experience with monitoring and logging stacks (e.g., Prometheus, Grafana, ELK).
Excellent communication and collaboration skills across cross-functional teams.
Additional Requirements:
Experience deploying ML / AI workloads in production.
Familiarity with model versioning, tracking, and reproducibility tools (e.g., MLflow, Weights & Biases).
Experience implementing security practices in DevOps pipelines.
Knowledge of GDPR, ISO 27001, or other regulatory / compliance frameworks.
Previous work in regulated or enterprise environments is a plus.
What You'll Get
Competitive base salary (up to $100,000 / year) plus performance bonuses.
Future equity opportunities for high performers.
Health & wellness allowances.
Flexible work setup (ideally hybrid in Lisbon or Madrid), with travel when needed.
Private health insurance.
25 days of paid holidays / time off (excluding local public holidays).
Who You Are
Proactive and resourceful: you take initiative to identify gaps and drive solutions.
Accountable and high-ownership: you treat our infrastructure as your own and honor commitments.
Entrepreneurial mindset: you thrive in ambiguity, embrace rapid change, and deliver in a fast-paced environment.
Team player: you collaborate effectively, give and receive feedback constructively, and mentor others.
Interview Process
Our process is focused and respects your time, and is typically completed in 2-3 weeks. It includes:
Intro call (30 mins) to align on fit and expectations.
Take-home challenge based on real-world problems.
Technical interview focusing on the challenge, experience, and skills.
Cultural and values interview to discuss motivation and fit.
Final offer discussion.
We're building a team of impactful, quality-driven builders. If that resonates with you, let's talk.
About us
InteractiveAI is a fast-growing startup on a mission to empower enterprises with fully managed AI agent lifecycles.
We are creating the next-generation enterprise-AI platform, including an extensible ecosystem of agentic resources and solutions.
Our platform enables companies to orchestrate, monitor, evaluate, deploy, and improve AI agents, with upcoming features for fine-tuning and owning their models.
We value autonomy, speed, and innovation, and are assembling a world-class, execution-driven team.
If you thrive in high-performance environments and aim for transformational outcomes, this is your opportunity.
Engineer • Madrid, Madrid, SPAIN