Senior Site Reliability Engineer – Observability Focus (Contract)
Interviews are taking place early next week. We’re looking for an experienced Site Reliability Engineer to join a growing platform team on a 3+ month contract (extensions likely). The role is focused on enhancing observability, improving alerting, and streamlining incident response in a live production environment.
This is a hands-on role working with a cloud-native stack, where you’ll combine platform engineering with reliability and monitoring expertise.
What You’ll Do
- Develop and refine monitoring dashboards and alert configurations to cut noise and surface actionable issues.
- Build custom metrics and observability tooling using Datadog.
- Create and maintain incident response playbooks and participate in improving the incident management process.
- Collaborate with engineers to boost visibility of distributed systems and reduce operational load.
- Contribute to cloud infrastructure improvements alongside platform engineers.
Tech Environment
- Monitoring / Observability – Datadog (dashboards, custom metrics, alert optimisation)
- Deployment – GitOps (ArgoCD), CI / CD pipelines (GitHub Actions)
- Networking / Security – API Gateway (Kong or equivalents), Cloudflare (DNS, CDN, WAF)
What We’re Looking For
- Strong Datadog skills in production environments.
- Proven SRE background – focused on reliability, uptime, and incident readiness.
- Solid GCP experience with Kubernetes and related services.
- Proactive, self-sufficient, and comfortable working in less-structured setups.
Seniority level
Mid-Senior level
Employment type
Contract
Job function
Engineering and Information Technology
Industries
IT Services and IT Consulting