Talenzon | London, United Kingdom | Posted June 06, 2026
Position Overview
Location: London, UK
Work Model: On-site
Role Type: Full-Time
What You’ll Do
- Design and implement reliability strategies for high‑availability production systems
- Monitor system health, performance, and uptime across cloud infrastructure
- Build automation to reduce manual operations and improve system reliability
- Develop and maintain observability systems including logging, metrics, and tracing
- Manage incident response processes and perform root cause analysis for production issues
- Improve system resilience through capacity planning, performance optimisation, and fault tolerance
- Collaborate with engineering teams to integrate reliability practices into the software development lifecycle
- Automate infrastructure provisioning and management using Infrastructure as Code (IaC)
What We’re Looking For
Required Skills & Experience
- Strong experience operating production systems i...