About the Role
In this role, you will resolve incidents and collate data to support root cause analysis and systems design.
Key Responsibilities:
Monitoring & Observability: Create and optimize monitoring queries; establish service level baselines.
Incident Response: Support senior engineers during incidents; contribute to post-incident reviews.
Disaster Recovery: Participate in and help execute disaster recovery tests.
Automation & Infrastructure as Code: Implement automation and deploy and run code in production environments.
Documentation: Contribute to SRE knowledge bases and documentation.
Collaboration: Work with cross-functional teams including Development, QA, IT Operations, and Product SRE.
Required Skills & Tools
Programming & Scripting: Python, Bash scripting, Java, Angular<...