Remotedxb | Dubai, United Arab Emirates | Posted June 26, 2026
Position Overview
Responsibilities
- Contribute to system observability by implementing and improving metrics, alerting, and dashboards.
- Develop automation, tooling, and monitoring solutions to support high service availability.
- Partner with application and quality engineering teams to implement best practices in reliability and release automation.
- Drive operational excellence through proactive incident prevention, blameless postmortems, and capacity planning.
- Participate in on-call rotations to support critical services and ensure rapid response to incidents.
- Define SLIs and SLOs for core user flows to align the team on performance and availability standards.
Requirements
- Solid experience in Python for automation, tooling, and data-driven operational tasks.
- Proficiency in at least one of the following: Java, C++, or Go.
- Strong understanding of Linux systems and cloud infrastructure (AWS, GCP, or...