Position Overview
You’ll shape our cloud-native architecture across AWS and Kubernetes, driving best-in-class infrastructure-as-code, CI/CD, and automation to accelerate model development and deployment. You’ll design and operate Kubeflow (or similar) pipelines and supporting services, streamline developer workflows, and raise the reliability and efficiency of the platform. Day to day, you’ll partner with product engineering, data scientists, and imaging teams to translate scientific and product needs into secure, production-grade infrastructure that scales.
Main Duties and Responsibilities:
● Design, deploy, and maintain Kubeflow (or equivalent) for pipeline orchestration, model training, evaluation, and serving on large image datasets; ensure reliability, security, and cost efficiency.
● Manage and tune Kubernetes clusters (EKS/GKE/AKS): set up namespaces, RBAC, autoscaling, network policies, and, where appropriate, service meshes; keep upgrades and operations predictable.
● Define infrastru...