Job Objective
Design and deliver scalable real-time data and machine learning solutions by building robust ingestion and transformation frameworks across Hadoop ecosystems. Enable end-to-end ML model operationalization and performance optimization, while supporting multi-modal data processing and development of engineering tools and applications.
Responsibilities
- Design and develop highly scalable, real-time systems using Hadoop ecosystem components (Iceberg, Spark, Ozone, Trino, Hive, Ranger, Kafka, Flink, and NiFi).
- Build robust data ingestion and transformation frameworks using Java, Spark, Python, and shell scripting to ingest multi-modal data (image, audio, video, unstructured documents) in both batch and real-time modes.
- Develop full-stack applications and internal engineering tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
- Collaborate closely with data scientists to operationalize...