Position Overview
NVIDIA is recruiting a Senior Inference Engineer to advance AIConfigurator (https://github.com/ai-dynamo/aiconfigurator), a system that automatically discovers high-performance deployment configurations for large-scale LLM inference. This role combines GPU systems, model serving, performance modeling, and production software engineering. The work directly helps users deploy models on NVIDIA platforms by optimizing efficiency, latency, parallelism, and resource utilization across both aggregated and disaggregated serving architectures. The team partners closely with the Dynamo, TensorRT-LLM, vLLM, SGLang, benchmarking, and platform teams to translate complex performance data into useful deployment guidance. This is a high-impact individual contributor (IC) role for someone who enjoys owning deep technical systems and making them practical for real developers and customers.
What you'll be doing:
+ Build and evolve AIConfigurator's core optimization engine for LLM serving, includin...