
Staff Software Engineer, ML Acceleration
Remote
Staff / Principal
Autonomy
Compensation
Salary undisclosed
Description
About the Role:
The ML Training Acceleration team is dedicated to increasing Stack AV's product development velocity by accelerating machine learning iterations. Our core mission is to deliver a training system that is reliable, scalable, user-friendly and observable. This involves profiling, optimizing, and fine-tuning our ML models, as well as evangelizing best practices and frameworks among Machine Learning Engineers (MLEs) across the company.
Responsibilities:
- Analyze ML models to identify and resolve performance bottlenecks.
- Incorporate OSS tools that enable ML engineers to profile and optimize models self-sufficiently.
- Deliver solutions to streamline model deployment across various hardware platforms.
- Collaborate with ML researchers to balance model accuracy and speed.
- Implement optimizations using CUDA, Triton, and custom kernels.
- Promote Engineering Excellence: Maintain a high bar for engineering excellence in your own work and foster a culture of engineering excellence within the team.
Qualifications:
- Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- Experience: 5+ years of experience (including experience with GPU programming and optimization)
- Technical Skills:
  - Strong programming skills in C++ and Python
  - Proven experience in GPU programming and optimization
  - Familiarity with deep learning frameworks, especially PyTorch
  - CUDA programming
  - Triton language for GPU kernels
  - PyTorch optimization techniques
  - TensorRT implementation
  - ONNX model conversion and deployment
  - Custom GPU kernel development
  - Deep understanding of GPU architectures and performance optimization
- Problem-Solving: Strong analytical and problem-solving skills
- Communication: Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders
- Autonomous vehicles (AV) experience is a bonus
Stack
C++, Python, PyTorch, GPU, Autonomous Vehicles, Machine Learning, Fine-tuning, CUDA, Triton, Deep Learning
- Posted: Apr 10, 2026
- Last seen: Jun 25, 2026
- First seen: Jun 25, 2026
- Status: active