
Member of Technical Staff - Inference
On-site
Infrastructure
Compensation
Salary undisclosed
Description
About the Role:
- We are building the high-performance inference platform that serves Grok to millions of users every day with lightning speed and perfect reliability.
- As a Member of Technical Staff - Inference, you will design and optimize large-scale model serving systems end-to-end. You will own everything from distributed infrastructure (global KV cache, continuous batching, load balancing, auto-scaling) to deep low-level optimizations (GPU kernels, quantization, speculative decoding, tail latency).
- This is a high-impact role where your work directly determines how fast and reliably users interact with Grok at massive scale.
Responsibilities:
- Architect and implement scalable distributed infrastructure for model serving (load balancing, auto-scaling, batch scheduling, global KV cache).
- Optimize latency and throughput of model inference under real production workloads.
- Build reliable, high-concurrency serving systems that serve millions of users, targeting 100% uptime, a 0% error rate, and excellent tail latency.
- Benchmark, fine-tune, and accelerate inference engines (including low-level GPU kernel work and code generation).
- Develop custom tools to trace, replay, and fix issues across the full stack — from orchestration down to GPU kernels.
- Create robust CI/CD infrastructure for seamless endpoint deployment, image publishing, and inference engine updates.
- Accelerate research on scaling test-time compute, RL rollout, and model-hardware co-design for next-generation systems.
Basic Qualifications:
- Deep experience in low-level systems programming (C/C++ or Rust).
- Experience with large-scale, high-concurrency production serving.
- Experience with GPU inference engines (vLLM, SGLang, Triton, TensorRT-LLM, etc.).
- Strong background in system optimizations: batching, caching, load balancing, parallelism.
- Low-level inference optimizations: GPU kernels, code generation.
- Algorithmic inference optimizations: quantization, speculative decoding, distillation, low-precision numerics.
- Experience with testing, benchmarking, and reliability of inference services.
- Experience designing and implementing CI/CD infrastructure for inference.
Compensation and Benefits:
$180,000 - $440,000 USD
Base salary is just one part of our total rewards package at xAI, which also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.
Stack
C++, GPU, LLMs, vLLM, CI/CD, Rust, Triton
- Posted: Oct 5, 2024
- Last seen: Jun 25, 2026
- First seen: Jun 25, 2026
- Status: active