
Member of Technical Staff - Inference

On-site
xAI, Palo Alto, CA, US
Infrastructure

Description

About the Role:

  • We are building the high-performance inference platform that serves Grok to millions of users every day with lightning speed and perfect reliability.
  • As a Member of Technical Staff - Inference, you will design and optimize large-scale model serving systems end-to-end. You will own everything from distributed infrastructure (global KV cache, continuous batching, load balancing, auto-scaling) to deep low-level optimizations (GPU kernels, quantization, speculative decoding, tail latency).
  • This is a high-impact role where your work directly determines how fast and reliably users interact with Grok at massive scale.

Responsibilities: 

  • Architect and implement scalable distributed infrastructure for model serving (load balancing, auto-scaling, batch scheduling, global KV cache).
  • Optimize latency and throughput of model inference under real production workloads.
  • Build reliable, high-concurrency serving systems that serve billions of users with 100% uptime, 0% error rate, and excellent tail latency.
  • Benchmark, fine-tune, and accelerate inference engines (including low-level GPU kernel work and code generation).
  • Develop custom tools to trace, replay, and fix issues across the full stack — from orchestration down to GPU kernels.
  • Create robust CI/CD infrastructure for seamless endpoint deployment, image publishing, and inference engine updates.
  • Accelerate research on scaling test-time compute, RL rollout, and model-hardware co-design for next-generation systems.

Basic Qualifications:

  • Deep experience in low-level systems programming (C/C++ or Rust).
  • Experience with large-scale, high-concurrency production serving.
  • Experience with GPU inference engines (vLLM, SGLang, Triton, TensorRT-LLM, etc.).
  • Strong background in system optimizations: batching, caching, load balancing, parallelism.
  • Experience with low-level inference optimizations: GPU kernels, code generation.
  • Experience with algorithmic inference optimizations: quantization, speculative decoding, distillation, low-precision numerics.
  • Experience with testing, benchmarking, and reliability of inference services.
  • Experience designing and implementing CI/CD infrastructure for inference.

Compensation and Benefits:

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.

Stack

C++, GPU, LLMs, vLLM, CI/CD, Rust, Triton
Posted: Oct 5, 2024
Last seen: Jun 25, 2026
First seen: Jun 25, 2026
Status: active