Kairos

Performance Engineer

On-site
Anthropic | San Francisco, CA, US / Seattle, WA, US | 2 years ago
AI Research & Engineering

Compensation

$280,000-$850,000

Description

About the role:

Running machine learning (ML) algorithms at our scale often requires solving novel systems problems. As a Performance Engineer, you'll be responsible for identifying these problems and then developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates will have a track record of solving large-scale systems problems and will be excited to grow into ML experts as well.

You may be a good fit if you:

  • Have significant software engineering or machine learning experience, particularly at supercomputing scale
  • Are results-oriented, with a bias towards flexibility and impact
  • Pick up slack, even if it goes outside your job description
  • Enjoy pair programming (we love to pair!)
  • Want to learn more about machine learning research
  • Care about the societal impacts of your work

Strong candidates may also have experience with: 

  • High-performance, large-scale ML systems
  • GPU/Accelerator programming
  • ML framework internals
  • OS internals
  • Language modeling with transformers

Representative projects:

  • Implement low-latency, high-throughput sampling for large language models
  • Implement GPU kernels to adapt our models to low-precision inference
  • Write a custom load-balancing algorithm to optimize serving efficiency
  • Build quantitative models of system performance
  • Design and implement a fault-tolerant distributed system running on a complex network topology
  • Debug kernel-level network latency spikes in a containerized environment

Deadline to apply: None. Applications will be reviewed on a rolling basis. 

Stack

Transformers, GPU, LLMs, Distributed Systems, Machine Learning
Posted: Apr 22, 2024
Last seen: Jun 25, 2026
First seen: Jun 25, 2026
Status: active