ABOUT THE ROLE:

You will work on the most critical post-training and reinforcement learning challenges at any given time, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities.
You will know what your first project will be before you receive an offer.

BASIC QUALIFICATIONS:

You believe truth-seeking AI is the most important and challenging problem.
You are obsessed with building incredibly useful models through post-training and RL techniques.
You are a power user of AI models and eager to push the boundaries of what’s possible with reinforcement learning and alignment methods.
If you have previously worked on post-training or RLHF, or have trained models used by millions of people, it’s a big plus, but relevant experience is not required.
You take pride in your work and thrive in meritocratic environments.

COMPENSATION AND BENEFITS:

$180,000 - $600,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.

Member of Technical Staff - Post-Training and RL