Senior Technical Program Manager – AI Infrastructure, Site Operations

On-site
Cerebras
Sunnyvale, CA, US
6 months ago
Senior
Deployment

Compensation

Salary undisclosed

Description

About The Role

This Sr. TPM role owns site and data center operations programs supporting Cerebras’ AI Cloud and customer deployments. The position sits at Sunnyvale HQ and works closely with Hardware Engineering, Inference Engineering, and Operations leadership to ensure Cerebras systems are reliably deployed, operated, and scaled.

This is a highly technical, execution-focused TPM role with a strong emphasis on operational readiness, cross-functional coordination, and metrics/KPIs.

Responsibilities 

  • Own end-to-end technical programs for data center and site operations
  • Act as single-threaded owner across:
    • Hardware & Systems Engineering
    • AI Cloud Infrastructure & Operations
    • Network & Storage Engineering
    • Facilities, power, cooling, and colo partners
  • Drive site readiness for Cerebras Wafer-Scale Engine systems
  • Partner on installation, commissioning, change management, and break/fix workflows
  • Lead incident reviews and postmortems; ensure corrective actions are closed
  • Define and own operational metrics and KPIs, including:
    • Availability and reliability
    • Incident rate, severity, MTTR / MTTD
    • Deployment readiness and time-to-service
    • Capacity and operational risk
  • Build executive-level dashboards and reporting
  • Establish program governance, risk tracking, and RACI clarity
  • Present program status, metrics, and operational risks to senior leadership

Required Background 

  • 8+ years in Technical Program Management, Infrastructure Ops, or Data Center Ops
  • Experience leading large, cross-functional infrastructure programs
  • Strong understanding of:
    • Data center power and cooling fundamentals
    • Network and storage basics
    • Hardware-centric platforms
  • Proven ability to define and operationalize metrics
  • Strong written and executive-level communication skills

Preferred Experience 

  • AI/ML, HPC, or accelerator-based infrastructure
  • High-density and/or liquid-cooled data centers
  • Working with colocation providers and facilities teams
  • Incident management, reliability, or service operations background

Stack

Machine Learning
Posted
Dec 16, 2025
Last seen
Jun 25, 2026
First seen
Jun 25, 2026
Status
active