About Ema

Ema is building the world’s leading Agentic AI platform to transform enterprise productivity. We enable organizations to delegate repetitive tasks to Ema, the Universal AI Employee, delivering 10x gains in workforce efficiency across functions. Founded by former executives from Google, Coinbase, Flipkart, and Okta, our team includes engineers from premier tech companies and graduates of Stanford, MIT, UC Berkeley, CMU, and IITs.

We are backed by industry-leading investors including Accel, Naspers/Prosus, Section32, and angels like Sheryl Sandberg and Dustin Moskovitz. Headquartered in Silicon Valley, with offices in London, Bangalore, and Vancouver, Ema is at the frontier of what Agentic AI can do in production: we ship real systems that run real business processes at scale.

About the Role

As a Site Reliability Engineer at Ema, you will own the stability, availability, and operational health of our agentic AI platform across customer environments. You'll work closely with Engineering and DevOps to provision infrastructure, drive deployment excellence, and keep production running at the quality bar our enterprise customers expect — 99.9%+ uptime, proactive incident response, and continuous improvement.

What You'll Do

Infrastructure & Deployment

Design and provision cloud infrastructure (GCP, Azure, AWS) tailored to customer environments, with security, scalability, and compliance built in
Execute on-call SaaS deployments with minimal downtime; automate and optimize deployment workflows end-to-end

Production Stability & Observability

Monitor logs, alerts, and metrics to maintain SLA commitments and catch issues before they escalate
Diagnose and resolve production incidents with speed and rigor; drive root cause analysis and permanent fixes
Collaborate with DevOps to enhance monitoring dashboards and alerting frameworks; deliver clear system health reporting to internal and customer stakeholders

Documentation & Knowledge Management

Maintain deployment runbooks, troubleshooting guides, and environment configuration docs
Facilitate knowledge transfer across teams to ensure smooth handovers

What We're Looking For

4+ years in DevOps, Infrastructure, or Deployment Engineering roles
Hands-on experience with cloud platforms — GCP, Azure, or AWS
Proficiency with infrastructure-as-code tools such as Terraform or Ansible
Experience with CI/CD pipelines — GitLab CI, Jenkins, or equivalent
Familiarity with observability tooling — Prometheus, Grafana, Datadog, or Splunk
Strong troubleshooting instincts across distributed systems
Clear communicator — comfortable working directly with both engineering teams and customer stakeholders

Nice to Have

Experience at a fast-paced, high-growth startup
Hands-on with containerization and orchestration — Docker, Kubernetes

Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.

Ema Unlimited is an equal opportunity employer and is committed to providing equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, sexual orientation, gender identity, or genetics.
