Kairos

Software Engineer (Training Platform)

On-site
Isomorphic Labs | London, GB | 3 months ago
Tech

Compensation

Salary undisclosed

Description

Your impact 

This is a great opportunity to play a defining role in building and operating our inference platform. You will combine deep expertise in distributed systems programming with hands-on knowledge of operating Kubernetes-based platforms according to best practices, all while supporting the platform in production. You will be instrumental in serving cutting-edge machine learning models at massive scale for scientific applications, which will often require you to think from first principles. Success in this role demands excellent technical skills, independence, strong ownership, and a relentless user focus.

What you will do 

  • Contribute to the development and operation of the inference platform, serving fleets of cutting-edge machine learning models to scientific applications.
  • Deliver high-quality and well-tested user-focused features.
  • Provide support to users of the platform.
  • Perform maintenance work and drive internal tech investments for platform stability, reliability and scalability.
  • Build observability and alerting mechanisms for the platform.
  • Improve the Continuous Integration/Continuous Deployment (CI/CD) setup of the platform.
  • Operate effectively in a fast-paced and ambiguous environment, ensuring independent delivery.
  • Provide great documentation and guidance for other contributors and users.

Skills and qualifications 

Essential:

  • Experience writing and maintaining Python code in production environments, with an emphasis on concurrent programming (including strong knowledge of async, threads, processes, the GIL, etc.).
  • Experience building, maintaining and operating Kubernetes services.
  • Experience working with distributed systems.
  • Experience maintaining APIs that serve a moderately large set of internal users.
  • Experience working with ML models; an understanding of the ML lifecycle and of how serving and operating ML models differs from other kinds of workloads.

Nice to have:

  • Experience working on an inference platform.
  • Experience managing a fleet of ML models.
  • Experience building and maintaining CI/CD processes for complex systems.
  • Experience with GCP or other comparable clouds.
  • Experience with building internal and user-focused dashboards.

Stack

Python, Distributed Systems, GCP, CI/CD, Kubernetes, Machine Learning
Posted: Mar 4, 2026
Last seen: Jun 25, 2026
First seen: Jun 25, 2026
Status: active