Sr. Engineer, Kernel Development and Optimization

On-site
Tenstorrent
Belgrade, RS / Serbia
2 months ago
Senior
OPs

Compensation

Salary undisclosed

Description

Tenstorrent is building next-generation AI compute. The Kernel Development and Optimization team develops the performance-critical kernels that unlock the full capability of our hardware across ML and HPC workloads.

This is a hybrid role based in Belgrade, Serbia.

We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.

 

Who You Are

  • A strong C++ systems engineer with experience writing performance-critical or low-level software.
  • Comfortable reasoning about concurrency, synchronization, latency hiding, and compute versus memory trade-offs.
  • Data-driven in your approach, using profiling and benchmarking results to guide optimization decisions.
  • Effective at debugging complex runtime or kernel-level issues in large codebases.
  • A structured thinker who can break ambiguous performance problems down into measurable experiments.

 

What We Need

  • Engineers who can design, implement, and optimize GPU-style kernels such as matrix multiplication, attention primitives, and data-movement operations.
  • Clear ownership of performance, from identifying bottlenecks to delivering measurable throughput improvements.
  • Contribution to host-side orchestration code and parallelization strategies.
  • Development of micro-benchmarks, regression tests, and tooling to ensure correctness and sustained performance gains.
  • Close collaboration with compiler, runtime, ML, and hardware teams to integrate kernels into production systems.

 

What You Will Learn

  • The execution model, memory architecture, and performance characteristics of Tenstorrent AI hardware.
  • How to write and optimize accelerator kernels outside traditional CUDA-first ecosystems.
  • Practical AI-assisted and agentic workflows for kernel generation, debugging, and optimization.
  • How to translate performance intuition into rigorous, reproducible engineering results.
  • How low-level kernels, compilers, runtime systems, and hardware co-evolve in modern AI platforms.

 

Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.

Stack

C++, GPU, Agentic AI, Machine Learning, CUDA
Posted
Apr 1, 2026
Last seen
Jun 25, 2026
First seen
Jun 25, 2026
Status
active