Scope of the Role:

Innodata produces the voice and audio datasets behind the world's leading speech AI. We're hiring an Audio Engineer to own the technical heart of that work: the signal chain, the post-processing recipe, the technical specifications, and the consistent acoustic quality — the “sound signature” — that defines an Innodata dataset.

You'll ensure every hour of audio we deliver, across dozens of languages and recording conditions, meets a precise and consistent technical bar. You'll design the recipe, build the validation, and continuously improve the quality and efficiency of how we capture and process audio.

What You’ll Own:

Own the end-to-end audio signal chain and post-processing pipeline for all collection programs.
Define and document technical specifications: sample rates, bit depths, file formats, loudness (LUFS) targets, noise floors, and channel configurations.
Design and maintain the “Innodata sound signature”: a consistent, spec-compliant acoustic profile across studio, remote, real-world, and telephonic captures.
Build technical QA: automated and manual checks that validate audio against spec before delivery.
Specify and validate recording setups for vendors and remote contributors, including signal-chain testing in a small in-house studio.
Partner with the Solutions Architect to translate customer acoustic requirements into achievable technical recipes.
Drive tooling: help select and configure recording/QA/processing tools; automate where possible.
Troubleshoot acoustic and signal issues across diverse capture environments.

You’ll Thrive in This Role If You Have:

Strong audio engineering background: signal chain, recording, post-processing, mastering, and acoustic QA.
Deep fluency in audio technical specs (sample rate, bit depth, LUFS, formats, codecs) and the ability to define and enforce them.
Experience producing consistent audio quality across varied recording conditions and locations.
Comfort with audio tooling and automation (scripting for batch processing/QA is a strong plus).
Precision and process orientation — you care about consistency at scale, not just one great recording.
Experience with speech/voice data for AI/ML (TTS or ASR datasets).
Familiarity with multilingual recording and remote/distributed capture.
Knowledge of speech quality metrics and how acoustic choices affect downstream model performance.
Python scripting for audio processing pipelines, using tools such as ffmpeg, sox, pydub, or librosa.

The expected salary range for this position is $120,000 – $160,000 USD per year, based on experience, skills, and qualifications.

Audio Engineer
