Scope of the Role:

Innodata builds the data that physical AI depends on. This role exists because the value of that data is determined long before a model sees it: in how it is collected, structured, and labeled. You will be the bridge between our data operation and the requirements of robotics foundation models. You will determine how Innodata's existing and future data is best used for robotics, and you will guide our collection effort so that what we capture serves real foundation model use cases rather than being collected for its own sake. The premise of the work is that progress in physical AI is bottlenecked by data design, not model architecture, and your job is to make that thesis concrete.

This is a hybrid role requiring frequent on-site presence at our Saddle Brook, NJ facility. Candidates should live within commuting distance of Saddle Brook or be willing to travel there regularly.

We are hiring primarily at the mid-to-senior level, and we will consider very senior candidates with the right background.

What You’ll Own:

Translate the requirements of robotics foundation models, including vision-language-action models, world models, and manipulation and locomotion policies, into concrete data specifications covering modalities, action representations, sampling, annotation schemas, and evaluation criteria.
Determine how Innodata's existing and incoming data should be structured, formatted, and enriched to maximize its value for training and evaluating robot foundation models.
Guide the robotics data collection effort across capture modalities, including motion capture, egocentric, exocentric, teleoperation, multi-sensor, and synthetic generation, so that collected data maps cleanly to model needs.
Run experiments that validate data design, including fine-tuning and evaluating foundation models on Innodata data to demonstrate that specific data decisions produce measurable model improvement.
Develop evaluation and benchmarking methodology for robotics data and models, grounded in measurable criteria such as coverage, discriminative power, reliability, and actionability.
Collaborate with the capture lab, annotation teams, and the synthetic data pipeline to turn data specifications into operational collection and labeling plans.
Represent Innodata's data approach in technical conversations with research and robotics organizations.

You’ll Thrive in This Role If You Have:

A background in robot learning or robotics machine learning, covering areas such as imitation learning, reinforcement learning, manipulation, vision-language-action models, or world models.
Hands-on experience training and evaluating models for robotics, such as building manipulation or locomotion policies through imitation learning or reinforcement learning, with strong PyTorch fundamentals.
Familiarity with robotics data formats and standards, such as the LeRobot dataset format, RLDS, and Open X-Embodiment, along with common motion and sensor formats.
A data-centric mindset, with a demonstrated understanding that data design drives model performance as much as architecture does.
The ability to translate fluidly between data operations and model research, and to specify what good data looks like for a given model objective.
Strong written and verbal communication skills, since this role works directly with customers and the broader robotics community and requires explaining technical data and modeling decisions clearly to both expert and non-expert audiences.
Strong Python skills and a rigorous, reproducible approach to experiments and documentation.
Experience fine-tuning large vision-language or vision-language-action models with the modern toolchain, for example Hugging Face Transformers and PEFT.
Experience with robotics simulation and synthetic data tools such as NVIDIA Isaac Sim, Isaac Lab, and Omniverse, or comparable platforms.
Experience with teleoperation or egocentric data collection.
Experience benchmarking or evaluating robot policies, including sim-to-real considerations.
Publications or open-source contributions in robot learning or physical AI.

The expected salary range for this position is $180,000 - $300,000 USD per year, based on experience, skills, and qualifications.

Robotics / Physical AI Research Engineer
