Kairos

ML Research Scientist I/II, Multimodal Data Extraction

On-site
Lila Sciences
Cambridge, GB / MA, US
7 months ago
Physical Sciences AI

Compensation

$176,000-$304,000

Description

Your Impact at Lila

As an ML Research Scientist, Multimodal Data Extraction, you will advance Lila’s vision of scientific superintelligence by developing foundation models that autonomously read, interpret, and structure scientific knowledge across text, images, and experimental data in the physical sciences. Your research will help unify the world’s scientific information into machine-understandable form, powering reasoning, prediction, and autonomous discovery across materials science and chemistry.

What You'll Be Building

  • Research and develop AI systems that extract and structure knowledge from diverse scientific sources.
  • Design and fine-tune large language, multimodal, and specialized models for factual, interpretable data extraction.
  • Build scalable pipelines for unstructured and heterogeneous scientific data, integrating text, tables, and visuals.
  • Collaborate with domain experts to align extracted data with real-world discovery workflows.
  • Publish research that advances the state of the art in multimodal understanding and AI-driven knowledge extraction.

What You’ll Need to Succeed

  • PhD (or equivalent research experience) in Computer Science, Chemistry, Materials Science, or a related field.
  • Expertise in machine learning, NLP, and vision–language modeling using PyTorch and Hugging Face Transformers.
  • Proven ability to train, fine-tune, and evaluate LLMs and multimodal models for scientific data extraction.
  • Strong understanding of data structures and representations used in the physical sciences.
  • Demonstrated research impact through publications, preprints, or open-source work (e.g., NeurIPS, ICLR, ICML, ACL, EMNLP, or scientific journals).

Bonus Points For

  • Experience with multimodal fusion architectures and document-level understanding.
  • Knowledge of scientific document parsing (OCR, table extraction, figure-caption linking).
  • Familiarity with knowledge graph construction or reasoning systems for science.
  • Experience with noisy or heterogeneous real-world scientific data.
  • Collaborative mindset and passion for advancing AI in the physical sciences.

 

Stack

PyTorch, Transformers, LLMs, Hugging Face, Foundation Models, Machine Learning, NLP
Posted: Nov 3, 2025
Last seen: Jun 25, 2026
First seen: Jun 25, 2026
Status: active