David Heineman

Hey! I'm David 👋

I'm a pre-doctoral young investigator at the Allen Institute for AI, working on language model pretraining, data and evaluation.


About Me

I study foundation models with a focus on methods: how we make decisions about data [1], how we draw conclusions from experiments [2], and how model behavior changes at scale [3]. I've developed new, cheap, and efficient tools for measuring model behavior [4, 5] and built recipes for fully-open language models [6, 7]. Recently, I've been thinking about how methods from pretraining can help us build vision-language, speech-language, and reasoning models (interested? please reach out!).

I work on these problems at Ai2 as part of the Open Language Model (OLMo) project, advised by Kyle Lo and Jesse Dodge. Before that, I was an undergrad at Georgia Tech 🐝, where I was fortunate to be advised by Prof. Wei Xu and to work with Yao Dou and Mounica Maddela. I've also spent a few summers at AWS and at Patientco, a healthcare startup. I enjoy reading, hiking, and making homebrew nitrogen cold brew.


Publications & Preprints [selected, all]

Olmo Hybrid: From Theory to Practice [models]

William Merrill*, Yanhong Li*, Tyler Romero*, Anej Svete*, Caia Costello*, ..., David Heineman, ..., Noah A. Smith, Hannaneh Hajishirzi, Ashish Sabharwal
preprint, 2026

Olmix: A Framework for Data Mixing Throughout LM Development [blog, code, mix weights]

Mayee F. Chen, Tyler Murray, David Heineman, Matt Jordan, Hannaneh Hajishirzi, Christopher Ré, Luca Soldaini, Kyle Lo
preprint, 2026

Olmo 3 [blog, code, models, data]

Olmo Team (incl. David Heineman)
technical report, 2025

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces [code, leaderboard]

Mike A. Merrill*, Alexander G. Shaw*, Nicholas Carlini, Boxuan Li, Harsh Raj, Ivan Bercovich, Lin Shi, Jeong Yeon Shin, Thomas Walshe, ..., David Heineman, ..., Jesse Hu, Christopher Michael Rytting, Ryan Marten, Yixin Wang, Alex Dimakis, Andy Konwinski, Ludwig Schmidt
ICLR, 2026

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation [code, data]

David Heineman, Valentin Hofmann, Ian Magnusson, Yuling Gu, Noah A. Smith, Hannaneh Hajishirzi, Kyle Lo, Jesse Dodge
NeurIPS, 2025 (Spotlight, Top 5%)

Fluid Language Model Benchmarking [code, models]

Valentin Hofmann, David Heineman, Ian Magnusson, Kyle Lo, Jesse Dodge, Maarten Sap, Pang Wei Koh, Chun Wang, Hannaneh Hajishirzi, Noah A. Smith
COLM, 2025 (Oral, Top 5%)

Establishing Task Scaling Laws via Compute-Efficient Model Ladders [code]

Akshita Bhagia*, Jiacheng Liu*, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
COLM, 2025

2 OLMo 2 Furious [blog, code, models, data]

Pete Walsh*, Luca Soldaini*, Dirk Groeneveld*, Kyle Lo*, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, ..., David Heineman, ..., Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi
COLM, 2025

Evaluating LLMs on Chinese Idiom Translation

Cai Yang, Yao Dou, David Heineman, Xiaofeng Wu, Wei Xu
COLM, 2025

DataDecide: How to Predict Best Pretraining Data with Small Experiments [code, models]

Ian Magnusson*, Nguyen Tai*, Ben Bogin*, David Heineman, Jena D. Hwang, Luca Soldaini, Akshita Bhagia, Jiacheng Liu, Dirk Groeneveld, Oyvind Tafjord, Noah A. Smith, Pang Wei Koh, Jesse Dodge
ICML, 2025

Improving Minimum Bayes Risk Decoding with Multi-Prompt [code]

David Heineman, Yao Dou, Wei Xu
EMNLP, 2024

Towards a Path Dependent Account of Category Fluency [code]

David Heineman, Reba Koenen, Sashank Varma
CogSci, 2024

Edit-level Simplification Evaluation using SALSA 💃 [code/data, metric]

David Heineman, Yao Dou, Mounica Maddela, Wei Xu
EMNLP, 2023

LENS: A Learnable Evaluation Metric for Text Simplification [code/data, metric]

Mounica Maddela*, Yao Dou*, David Heineman, Wei Xu
ACL, 2023

* = equal contribution


Some past work

Recommendations

A few interesting corners of the internet I find worth checking out!

... to flip through


Games, Puzzles, and Computation by Erik Demaine

The Corrections by Jonathan Franzen

Naked Statistics by Charles Wheelan

Society Must Be Defended by Michel Foucault

Oblivion by David Foster Wallace


I also enjoy trying new coffee shops. Here are some recommendations across Atlanta, which I visited during my undergrad, and a growing list across Seattle.