publications | Thomas A. Buckley

★ denotes a first-authored paper. * denotes co-first authorship.

2026

Science

Performance of a Large Language Model on the Reasoning Tasks of a Physician

Peter G. Brodeur*, Thomas A. Buckley*, Zahir Kanjee, Ethan Goh, Evelyn Bin Ling, Priyank Jain, Stephanie Cabral, Raja-Elie Abdulnour, Adrian D. Haimovich, Jason A. Freed, Andrew Olson, Daniel J. Morgan, Jason Hom, Robert Gallo, Liam G. McCoy, Haadi Mombini, Christopher Lucas, Misha Fotoohi, Matthew Gwiazdon, Daniele Restifo, Daniel Restrepo, Eric Horvitz, Jonathan Chen, Arjun K. Manrai, and Adam Rodman

Across six clinical reasoning tasks, including an experiment using real cases from the Beth Israel Deaconess Medical Center emergency department, a reasoning model (OpenAI o1) matched or exceeded a large panel of attending physicians. The result suggests that LLMs are saturating current clinical-reasoning benchmarks and motivates prospective trials.

Science, Apr 2026

NEJM

Reply: Case 28-2025: A Man with Abdominal Pain, Fever, and Hypoxemia

Thomas A. Buckley, Gurpreet Dhaliwal, and Arjun K. Manrai

New England Journal of Medicine, Mar 2026

NEJM AI

The Missing Dimension in Clinical AI: Making Hidden Values Visible

Carey Goldberg, Ran D. Balicer, Mamatha Bhat, David Blumenthal, Rebecca W. Brendel, Elena Brondolo, John S. Brownstein, Thomas A. Buckley, Carol H. Cain, Payal Chandak, Frank Chessa, Aneesh Chopra, Noa Dagan, Kenneth S. Ehlert, Barbara J. Evans, Robert Freeman, Benjamin S. Glicksberg, William Gordon, Matthias I. Gröschel, Sara Hoffman, Edward M. Hundert, Sonny Hyare, Shreya Johri, Joshua Joseph, Mary Klote, Adam B. Landman, Vivian S. Lee, Joshua C. Mandel, Kenneth D. Mandl, Arjun K. Manrai, Matthew Might, Girish N. Nadkarni, Daniel J. Nigrin, Ayush Noori, Gilbert S. Omenn, Enea Parimbelli, Andrew L. Rosenberg, David Stutz, Melanie Tory, Eugene Tunik, Susan M. Wolf, Marinka Zitnik, and Isaac Kohane

NEJM AI, Jan 2026

2025

arXiv

Navigating Gigapixel Pathology Images with Large Multimodal Models

Thomas A. Buckley*, K. R. Weihrauch*, K. Latham, A. Z. Zhou, P. A. Manrai, and Arjun K. Manrai

We develop a simple algorithmic approach called GIANT that allows a multimodal LLM to navigate gigapixel pathology images. With GIANT, GPT-5 outperforms specialist pathology vision-language models.

arXiv preprint arXiv:2511.19652, Nov 2025

NEJM

Case 28-2025: A 36-Year-Old Man with Abdominal Pain, Fever, and Hypoxemia

Gurpreet Dhaliwal, C. Michael Hood, Arjun K. Manrai, Thomas A. Buckley, Akwi W. Asombang, and Elizabeth L. Hohmann

Our AI system, Dr. CaBot, generated the differential diagnosis for this challenging clinical case — the first AI-authored diagnosis published in an NEJM Clinicopathological Conference.

New England Journal of Medicine, Oct 2025

medRxiv

Scalable Screening for Emergency Department Missed Opportunities for Diagnosis Using Sequential eTriggers and Large Language Models

Clifford Marks, Sean Gibney, Bryan Stenson, Deesha Sarma, Cynthia Gaudet, Haadi Mombini, Thomas Buckley, Laura Burke, Nathan I. Shapiro, Jonathan L. Burstein, Shamai A. Grossman, Anika Parab, Alexander T. Janke, Arjun Manrai, Richard Andrew Taylor, Carlo L. Rosen, Adam Rodman, and Adrian D. Haimovich

medRxiv, Oct 2025

arXiv

Advancing Medical Artificial Intelligence Using a Century of Cases

Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur, Jason Gusdorf, Sourik Beltrán, Bita Behrouzi, Byron Crowe, Jacob Dockterman, Muzzammil Muhammad, Sarah Ohnigian, Andrew Sanchez, James A. Diao, Aashna P. Shah, Daniel Restrepo, Eric S. Rosenberg, Andrew S. Lea, Marinka Zitnik, Scott H. Podolsky, Zahir Kanjee, Raja-Elie E. Abdulnour, Jacob M. Koshy, Adam Rodman, and Arjun K. Manrai

Dr. CaBot is an agentic AI system that emulates an expert diagnostician, generating written and slide-based presentations from the case description alone; in blinded evaluations, physicians could not distinguish CaBot’s differentials from those by human experts in 74% of trials. We also introduce CPC-Bench, a physician-validated benchmark of 7,102 NEJM Clinicopathological Conferences (1923–2025) and 47,648 questions across 10 reasoning tasks, on which CaBot outperforms frontier models. Both are publicly available at cpcbench.com.

arXiv preprint arXiv:2509.12194, Sep 2025

NEJM AI

Assessment of Large Language Models in Clinical Reasoning: A Novel Benchmarking Study

Liam G. McCoy, Rajiv Swamy, Nidhish Sagar, Minjia Wang, Stephen Bacchi, Jie Ming Nigel Fong, Nigel C. K. Tan, Kevin Tan, Thomas A. Buckley, Peter Brodeur, Leo Anthony Celi, Arjun K. Manrai, Aloysius Humbert, and Adam Rodman

NEJM AI, Sep 2025

JAMA HF

Comparison of Frontier Open-Source and Proprietary Large Language Models for Complex Diagnoses

Thomas A. Buckley, Byron Crowe, Raja-Elie E. Abdulnour, Adam Rodman, and Arjun K. Manrai

On NEJM clinicopathological cases, an open-source LLM (Llama 3.1 405B) matched or exceeded GPT-4 on diagnostic accuracy, suggesting that open-source models can compete with frontier proprietary systems on complex diagnostic reasoning.

JAMA Health Forum, Mar 2025

2024

arXiv

Multimodal Foundation Models Exploit Text to Make Medical Image Predictions

Thomas Buckley, James A. Diao, Pranav Rajpurkar, Adam Rodman, and Arjun K. Manrai

On benchmarks that pair an image with accompanying clinical text, multimodal foundation models score well primarily by leveraging the text rather than analyzing the image.

arXiv preprint arXiv:2311.05591, Nov 2024

JAMA

Projected Changes in Statin and Antihypertensive Therapy Eligibility with the AHA PREVENT Cardiovascular Risk Equations

James A. Diao, Ivy Shi, Venkatesh L. Murthy, Thomas A. Buckley, Chirag J. Patel, Emma Pierson, Robert W. Yeh, Dhruv S. Kazi, Rishi K. Wadhera, and Arjun K. Manrai

JAMA, Sep 2024

2023

JAMA NO

Artificial Intelligence vs Clinician Performance in Estimating Probabilities of Diagnoses Before and After Testing

Adam Rodman, Thomas A. Buckley, Arjun K. Manrai, and Daniel J. Morgan

Compared to a large baseline of physicians, LLMs accurately estimate pretest probabilities of diagnoses and update these estimates given new test results.

JAMA Network Open, Dec 2023
