Thomas A. Buckley

thomas_buckley [at] hms [dot] harvard [dot] edu

I am a PhD student in the inaugural AI in Medicine Program, advised by Prof. Arjun Manrai. My PhD is supported by the inaugural Dunleavy Fellowship for Clinical AI.

Large language models can now produce accurate diagnoses, but their role in real clinical care remains incompletely understood. My goal is to develop AI systems that integrate multimodal data, clinical evidence, and longitudinal patient context to serve as diagnostic second opinions at the bedside.

My first-author work has appeared in Science, the New England Journal of Medicine (NEJM), and other venues. An AI system I developed generated the first AI-authored diagnosis published in the 100-year history of the NEJM Clinicopathological Conferences. My work on AI diagnosis has been covered by The New Yorker and named one of Harvard Medical School’s Top 10 Stories of 2025.

Before Harvard, I earned a B.S. in Computer Science and a B.S. in Electrical Engineering from the University of Massachusetts Amherst, where I conducted research in the BINDS Lab with Prof. Edward Rietman and Hava Siegelmann. I am a two-time alumnus of the Dr. Susanne E. Churchill Summer Institute in Biomedical Informatics (SIBMI) at Harvard, where I now serve as sole Teaching Fellow for the annual cohort (Summer 2025 and 2026). I am an affiliate of ARISE, a Stanford–Harvard research collaborative on AI in medicine.

In my free time, I enjoy tennis 🎾 and cooking 👨‍🍳.

news

Apr 30, 2026	Our paper evaluating LLMs across six clinical reasoning tasks was published in Science. It was covered by The Guardian, NPR, Harvard Magazine, and Harvard Medical School.
Mar 25, 2026	Our reply about Dr. CaBot was published in the New England Journal of Medicine.
Dec 22, 2025	Dr. CaBot was named one of Harvard Medical School’s Top 10 Stories of 2025.
Oct 08, 2025	Harvard Medicine covered our AI-generated diagnosis published in the New England Journal of Medicine: "AI System with Detailed Diagnostic Reasoning Makes Its Case."
Sep 29, 2025	Dr. CaBot was featured in The New Yorker: "If AI Can Diagnose Patients, What Are Doctors For?"
Sep 15, 2025	Launched CPCBench, the public website for Dr. CaBot and CPC-Bench, a large-scale benchmark based on the NEJM CPCs.
Apr 04, 2025	Interviewed on the JAMA+ AI Podcast about our paper showing that open-source LLMs can now compete with closed-source models in diagnostic reasoning.

selected publications

★ denotes a first-authored paper. * denotes co-first authorship.

Science

Performance of a Large Language Model on the Reasoning Tasks of a Physician

Peter G. Brodeur*, Thomas A. Buckley*, Zahir Kanjee, Ethan Goh, Evelyn Bin Ling, Priyank Jain, Stephanie Cabral, Raja-Elie Abdulnour, Adrian D. Haimovich, Jason A. Freed, Andrew Olson, Daniel J. Morgan, Jason Hom, Robert Gallo, Liam G. McCoy, Haadi Mombini, Christopher Lucas, Misha Fotoohi, Matthew Gwiazdon, Daniele Restifo, Daniel Restrepo, Eric Horvitz, Jonathan Chen, Arjun K. Manrai, and Adam Rodman

Across six clinical reasoning tasks — including an experiment using real cases from the Beth Israel Deaconess Medical Center emergency department — a reasoning model (OpenAI o1) matched or exceeded a large panel of attending physicians. The result suggests that LLMs are saturating current clinical-reasoning benchmarks, motivating the need for prospective trials.

Science, Apr 2026

HTML
arXiv

Navigating Gigapixel Pathology Images with Large Multimodal Models

Thomas A. Buckley*, K. R. Weihrauch*, K. Latham, A. Z. Zhou, P. A. Manrai, and Arjun K. Manrai

We develop a simple algorithmic approach called GIANT that allows a multimodal LLM to navigate gigapixel pathology images. With GIANT, GPT-5 outperforms specialist pathology vision-language models.

arXiv preprint arXiv:2511.19652, Nov 2025

HTML
NEJM

Case 28-2025: A 36-Year-Old Man with Abdominal Pain, Fever, and Hypoxemia

Gurpreet Dhaliwal, C. Michael Hood, Arjun K. Manrai, Thomas A. Buckley, Akwi W. Asombang, and Elizabeth L. Hohmann

Our AI system, Dr. CaBot, generated the differential diagnosis for this challenging clinical case — the first AI-authored diagnosis published in an NEJM Clinicopathological Conference.

New England Journal of Medicine, Oct 2025

HTML
arXiv

Advancing Medical Artificial Intelligence Using a Century of Cases

Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur, Jason Gusdorf, Sourik Beltrán, Bita Behrouzi, Byron Crowe, Jacob Dockterman, Muzzammil Muhammad, Sarah Ohnigian, Andrew Sanchez, James A. Diao, Aashna P. Shah, Daniel Restrepo, Eric S. Rosenberg, Andrew S. Lea, Marinka Zitnik, Scott H. Podolsky, Zahir Kanjee, Raja-Elie E. Abdulnour, Jacob M. Koshy, Adam Rodman, and Arjun K. Manrai

Dr. CaBot is an agentic AI system that emulates an expert diagnostician, generating written and slide-based presentations from the case description alone. In blinded evaluations, physicians could not distinguish CaBot’s differentials from those written by human experts in 74% of trials. We also introduce CPC-Bench, a physician-validated benchmark of 7,102 NEJM Clinicopathological Conferences (1923–2025) and 47,648 questions across 10 reasoning tasks, on which CaBot outperforms frontier models. Both are publicly available at cpcbench.com.

arXiv preprint arXiv:2509.12194, Sep 2025

HTML
JAMA HF

Comparison of Frontier Open-Source and Proprietary Large Language Models for Complex Diagnoses

Thomas A. Buckley, Byron Crowe, Raja-Elie E. Abdulnour, Adam Rodman, and Arjun K. Manrai

On NEJM clinicopathological cases, an open-source LLM (Llama 3.1 405B) matched or exceeded GPT-4 on diagnostic accuracy, suggesting that open-source models can compete with frontier proprietary systems on complex diagnostic reasoning.

JAMA Health Forum, Mar 2025

HTML
arXiv

Multimodal Foundation Models Exploit Text to Make Medical Image Predictions

Thomas Buckley, James A. Diao, Pranav Rajpurkar, Adam Rodman, and Arjun K. Manrai

On benchmarks that pair an image with accompanying clinical text, multimodal foundation models score well primarily by leveraging the text rather than analyzing the image.

arXiv preprint arXiv:2311.05591, Nov 2024

HTML