publications

publications by category in reverse chronological order. generated by jekyll-scholar.

denotes a first-authored paper. * denotes co-first authorship.

2026

  1. Science
    Peter G. Brodeur*, Thomas A. Buckley*, Zahir Kanjee, Ethan Goh, Evelyn Bin Ling, Priyank Jain, Stephanie Cabral, Raja-Elie Abdulnour, Adrian D. Haimovich, Jason A. Freed, Andrew Olson, Daniel J. Morgan, Jason Hom, Robert Gallo, Liam G. McCoy, Haadi Mombini, Christopher Lucas, Misha Fotoohi, Matthew Gwiazdon, Daniele Restifo, Daniel Restrepo, Eric Horvitz, Jonathan Chen, Arjun K. Manrai, and Adam Rodman

    Across six clinical reasoning tasks, including an experiment using real cases from the Beth Israel Deaconess Medical Center emergency department, a reasoning model (OpenAI o1) matched or exceeded a large panel of attending physicians. The result suggests that LLMs are saturating current clinical-reasoning benchmarks and motivates prospective trials.

    Science, Apr 2026
  2. NEJM
    Thomas A. Buckley, Gurpreet Dhaliwal, and Arjun K. Manrai
    New England Journal of Medicine, Mar 2026
  3. NEJM AI
    Carey Goldberg, Ran D. Balicer, Mamatha Bhat, David Blumenthal, Rebecca W. Brendel, Elena Brondolo, John S. Brownstein, Thomas A. Buckley, Carol H. Cain, Payal Chandak, Frank Chessa, Aneesh Chopra, Noa Dagan, Kenneth S. Ehlert, Barbara J. Evans, Robert Freeman, Benjamin S. Glicksberg, William Gordon, Matthias I. Gröschel, Sara Hoffman, Edward M. Hundert, Sonny Hyare, Shreya Johri, Joshua Joseph, Mary Klote, Adam B. Landman, Vivian S. Lee, Joshua C. Mandel, Kenneth D. Mandl, Arjun K. Manrai, Matthew Might, Girish N. Nadkarni, Daniel J. Nigrin, Ayush Noori, Gilbert S. Omenn, Enea Parimbelli, Andrew L. Rosenberg, David Stutz, Melanie Tory, Eugene Tunik, Susan M. Wolf, Marinka Zitnik, and Isaac Kohane
    NEJM AI, Jan 2026

2025

  1. arXiv
    Thomas A. Buckley*, K. R. Weihrauch*, K. Latham, A. Z. Zhou, P. A. Manrai, and Arjun K. Manrai

    We develop a simple algorithmic approach called GIANT that allows a multimodal LLM to navigate gigapixel pathology images. With GIANT, GPT-5 outperforms specialist pathology vision-language models.

    arXiv preprint arXiv:2511.19652, Nov 2025
  2. NEJM
    Gurpreet Dhaliwal, C. Michael Hood, Arjun K. Manrai, Thomas A. Buckley, Akwi W. Asombang, and Elizabeth L. Hohmann

    Our AI system, Dr. CaBot, generated the differential diagnosis for this challenging clinical case — the first AI-authored diagnosis published in an NEJM Clinicopathological Conference.

    New England Journal of Medicine, Oct 2025
  3. medRxiv
    Clifford Marks, Sean Gibney, Bryan Stenson, Deesha Sarma, Cynthia Gaudet, Haadi Mombini, Thomas Buckley, Laura Burke, Nathan I. Shapiro, Jonathan L. Burstein, Shamai A. Grossman, Anika Parab, Alexander T. Janke, Arjun Manrai, Richard Andrew Taylor, Carlo L. Rosen, Adam Rodman, and Adrian D. Haimovich
    medRxiv, Oct 2025
  4. arXiv
    Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur, Jason Gusdorf, Sourik Beltrán, Bita Behrouzi, Byron Crowe, Jacob Dockterman, Muzzammil Muhammad, Sarah Ohnigian, Andrew Sanchez, James A. Diao, Aashna P. Shah, Daniel Restrepo, Eric S. Rosenberg, Andrew S. Lea, Marinka Zitnik, Scott H. Podolsky, Zahir Kanjee, Raja-Elie E. Abdulnour, Jacob M. Koshy, Adam Rodman, and Arjun K. Manrai

    Dr. CaBot is an agentic AI system that emulates an expert diagnostician, generating written and slide-based presentations from the case description alone; in blinded evaluations, physicians could not distinguish CaBot’s differentials from those by human experts in 74% of trials. We also introduce CPC-Bench, a physician-validated benchmark of 7,102 NEJM Clinicopathological Conferences (1923–2025) and 47,648 questions across 10 reasoning tasks, on which CaBot outperforms frontier models. Both are publicly available at cpcbench.com.

    arXiv preprint arXiv:2509.12194, Sep 2025
  5. NEJM AI
    Liam G. McCoy, Rajiv Swamy, Nidhish Sagar, Minjia Wang, Stephen Bacchi, Jie Ming Nigel Fong, Nigel C. K. Tan, Kevin Tan, Thomas A. Buckley, Peter Brodeur, Leo Anthony Celi, Arjun K. Manrai, Aloysius Humbert, and Adam Rodman
    NEJM AI, Sep 2025
  6. JAMA HF
    Thomas A. Buckley, Byron Crowe, Raja-Elie E. Abdulnour, Adam Rodman, and Arjun K. Manrai

    On NEJM clinicopathological cases, an open-source LLM (Llama 3.1 405B) matched or exceeded GPT-4 on diagnostic accuracy, suggesting that open-source models can compete with frontier proprietary systems on complex diagnostic reasoning.

    JAMA Health Forum, Mar 2025

2024

  1. arXiv
    Thomas Buckley, James A. Diao, Pranav Rajpurkar, Adam Rodman, and Arjun K. Manrai

    On benchmarks that pair an image with accompanying clinical text, multimodal foundation models score well primarily by leveraging the text rather than analyzing the image.

    arXiv preprint arXiv:2311.05591, Nov 2024
  2. JAMA
    James A. Diao, Ivy Shi, Venkatesh L. Murthy, Thomas A. Buckley, Chirag J. Patel, Emma Pierson, Robert W. Yeh, Dhruv S. Kazi, Rishi K. Wadhera, and Arjun K. Manrai
    JAMA, Sep 2024

2023

  1. JAMA NO
    Adam Rodman, Thomas A. Buckley, Arjun K. Manrai, and Daniel J. Morgan

    Compared to a large baseline of physicians, LLMs accurately estimate pretest probabilities of diagnoses and update these estimates given new test results.

    JAMA Network Open, Dec 2023