
State of AI Reasoning for Theoretical Physics - Insights from the TPBench Project
Varian 355
Zoom Meeting ID: 922 4971 4551; Password: 582332
Abstract: The newest large language reasoning models are, for the first time, powerful enough to perform mathematical reasoning in theoretical physics at the graduate level. In the mathematics community, datasets such as FrontierMath are being used to drive progress and evaluate models, but theoretical physics has so far received less attention. In this talk I will present our dataset TPBench (arXiv:2502.15815, tpbench.org), which was constructed to benchmark and improve AI models specifically for theoretical physics. We find extremely rapid progress of models over the last few months, but also significant challenges at research-level difficulty. I will also discuss strategies to improve these models for theoretical physics and show some early results using test-time scaling techniques on our problems.