ICML 2026 · Scientific Machine Learning

Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation

Sum Kyun Song¹†, Bong Gyun Shin²†, Jae Yong Lee¹*

¹Chung-Ang University · ²Daejin University · †Equal contribution · *Corresponding author


Abstract

Discovering governing differential equations from observational data is a central challenge in scientific machine learning. Existing symbolic regression approaches often emphasize numerical metrics, but scientific modeling also requires domain knowledge and physical plausibility.

DoLQ (Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation) addresses this gap with a multi-agent framework. A Sampler Agent proposes candidate dynamics, a Parameter Optimizer fits their coefficients, and a Scientist Agent evaluates each term through both qualitative semantic reasoning and quantitative contribution analysis. Across multi-dimensional ODE benchmarks, DoLQ achieves higher success rates than existing methods and more accurately recovers the symbolic terms of the ground-truth equations.

Method

Figure: **DoLQ overview.** The loop connects three roles: the Sampler Agent proposes ODE terms, the Parameter Optimizer fits their coefficients, and the Scientist Agent decides whether to keep, hold, or remove each term. These decisions become feedback for the next search iteration.

The framework forms a closed loop: the Sampler Agent generates candidate ODE terms from the system description, the Parameter Optimizer fits their coefficients to the observed data, the Scientist Agent evaluates each term for both semantic plausibility and quantitative impact, and its feedback guides the next round of proposals. This makes the search more interpretable than a purely numerical symbolic regression loop, because every retained or removed term is tied to an explicit evaluation signal.
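The propose, fit, evaluate, and feed-back cycle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DoLQ's implementation: a fixed term dictionary (`LIBRARY`) stands in for the LLM Sampler, ordinary least squares stands in for the Parameter Optimizer, and a simple coefficient-magnitude threshold stands in for the Scientist Agent. All names (`fit_coefficients`, `scientist_feedback`, `discovery_loop`) and the tolerance are hypothetical.

```python
import numpy as np

# Hypothetical candidate library for a 1D system dx/dt = f(x). In DoLQ the
# Sampler Agent (an LLM) proposes terms from the system description; a fixed
# dictionary stands in for it here.
LIBRARY = {
    "1": lambda x: np.ones_like(x),
    "x": lambda x: x,
    "x^2": lambda x: x ** 2,
    "x^3": lambda x: x ** 3,
}


def fit_coefficients(terms, x, dxdt):
    """Parameter Optimizer stand-in: linear least squares over the proposed terms."""
    theta = np.column_stack([LIBRARY[t](x) for t in terms])
    coef, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    nmse = np.mean((dxdt - theta @ coef) ** 2) / np.var(dxdt)
    return coef, nmse


def scientist_feedback(terms, coef, tol=1e-3):
    """Scientist Agent stand-in (hypothetical): flag terms with negligible
    coefficients. DoLQ's Scientist instead combines semantic reasoning with
    an ablation test."""
    return {t for t, c in zip(terms, coef) if abs(c) < tol}


def discovery_loop(x, dxdt, max_iters=5):
    """Propose -> fit -> evaluate -> feed back; returns the latest fit."""
    removed = set()
    for _ in range(max_iters):
        # Sampler stand-in: propose every library term not yet rejected.
        terms = [t for t in LIBRARY if t not in removed]
        if not terms:
            raise ValueError("every candidate term was rejected")
        coef, nmse = fit_coefficients(terms, x, dxdt)
        flagged = scientist_feedback(terms, coef)
        if not flagged:
            break  # the Scientist keeps every remaining term
        removed |= flagged  # feedback narrows the next proposal
    return terms, coef, nmse


x = np.linspace(0.1, 2.0, 50)
dxdt = 1.5 * x - 0.5 * x ** 2  # noiseless logistic growth
terms, coef, nmse = discovery_loop(x, dxdt)
print(terms, np.round(coef, 3))  # expected terms: ['x', 'x^2']
```

On this noiseless logistic example the first pass fits all four terms, the stand-in Scientist flags `1` and `x^3`, and the second pass keeps only `x` and `x^2`. The point of the structure is that each removal is traceable to a recorded evaluation signal.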

The Scientist Agent combines two complementary signals: a quantitative ablation test, which measures how removing each term changes the residual error, and a qualitative semantic check, which judges whether the term is physically meaningful given the system description. Together, these signals determine whether a term is kept, held pending further evidence, or removed from future proposals.
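A minimal sketch of how an ablation score and a semantic label might be combined into keep, hold, or remove decisions follows. The ablation test (refit without one term and measure the increase in residual NMSE) follows the description above; the `plausible` dictionary is a hypothetical stand-in for the LLM's qualitative judgment, and the decision rule (agreeing signals give keep or remove, disagreeing signals give hold) and the threshold `tol` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical term library for a 1D system dx/dt = f(x).
LIBRARY = {
    "1": lambda x: np.ones_like(x),
    "x": lambda x: x,
    "x^2": lambda x: x ** 2,
    "x^3": lambda x: x ** 3,
}


def residual_nmse(terms, x, dxdt):
    """Fit coefficients by least squares and return the residual NMSE."""
    theta = np.column_stack([LIBRARY[t](x) for t in terms])
    coef, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    return np.mean((dxdt - theta @ coef) ** 2) / np.var(dxdt)


def ablation_scores(terms, x, dxdt):
    """Quantitative signal: increase in residual NMSE when each term is dropped."""
    base = residual_nmse(terms, x, dxdt)
    return {t: residual_nmse([s for s in terms if s != t], x, dxdt) - base
            for t in terms}


def scientist_decisions(terms, plausible, x, dxdt, tol=1e-4):
    """Combine the ablation signal with a per-term plausibility label.

    `plausible` stands in for the LLM's semantic check; the rule below is a
    hypothetical combination, not DoLQ's exact policy.
    """
    scores = ablation_scores(terms, x, dxdt)
    decisions = {}
    for t in terms:
        important = scores[t] > tol
        if important and plausible[t]:
            decisions[t] = "keep"
        elif not important and not plausible[t]:
            decisions[t] = "remove"
        else:
            decisions[t] = "hold"  # signals disagree: wait for more evidence
    return decisions


x = np.linspace(0.1, 2.0, 50)
dxdt = 1.5 * x - 0.5 * x ** 2  # noiseless logistic growth
# Hypothetical semantic labels an LLM might return for a population model.
labels = {"1": True, "x": True, "x^2": True, "x^3": False}
decisions = scientist_decisions(list(LIBRARY), labels, x, dxdt)
print(decisions)
# {'1': 'hold', 'x': 'keep', 'x^2': 'keep', 'x^3': 'remove'}
```

The constant term illustrates the "hold" case: it is numerically unnecessary on this data but judged plausible, so a combined rule defers the decision instead of discarding it outright.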

Results

We evaluate DoLQ against representative LLM-based symbolic regression baselines, including ICSR, LASR, LLM-SR, and EDL. Quantitative performance is measured with residual and integral normalized mean squared error (NMSE) under the ID and ID-Ext regimes, while structural quality is assessed by whether the recovered equations match the ground-truth symbolic terms.
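The two NMSE variants can be illustrated as follows, under explicit assumptions: residual NMSE is taken to compare a model's derivatives with observed (or estimated) derivatives, integral NMSE to compare a trajectory obtained by integrating the model from the first observation with the observed trajectory, and NMSE is normalized by the per-dimension variance of the target and then averaged over dimensions. These are common conventions, not necessarily the paper's exact definitions. The SIR parameters, the fixed-step RK4 integrator, and the biased "candidate" model are hypothetical.

```python
import numpy as np


def nmse(y_true, y_pred):
    """Dimension-averaged NMSE: per-dimension MSE divided by the variance of
    the target, then averaged over dimensions (one common convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = y_true.reshape(len(y_true), -1)
    y_pred = y_pred.reshape(len(y_pred), -1)
    per_dim = np.mean((y_true - y_pred) ** 2, axis=0) / np.var(y_true, axis=0)
    return float(np.mean(per_dim))


def rk4(f, x0, t):
    """Integrate dx/dt = f(x) with classic fixed-step RK4 on the time grid t."""
    xs = [np.asarray(x0, dtype=float)]
    for t0, t1 in zip(t[:-1], t[1:]):
        h, x = t1 - t0, xs[-1]
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        xs.append(x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(xs)


def residual_nmse(f_model, x_obs, dxdt_obs):
    """Compare the model's derivatives f(x) with observed (or estimated) derivatives."""
    return nmse(dxdt_obs, np.array([f_model(x) for x in x_obs]))


def integral_nmse(f_model, x_obs, t):
    """Integrate the model from the first observation and compare trajectories."""
    return nmse(x_obs, rk4(f_model, x_obs[0], t))


def sir(beta, gamma):
    """SIR dynamics: S' = -beta*S*I, I' = beta*S*I - gamma*I, R' = gamma*I."""
    def f(x):
        s, i, _ = x
        return np.array([-beta * s * i, beta * s * i - gamma * i, gamma * i])
    return f


t = np.linspace(0.0, 20.0, 201)
truth = sir(0.5, 0.1)
x_obs = rk4(truth, [0.99, 0.01, 0.0], t)
dxdt_obs = np.array([truth(x) for x in x_obs])  # noiseless stand-in for estimated derivatives
candidate = sir(0.45, 0.1)  # hypothetical recovered model with a biased coefficient
print(residual_nmse(candidate, x_obs, dxdt_obs), integral_nmse(candidate, x_obs, t))
```

Residual NMSE checks the vector field pointwise, while integral NMSE also captures how small derivative errors accumulate over a rollout, which is why the two can rank methods differently.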

Table: **Quantitative NMSE comparison.** Dimension-averaged residual and integral NMSE on representative ODE systems (SIR, CDIMA, and Glider) under the ID and ID-Ext evaluation regimes. Lower is better; bold-underlined values mark the best result and bold values the second best.

Figure: **Equation comparison.** On the 2D dimensionless Glider system, this comparison shows the symbolic equations recovered by DoLQ and the baseline methods rather than only their numerical errors. DoLQ retains the ground-truth terms while adding fewer unnecessary terms, whereas several baselines either miss key dynamics or produce bloated equation skeletons.

Figure: **Benchmark success.** Across eight ODE systems, DoLQ achieves the highest success scores under both the NMSE criterion and the term-recovery criterion, indicating that it improves not only trajectory-level accuracy but also recovery of the underlying symbolic structure.
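The difference between the two success criteria can be made concrete with a small sketch. It assumes that a run succeeds under the NMSE criterion when its NMSE falls below a threshold, and under the term-recovery criterion only when every state dimension recovers exactly the ground-truth term set; the success score is taken here to be the fraction of successful runs. The threshold value, the `Run` container, and the term strings are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    """One discovery run: recovered terms per state dimension and its NMSE."""
    terms: list[set[str]]
    nmse: float


def nmse_success(run: Run, threshold: float = 1e-2) -> bool:
    """NMSE criterion (hypothetical threshold)."""
    return run.nmse < threshold


def term_success(run: Run, truth: list[set[str]]) -> bool:
    """Term-recovery criterion: no missing and no extra terms in any dimension."""
    return len(run.terms) == len(truth) and all(
        r == g for r, g in zip(run.terms, truth))


def success_rates(runs: list[Run], truth: list[set[str]],
                  threshold: float = 1e-2) -> tuple[float, float]:
    """Fraction of runs passing each criterion."""
    n = len(runs)
    return (sum(nmse_success(r, threshold) for r in runs) / n,
            sum(term_success(r, truth) for r in runs) / n)


# Ground-truth SIR skeleton, written as term strings per dimension.
truth = [{"S*I"}, {"S*I", "I"}, {"I"}]
runs = [
    Run([{"S*I"}, {"S*I", "I"}, {"I"}], 1e-5),       # exact skeleton
    Run([{"S*I", "S"}, {"S*I", "I"}, {"I"}], 1e-4),  # bloated but accurate
    Run([{"S*I"}, {"S*I"}, {"I"}], 0.3),             # missing the recovery term
]
print(success_rates(runs, truth))
```

The bloated run passes the NMSE criterion but fails term recovery, which is the gap the term-recovery criterion is meant to expose.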

Citation

BibTeX

@article{song2026discovering,
  title={Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation},
  author={Song, Sum Kyun and Shin, Bong Gyun and Lee, Jae Yong},
  journal={arXiv preprint arXiv:2605.07323},
  year={2026}
}