Data, Models, Tests — validation as structured scientific argumentation.

DMT-Eval decouples analyses from models through formal adapter interfaces, producing structured scientific reports (LabReports) from any (model, data) pair. The architectural insight was proven over seven years at the Blue Brain Project (EPFL, 2017–2024) and is now rebuilt for any domain where computational models need systematic evaluation.

Live Demo

bench.mayalucia.dev — run evaluations in real time. Weather prediction, drug efficacy, and Brain-Score NeuroAI benchmarks, all producing structured LabReports through the same pipeline.

Architecture

The core cycle: Scenario + Models + Data -> LabReport.

A Scenario descriptor tells the evaluator which columns are observed, which are predicted, and how to stratify. Models implement a minimal protocol (.name, .predict(observations)). The evaluator computes metrics (RMSE, bias, skill score), stratifies by group, and renders a Markdown report with abstract, methods, results tables, and discussion.

Three-Party Design

RoleResponsibility
Data Interface AuthorsDefine what a valid model looks like for a domain
Model AdaptersMake existing models conform to the interface
Validation WritersCompose evaluations from (adapter, interface) pairs

Seven-Level Interface Gradient

From dmt.evaluate() (zero concepts — pass models and data, get a report) to typing.Protocol-based interfaces with @dmt.adapter decorators and parameterized measurements.

Brain-Score Domain Adapter

The first domain adapter targets Brain-Score — the NeuroAI platform for comparing neural network representations to primate visual cortex recordings.

747 lines. 7 modules. Three architectural fixes over the original:

BBP Era (2017–2024)DMT-Eval (2026)
InterfaceMeta metaclass (MRO conflicts)__init_subclass__ hooks
No enforcement until runtime@implements validates at registration time
Shared mutable registry across all interfacesPer-interface PluginRegistry

Results (AlexNet on MajajHong2015 public subset):

BenchmarkRaw ScoreCeilingNormalized
IT-pls0.480.8170.588
V4-pls0.550.8920.616

Provenance

Co-author of the Blue Brain Project’s validation methodology, published in eLife (2024). The original framework evaluated cortical microcircuit models against 40+ experimental constraints across morphology, electrophysiology, and connectivity.

Source

github.com/mayalucia/dmt-eval