SciTrails

Scientific Documentation & Knowledge Representation · 2020–2024

Python · YAML · Reproducibility · Data Science

Overview

SciTrails (originally "circuit-factology") is a framework for converting computational results into structured, publishable scientific knowledge. It addresses the reproducibility crisis by treating scientific documentation as a first-class computational artifact rather than an afterthought written once the analysis is done.

The Problem

Scientific analyses, particularly in computational neuroscience, involve multiple interconnected steps: data processing, statistical analysis, visualization, and interpretation. These are typically managed through ad-hoc scripts, Jupyter notebooks, and manual documentation. The result is fragmented, non-reproducible work where the path from raw data to a published figure is often obscure. A small change in the model requires re-running a tangled web of scripts, usually with manual intervention.

The Solution: Declarative Fact Generation

Instead of writing imperative scripts ("do this, then that, then save a plot"), the scientist declares what knowledge they want to obtain. The framework handles the how.

This is achieved through a clear hierarchy:

Laboratory — A consistent interface for querying the underlying dataset (the circuit model)
Measurement — Isolated algorithmic logic for calculating a single value or generating a single plot
Fact / Figure — A structured object binding a scientific question to its computed answer, with full provenance
Factsheet — A thematic collection of Facts and Figures constituting a coherent report on a topic
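
The four levels above can be sketched in plain Python. This is a minimal illustration, not the SciTrails API: the names `ToyLaboratory`, `mean_of`, `make_fact`, and the `values` method are hypothetical, and the dataset is a made-up in-memory dictionary standing in for a circuit model.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Laboratory(Protocol):
    """Hypothetical query interface over the underlying dataset."""

    def values(self, quantity: str) -> list[float]: ...


class ToyLaboratory:
    """In-memory stand-in for a real circuit model (illustration only)."""

    def __init__(self, data: dict[str, list[float]]) -> None:
        self._data = data

    def values(self, quantity: str) -> list[float]:
        return list(self._data[quantity])


def mean_of(lab: Laboratory, quantity: str) -> float:
    """Measurement: one isolated calculation that yields a single value."""
    xs = lab.values(quantity)
    return sum(xs) / len(xs)


@dataclass(frozen=True)
class Fact:
    """Binds a scientific question to its computed answer."""

    question: str
    value: float
    measurement: str  # name of the function that produced the value


@dataclass
class Factsheet:
    """Thematic collection of facts."""

    title: str
    facts: list[Fact] = field(default_factory=list)


def make_fact(lab: Laboratory, question: str,
              measurement: Callable[[Laboratory, str], float],
              quantity: str) -> Fact:
    # Record which measurement answered the question alongside the answer.
    return Fact(question, measurement(lab, quantity), measurement.__name__)


lab = ToyLaboratory({"synapses_per_connection": [3.0, 5.0, 4.0]})
sheet = Factsheet("Connectivity")
sheet.facts.append(
    make_fact(lab, "What is the mean number of synapses per connection?",
              mean_of, "synapses_per_connection")
)
```

Keeping the measurement ignorant of where the data lives is the point of the split: swapping `ToyLaboratory` for another object with the same `values` method leaves `mean_of` and the factsheet untouched.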

How It Works

What the scientist provides:

A data interface — a Laboratory-like class for their specific dataset
Measurement functions — Python functions performing core calculations
YAML configurations — declaring desired Facts, Figures, and Factsheets
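
One way to picture how these three inputs fit together is a registry that resolves measurement names found in a configuration. The sketch below is an assumption about the general pattern, not the framework's actual schema: `CONFIG` is a plain dictionary standing in for a parsed YAML file, and the keys, the `measurement` decorator, and `generate` are all hypothetical.

```python
from typing import Callable

# Registry mapping measurement names, as they would appear in a config,
# to the Python functions that implement them.
MEASUREMENTS: dict[str, Callable[[list[float]], float]] = {}


def measurement(func):
    """Register a measurement function under its own name."""
    MEASUREMENTS[func.__name__] = func
    return func


@measurement
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


@measurement
def maximum(xs: list[float]) -> float:
    return max(xs)


# Stand-in for what a YAML loader would return; the keys are illustrative.
CONFIG = {
    "factsheet": "Cell density",
    "facts": [
        {"question": "What is the mean cell density?",
         "measurement": "mean", "quantity": "density"},
        {"question": "What is the peak cell density?",
         "measurement": "maximum", "quantity": "density"},
    ],
}

# Made-up data playing the role of the scientist's data interface.
DATASET = {"density": [10.0, 20.0, 30.0]}


def generate(config: dict, dataset: dict[str, list[float]]) -> list[dict]:
    """Resolve every declared fact into a question/answer pair."""
    results = []
    for spec in config["facts"]:
        func = MEASUREMENTS[spec["measurement"]]
        answer = func(dataset[spec["quantity"]])
        results.append({"question": spec["question"], "answer": answer})
    return results


facts = generate(CONFIG, DATASET)
```

Adding a new fact in this scheme means adding one entry to the configuration; no control flow changes.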

What the framework delivers:

Automated reports — structured, human-readable documents from a single command (scitale init → setup → generate)
Full reproducibility — the entire knowledge generation process captured in configuration and code
Clear provenance — every number and figure linked to the code that generated it
Scalability — add new analyses or run the suite on a new dataset with minimal changes
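
As a rough illustration of what linking a number to the code that generated it could look like, the sketch below stores a fingerprint of the measurement function next to its result. The `Provenance` record, `run_with_provenance`, and the choice of hashing the function's bytecode are assumptions made for this example; a real system might record a git commit hash instead.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    function: str     # module-qualified name of the measurement
    code_sha256: str  # hash of the function's compiled bytecode
    parameters: str   # JSON-encoded call parameters


def fingerprint(func) -> str:
    # Hashes the bytecode only; constants and referenced names are not
    # included, so this is a coarse fingerprint rather than a full one.
    return hashlib.sha256(func.__code__.co_code).hexdigest()


def run_with_provenance(func, **params):
    """Call a measurement and return its value with a provenance record."""
    value = func(**params)
    record = Provenance(
        function=f"{func.__module__}.{func.__qualname__}",
        code_sha256=fingerprint(func),
        parameters=json.dumps(params, sort_keys=True),
    )
    return value, record


def fraction_above(values, threshold):
    """Example measurement: share of values strictly above a threshold."""
    return sum(v > threshold for v in values) / len(values)


value, record = run_with_provenance(
    fraction_above, values=[1, 2, 3, 4], threshold=2
)
```

Because the parameters are serialized with sorted keys, two runs with the same inputs and the same function body yield identical records, which makes them easy to compare.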

Design Principles

Separation of content from presentation — scientific logic is independent of output format (HTML, PDF, Markdown)
Version-controlled notebooks — unlike Jupyter, configurations are diff-friendly YAML that work with git
FAIR principles — Findable, Accessible, Interoperable, Reusable knowledge artifacts
Multi-scale organization — handles hierarchical data from brain regions down to individual synapses
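
The first principle, separating content from presentation, can be shown with a single fact rendered into two formats. This sketch uses the standard library's `string.Template` as a stand-in for a real templating engine; the `FACT` fields and both templates are hypothetical.

```python
from html import escape
from string import Template

# The scientific content: no formatting decisions are stored here.
FACT = {
    "question": "How many synapses per connection?",
    "answer": "4.0",
    "unit": "synapses",
}

# Presentation lives in templates, one per output format.
MARKDOWN = Template("**$question**\n\n$answer $unit\n")
HTML = Template("<section><h3>$question</h3><p>$answer $unit</p></section>")


def render_markdown(fact: dict) -> str:
    return MARKDOWN.substitute(fact)


def render_html(fact: dict) -> str:
    # Escape values so that characters such as "<" cannot break the markup.
    return HTML.substitute({k: escape(str(v)) for k, v in fact.items()})


markdown_report = render_markdown(FACT)
html_report = render_html(FACT)
```

Supporting another output format then means adding a template and a renderer, with no change to how the fact is computed.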

Comparative Positioning

SciTrails sits between existing tools: it combines the interactivity of Jupyter, the reproducibility of Snakemake, and the publication quality of RMarkdown, and adds declarative fact generation and provenance tracking that none of them offers on its own.

Technical Stack

Python · YAML configuration · Jinja2 templating · HTML / Markdown / PDF generation · Git-based version control · HDF5 data storage
