SciTrails

Scientific Documentation & Knowledge Representation · 2020–2024

Python · YAML · Reproducibility · Data Science

Overview

SciTrails (originally "circuit-factology") is a framework for converting computational results into structured, publishable scientific knowledge. It addresses the reproducibility crisis by treating scientific documentation as a first-class computational artifact — not an afterthought written after the analysis is done.

The Problem

Scientific analyses, particularly in computational neuroscience, involve multiple interconnected steps: data processing, statistical analysis, visualization, and interpretation. These are typically managed through ad-hoc scripts, Jupyter notebooks, and manual documentation. The result is fragmented, non-reproducible work where the path from raw data to a published figure is often obscure. A small change in the model requires re-running a tangled web of scripts, usually with manual intervention.

The Solution: Declarative Fact Generation

Instead of writing imperative scripts ("do this, then that, then save a plot"), the scientist declares what knowledge they want to obtain. The framework handles the how.
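A minimal sketch of the idea follows. It assumes that facts are declared as a mapping from fact names to measurement functions and their dependencies. The names `generate` and `DECLARATIONS` and the dict layout are hypothetical, not the SciTrails API; in the framework itself the declarations live in YAML. The point is that the scientist never writes an execution order. The engine derives it from the dependency graph using Python's standard `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter

# Raw data: spike counts per trial and trial duration (illustrative numbers).
DATA = {"spike_counts": [12, 15, 9, 14], "trial_duration_s": 2.0}

# Hypothetical measurement functions. Each receives the raw data and a dict
# holding the already-computed facts it depends on.
def total_spikes(data, facts):
    return sum(data["spike_counts"])

def mean_rate_hz(data, facts):
    n_trials = len(data["spike_counts"])
    return facts["total_spikes"] / (n_trials * data["trial_duration_s"])

# Declarations: *what* to compute and what each fact needs, not *how* or in
# which order. Note that the dependent fact is deliberately listed first.
DECLARATIONS = {
    "mean_rate_hz": {"measure": mean_rate_hz, "needs": ["total_spikes"]},
    "total_spikes": {"measure": total_spikes, "needs": []},
}

def generate(declarations, data):
    """Compute every declared fact, resolving execution order from dependencies."""
    # graphlib expects: node -> iterable of predecessors (its dependencies).
    graph = {name: spec["needs"] for name, spec in declarations.items()}
    facts = {}
    for name in TopologicalSorter(graph).static_order():
        spec = declarations[name]
        deps = {d: facts[d] for d in spec["needs"]}
        facts[name] = spec["measure"](data, deps)
    return facts

facts = generate(DECLARATIONS, DATA)
```

Reordering the entries in `DECLARATIONS` does not change the result, which is the practical benefit of the declarative style: a new analysis is one more entry, not an edit to a script's control flow.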

This is achieved through a clear hierarchy of declared outputs: Facts, Figures, and Factsheets.

How It Works

What the scientist provides:

  1. A data interface — a Laboratory-like class for their specific dataset
  2. Measurement functions — Python functions performing core calculations
  3. YAML configurations — declaring desired Facts, Figures, and Factsheets
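The three ingredients can be sketched together as follows. `ToyLaboratory`, `build_facts`, and the configuration keys are hypothetical stand-ins, not SciTrails names. The configuration is shown as the plain Python structure a YAML parser would return, which keeps the example free of third-party dependencies.

```python
from dataclasses import dataclass
from statistics import mean

# 1. Data interface (hypothetical): a Laboratory-like class that hides how the
#    dataset is stored and exposes only what the measurements need.
@dataclass
class ToyLaboratory:
    spike_counts: dict   # neuron id -> spike count over the whole recording
    duration_s: float    # recording length in seconds

    def neurons(self):
        return sorted(self.spike_counts)

    def spikes(self, neuron):
        return self.spike_counts[neuron]

# 2. Measurement functions: plain Python, written against the interface only.
def firing_rates(lab):
    return {n: lab.spikes(n) / lab.duration_s for n in lab.neurons()}

def mean_firing_rate(lab):
    return mean(firing_rates(lab).values())

MEASUREMENTS = {"mean_firing_rate": mean_firing_rate}

# 3. Configuration: what a YAML file such as
#      facts:
#        - name: mean_firing_rate
#          description: Mean firing rate across neurons (Hz)
#    would load to, written directly as a Python structure.
CONFIG = {
    "facts": [
        {"name": "mean_firing_rate",
         "description": "Mean firing rate across neurons (Hz)"},
    ]
}

def build_facts(lab, config, measurements):
    """Evaluate each configured fact against the data interface."""
    return {
        f["name"]: {"value": measurements[f["name"]](lab),
                    "description": f["description"]}
        for f in config["facts"]
    }

lab = ToyLaboratory(spike_counts={"n1": 20, "n2": 40, "n3": 30}, duration_s=10.0)
facts = build_facts(lab, CONFIG, MEASUREMENTS)
```

Because the measurement functions only talk to the data interface, running the same configuration on a new dataset requires only a new interface instance; the functions and the configuration stay unchanged.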

What the framework delivers:

  1. Automated reports — structured, human-readable documents generated through a single command-line tool (scitale init → setup → generate)
  2. Full reproducibility — the entire knowledge generation process captured in configuration and code
  3. Clear provenance — every number and figure linked to the code that generated it
  4. Scalability — add new analyses or run the suite on a new dataset with minimal changes
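The provenance and reproducibility guarantees can be illustrated with a small sketch. It assumes each fact carries a record of which function produced it, under which code version, and with which parameters. `Provenance`, `Fact`, `FactStore`, and the placeholder commit id are hypothetical, not SciTrails internals. Hashing the provenance record also yields a cache key, so a rerun only recomputes facts whose inputs actually changed.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    function: str       # fully qualified name of the measurement function
    code_version: str   # e.g. a Git commit hash supplied by the caller (assumption)
    params: str         # canonical JSON of the parameters used

    def key(self) -> str:
        """Content hash identifying this exact computation."""
        blob = json.dumps([self.function, self.code_version, self.params])
        return hashlib.sha256(blob.encode()).hexdigest()

@dataclass(frozen=True)
class Fact:
    name: str
    value: object
    provenance: Provenance

class FactStore:
    """Computes facts and reuses results whose provenance key is unchanged."""

    def __init__(self, code_version: str):
        self.code_version = code_version
        self._cache = {}
        self.computed = 0   # number of actual (non-cached) evaluations

    def fact(self, name, fn, **params):
        prov = Provenance(
            function=f"{fn.__module__}.{fn.__qualname__}",
            code_version=self.code_version,
            params=json.dumps(params, sort_keys=True),
        )
        key = prov.key()
        if key not in self._cache:
            self._cache[key] = fn(**params)
            self.computed += 1
        return Fact(name, self._cache[key], prov)

def peak_rate(rates):
    return max(rates)

store = FactStore(code_version="abc1234")  # placeholder commit id
f1 = store.fact("peak_rate", peak_rate, rates=[2.0, 4.0, 3.0])
f2 = store.fact("peak_rate", peak_rate, rates=[2.0, 4.0, 3.0])  # cache hit
f3 = store.fact("peak_rate", peak_rate, rates=[2.0, 5.0])       # new params
```

Keying on a whole-repository commit is deliberately coarse, since any new commit invalidates every cached fact. A finer-grained variant could hash each measurement function's source instead, at the cost of missing changes in the helpers it calls.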

Design Principles

Comparative Positioning

Compared with existing tools, SciTrails occupies a distinct niche: it combines the interactivity of Jupyter, the reproducibility of Snakemake, and the publication quality of RMarkdown, and adds declarative fact generation and provenance tracking, which none of these tools provides on its own.

Technical Stack

Python · YAML configuration · Jinja2 templating · HTML / Markdown / PDF generation · Git-based version control · HDF5 data storage
