DERIVA

The Discovery Environment for Relational Information and Versioned Assets (DERIVA) is a data-centric platform that treats every file, table row, and analysis result as a first-class, version-controlled asset from the moment it is generated in the lab to its final citation in a paper. By combining a model-driven relational catalog with a versioned object store and open APIs, DERIVA lets scientists curate evolving datasets, automate quality checks, and publish continuously FAIR collections.

Continuous FAIR Data Lifecycle

Every asset — raw, intermediate, or published — is instantly Findable, Accessible, Interoperable, and Reusable, with schema evolution tracked from experiment design to publication.

Reproducible ML and Informatics Workflows

A Python library and a GPU-enabled JupyterHub pull curated datasets, track executions, capture configurations and results, and push them back with provenance, which makes the pair well suited to reproducible AI pipelines.
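One way to picture this kind of provenance capture is a small execution record that fingerprints the configuration and lists what was read and written. The sketch below uses only the Python standard library; `ExecutionRecord` and `config_fingerprint` are hypothetical names chosen for illustration and are not part of DERIVA's Python library.

```python
import hashlib
import json
from dataclasses import dataclass, field


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable config."""
    # sort_keys makes the digest independent of key insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ExecutionRecord:
    """Hypothetical provenance record for one workflow run."""
    workflow: str
    config: dict
    inputs: list = field(default_factory=list)   # identifiers of datasets read
    outputs: list = field(default_factory=list)  # identifiers of assets written

    def to_row(self) -> dict:
        """Flatten the record into a dict that could be stored as a table row."""
        return {
            "workflow": self.workflow,
            "config_sha256": config_fingerprint(self.config),
            "inputs": sorted(self.inputs),
            "outputs": sorted(self.outputs),
        }


# Illustrative values only.
record = ExecutionRecord(
    workflow="train-classifier",
    config={"lr": 0.001, "epochs": 10},
    inputs=["dataset-A"],
    outputs=["model-1"],
)
row = record.to_row()
```

Hashing a canonical form of the configuration means two runs with the same settings share a fingerprint, so a result can be traced back to the exact parameters that produced it.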

Sophisticated Search

Faceted, full-text, and model-aware search lets scientists pinpoint records across billions of rows in milliseconds.
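The idea behind faceted search can be sketched in a few lines: each selected facet narrows the result set, and the counts for the remaining facets are recomputed on what is left. The example below is an in-memory illustration with hypothetical records and column names; a production system would push these filters down to the database rather than scan Python lists.

```python
from collections import Counter


def apply_facets(records, selections):
    """Keep records whose value for each selected facet is in the chosen set."""
    return [
        r for r in records
        if all(r.get(facet) in allowed for facet, allowed in selections.items())
    ]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r.get(facet) for r in records)


# Hypothetical records; column names are illustrative only.
records = [
    {"id": 1, "species": "mouse", "assay": "RNA-seq"},
    {"id": 2, "species": "mouse", "assay": "imaging"},
    {"id": 3, "species": "human", "assay": "RNA-seq"},
]

# Selecting species=mouse leaves two records, then recount the assay facet.
mouse = apply_facets(records, {"species": {"mouse"}})
remaining = facet_counts(mouse, "assay")
```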

Data Visualizations

Built-in dashboards render interactive plots, heat maps, and dimension-reduced embeddings directly from live project data—no exporting required.

Self-Service Curation at Scale

Scientists can load and publish their own data while hub curators review; FaceBase used this model to surpass 1,000 datasets and 30 projects in two years.

Flexible Ingest & Metadata QC

Command-line and Python tools bulk-load files of any type, attach controlled-vocabulary metadata, and trigger automated QC dashboards for “self-curation” at scale.
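A controlled-vocabulary check of the kind a QC dashboard might summarize can be sketched as follows. The function name `qc_report`, the column names, and the vocabulary terms are all assumptions made for illustration; they do not describe DERIVA's actual tooling.

```python
def qc_report(rows, vocab):
    """Return (row_index, column, value) for every value outside its vocabulary."""
    problems = []
    for i, row in enumerate(rows):
        for column, allowed in vocab.items():
            value = row.get(column)  # a missing column yields None and is flagged
            if value not in allowed:
                problems.append((i, column, value))
    return problems


# Hypothetical vocabulary and metadata rows.
vocab = {"species": {"mouse", "human"}, "assay": {"RNA-seq", "imaging"}}
rows = [
    {"species": "mouse", "assay": "RNA-seq"},
    {"species": "Mouse", "assay": "imaging"},   # wrong capitalization
    {"species": "human"},                       # assay missing
]
problems = qc_report(rows, vocab)
```

Reporting every violation with its row and column, rather than stopping at the first one, lets a submitter fix a whole batch in one pass.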

Versioned, Provenance-Aware Storage

Hatrac object store + BagIt/BDBag packages + Minid persistent IDs guarantee fixity, trace every revision, and make dataset exchange reproducible and cache-friendly.
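Fixity in a BagIt package rests on a payload manifest: one line per file, giving a checksum followed by the file's path under `data/`. The sketch below builds and checks such a manifest for an in-memory payload using only `hashlib`. It is a simplified illustration, not a full bag writer (a real bag also carries a `bagit.txt` declaration and is normally produced with dedicated tooling), and the helper names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(payload: dict) -> str:
    """Return manifest text with one '<digest> data/<path>' line per file."""
    lines = [
        f"{sha256_hex(content)} data/{path}"
        for path, content in sorted(payload.items())
    ]
    return "\n".join(lines) + "\n"


def verify(payload: dict, manifest: str) -> list:
    """Return the paths whose current digest no longer matches the manifest."""
    expected = {}
    for line in manifest.splitlines():
        digest, path = line.split(maxsplit=1)
        expected[path] = digest
    return [
        f"data/{path}"
        for path, content in sorted(payload.items())
        if expected.get(f"data/{path}") != sha256_hex(content)
    ]


# Hypothetical payload: file path -> file bytes.
payload = {"a.txt": b"hello", "b.txt": b"world"}
manifest = build_manifest(payload)

# Changing one file's bytes after the manifest was written is detected.
tampered = dict(payload, **{"b.txt": b"WORLD"})
changed = verify(tampered, manifest)
```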

Fine-Grained Federated Access Control

Globus Auth and Groups let projects enforce reader/writer/curator roles, embargoes, and single sign-on with ORCID, Google, or campus IDs, all of which are critical for cross-institution work.
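How group-based roles and embargoes might combine can be shown with a small policy check. Everything here is an assumption made for illustration: the role ordering, the rule that only writers and curators may read embargoed records, and the group names. It does not reflect how Globus or DERIVA actually evaluate access.

```python
from datetime import date

# Hypothetical role ordering: a higher rank includes the rights of lower ranks.
ROLE_RANK = {"reader": 1, "writer": 2, "curator": 3}


def effective_role(user_groups, acl):
    """Return the highest role granted by any of the user's groups, or None."""
    roles = [acl[g] for g in user_groups if g in acl]
    return max(roles, key=ROLE_RANK.__getitem__, default=None)


def can_read(user_groups, acl, embargo_until, today):
    """Decide read access under the assumed embargo rule."""
    role = effective_role(user_groups, acl)
    if role is None:
        return False
    if embargo_until is not None and today < embargo_until:
        # Assumed rule: during an embargo only writers and curators may read.
        return ROLE_RANK[role] >= ROLE_RANK["writer"]
    return True


# Hypothetical access-control list: group name -> role.
acl = {"consortium-members": "reader", "lab-42": "writer", "hub-staff": "curator"}
embargo = date(2030, 1, 1)
```

For example, `can_read(["consortium-members"], acl, embargo, date(2029, 6, 1))` is denied under this rule, while the same call with `["lab-42"]` is allowed.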

Model-Driven Interface (Chaise)

DERIVA introspects an entity-relationship (ER) model and auto-generates rich search, edit, and visualization pages. Define or tweak the model, and the web UI follows.
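The model-driven idea can be illustrated by deriving form fields from a table description instead of hand-coding them. The model dictionary, the widget mapping, and `form_fields` below are hypothetical and heavily simplified; they are not Chaise's implementation or a real catalog's schema format.

```python
# Hypothetical, simplified model; real catalogs expose far richer metadata.
MODEL = {
    "specimen": {
        "columns": [
            {"name": "id", "type": "int", "nullok": False},
            {"name": "collected_on", "type": "date", "nullok": True},
            {"name": "notes", "type": "text", "nullok": True},
        ]
    }
}

# Assumed mapping from column type to UI widget.
WIDGETS = {"int": "number", "date": "date-picker", "text": "textarea"}


def form_fields(model, table):
    """Derive one form-field description per column of the given table."""
    return [
        {
            "label": col["name"].replace("_", " ").title(),
            "widget": WIDGETS.get(col["type"], "text-input"),
            "required": not col["nullok"],
        }
        for col in model[table]["columns"]
    ]


fields = form_fields(MODEL, "specimen")
```

Because the form is computed from the model, adding a column to `MODEL` changes the generated fields without any change to the UI code.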

Loosely-Coupled, Brandable Architecture

Microservices with public APIs, style/theming hooks, and an adaptive UI let each consortium stand up a branded portal yet share the same battle-tested core.

Proven Multi-Domain Track Record

Deployed in neuroscience, craniofacial biology, ophthalmology ML, and more—demonstrating adaptability and accelerated discovery across disciplines.

View Projects Built on DERIVA

Accelerating discovery through every stage of the data lifecycle