Local Docker Stack — Federal DS Handbook

Services

The stack ships eight services. Each runs in its own container with a dedicated Dockerfile. Together they approximate the compute, storage, tracking, monitoring, and auth surfaces you encounter on Advana, Databricks, and Navy Jupiter.

Jupyter

JupyterLab notebooks — the primary interface for running chapter code examples and exercises

Port 8888

MLflow

Experiment tracking and model registry, backed by PostgreSQL — mirrors the MLflow server on Databricks

Port 5000

PostgreSQL

Relational database for data exercises and as the MLflow backend store

Port 5432

Redis

In-memory cache and message broker for pipeline and streaming exercises

Port 6379

Nginx

Reverse proxy with SSL termination — routes traffic to Jupyter and MLflow, simulates gateway behavior

Ports 80 / 443

Prometheus

Metrics collection for all running services — used in Chapter 9 (MLOps) monitoring examples

Port 9090

Grafana

Monitoring dashboards fed by Prometheus — pre-provisioned with handbook dashboard configs

Port 3000

CAC Auth

Smart card authentication simulator — lets you test CAC/PIV auth flows without hardware readers

Port 8443
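Once the stack is up, a quick way to see which of these services are listening is to probe each published port. The sketch below assumes the default host ports listed above, bash's /dev/tcp pseudo-device, and the coreutils timeout command. SERVICE_PORTS, port_open, and check_ports are hypothetical names, not part of the repository.

```bash
# Hypothetical helper (not in the repository): default host ports from the
# Services list, as name:port pairs. Nginx publishes two ports.
SERVICE_PORTS="jupyter:8888 mlflow:5000 postgres:5432 redis:6379 nginx-http:80 nginx-https:443 prometheus:9090 grafana:3000 cac-auth:8443"

# Succeed if something accepts a TCP connection on host:port.
# Relies on bash's /dev/tcp pseudo-device; gives up after 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one "name port state" line per service for the given host.
check_ports() {
  local host="${1:-localhost}" entry name port state
  for entry in $SERVICE_PORTS; do
    name="${entry%%:*}"   # text before the colon
    port="${entry##*:}"   # text after the colon
    if port_open "$host" "$port"; then state=open; else state=closed; fi
    printf '%-12s %5s  %s\n' "$name" "$port" "$state"
  done
}
```

Run `check_ports` after `docker compose up -d`. An open port only shows that something is listening; a service can accept connections before it is ready to serve requests.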

Quick Start

Three steps (clone, configure, start) get the full stack running. If you only need Jupyter and MLflow for a specific chapter, see Selective Startup below.

bash

# Clone the repository
git clone https://github.com/aporb/data-science-learning-handbook.git
cd data-science-learning-handbook

# Copy environment template and configure
cp .env.example .env

# Start all services
docker compose up -d

# Verify
docker compose ps

JupyterLab will be available at http://localhost:8888 and MLflow at http://localhost:5000. The first startup pulls base images and builds each service's image, so allow 3–5 minutes.
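Because the first startup takes several minutes, scripts that run right after `docker compose up -d` can fail if they hit JupyterLab or MLflow too early. A minimal polling loop, assuming curl is installed on the host (`wait_for_url` is a hypothetical helper name):

```bash
# Hypothetical helper: poll a URL until the server answers or the timeout
# (in seconds, default 300) expires. Without curl's -f flag any HTTP response
# counts as "up", including a redirect to a login page; only connection
# errors and curl timeouts count as "not yet".
wait_for_url() {
  local url="$1" limit="${2:-300}"
  local deadline=$(( $(date +%s) + limit ))
  until curl -s -o /dev/null --max-time 2 "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${limit}s: $url" >&2
      return 1
    fi
    sleep 2
  done
  echo "up: $url"
}
```

For example, `wait_for_url http://localhost:8888 && wait_for_url http://localhost:5000` blocks until both interfaces respond. While waiting, `docker compose logs -f jupyter` shows the container's startup output.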

Configuration

Copy .env.example to .env before starting the stack. The file is pre-populated with safe defaults for local development; you only need to change values if you want to connect to live federal platform APIs.

The four variable groups in .env:

Database credentials — PostgreSQL user, password, and database name. MLflow uses these to connect to its backend store.
MLflow settings — Artifact storage path and tracking URI. Defaults to local filesystem under mlartifacts/.
Platform API keys — Optional Databricks tokens and Qlik API keys. Required only for platform-specific exercises that hit live APIs.
CAC/PIV certificate paths — Paths to your test certificates for the auth service simulator. Leave blank to use the bundled test certs.

See env-example.txt in the repo root for a fully commented reference of every available variable.
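If you edit .env, a quick pre-flight check can catch keys you left blank before Compose starts the containers. This sketch parses the file with grep instead of sourcing it, so nothing in the file is executed. The helper name `check_env_file` is hypothetical, and the key names in the usage example are assumptions modeled on the official postgres image's conventions; substitute the names your .env.example actually uses.

```bash
# Hypothetical helper: report required keys that are missing or empty in a
# dotenv-style file. Usage: check_env_file FILE KEY [KEY ...]
# Returns 1 if the file is missing or any key is missing or empty.
check_env_file() {
  local file="$1" key value status=0
  shift
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi
  for key in "$@"; do
    # Last uncommented KEY=... line wins; strip CRs left by Windows editors.
    value=$(grep -E "^[[:space:]]*${key}=" "$file" | tail -n 1 | cut -d= -f2- | tr -d '\r' || true)
    # Strip one pair of surrounding double quotes so KEY="" counts as empty.
    value="${value#\"}"
    value="${value%\"}"
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_env_file .env POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB`. Docker Compose can also show what it will actually use: `docker compose config` prints the compose file with values from .env substituted and warns about variables that are not set.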

Architecture

Nginx sits at the edge and routes inbound requests. Jupyter and MLflow are the primary workload services. Prometheus scrapes metrics from all services; Grafana reads from Prometheus. CAC Auth connects through Nginx for auth-gated endpoints.

graph LR
    A[Jupyter :8888] --> B[MLflow :5000]
    A --> C[PostgreSQL :5432]
    A --> D[Redis :6379]
    B --> C
    E[Nginx :80/443] --> A
    E --> B
    F[Prometheus :9090] --> A
    F --> B
    F --> C
    G[Grafana :3000] --> F
    H[CAC Auth :8443] --> E

Service topology. Nginx is the single ingress point. Prometheus scrapes all services; Grafana visualizes the metrics. CAC Auth chains through Nginx for auth-gated request flows.
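To check the scrape side of this topology, you can ask Prometheus which targets it currently sees. The sketch below uses the Prometheus HTTP API (`/api/v1/query`) with the built-in `up` metric, which is 1 when the last scrape of a target succeeded and 0 when it failed. It assumes curl on the host and formats the result with jq only if jq is installed; `PROM_URL`, `prom_query`, and `prom_targets_up` are hypothetical names.

```bash
# Hypothetical helpers: base URL of Prometheus, defaulting to the stack's published port.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query and print the raw JSON response.
# --get with --data-urlencode puts the URL-encoded query in the query string;
# -f makes curl fail on HTTP error statuses.
prom_query() {
  curl -sf --max-time 5 --get --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# Print "job instance value" for every target's `up` series when jq is
# available; otherwise fall back to the raw JSON.
prom_targets_up() {
  local json
  json=$(prom_query up) || { echo "Prometheus not reachable at $PROM_URL" >&2; return 1; }
  if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$json" | jq -r '.data.result[] | "\(.metric.job) \(.metric.instance) \(.value[1])"'
  else
    printf '%s\n' "$json"
  fi
}
```

Which jobs appear depends on the scrape configuration shipped in the repository's Prometheus config; a service that never shows up is not being scraped. The same information is available in the browser at http://localhost:9090/targets.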

Selective Startup

The full stack uses 6–8 GB of RAM. If you are working through a specific chapter and don't need monitoring or auth services, start only what that chapter requires.

bash

# Just the essentials for chapter exercises
docker compose up -d jupyter mlflow postgres

# Add monitoring for Chapter 9 (MLOps)
docker compose up -d prometheus grafana

# Add auth services for Chapter 12 (Ethics & Governance)
docker compose up -d cac-auth nginx
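If you switch chapters often, a small wrapper can keep the service lists in one place. This is a hypothetical helper, not something the repository provides; it encodes only the chapter mappings stated in the commands above and falls back to the essentials for every other chapter.

```bash
# Hypothetical helper (not in the repository): print the Compose services a
# chapter needs, following the Selective Startup commands.
services_for_chapter() {
  local base="jupyter mlflow postgres"
  case "$1" in
    9)  echo "$base prometheus grafana" ;;  # Chapter 9 (MLOps): add monitoring
    12) echo "$base cac-auth nginx" ;;      # Chapter 12: add auth services
    *)  echo "$base" ;;                     # everything else: the essentials
  esac
}

# Start exactly those services in the background.
start_chapter() {
  # The substitution is deliberately unquoted so each service becomes its own argument.
  # shellcheck disable=SC2046
  docker compose up -d $(services_for_chapter "$1")
}
```

For example, `start_chapter 9` runs `docker compose up -d jupyter mlflow postgres prometheus grafana`. To free memory afterwards, `docker compose stop prometheus grafana` stops services you no longer need, and `docker stats --no-stream` shows per-container memory use.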

MLflow depends on PostgreSQL. If you start MLflow, Docker Compose will automatically start postgres too — you don't need to name it explicitly.
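To confirm that MLflow and its backend store are actually connected, not just running, you can call an MLflow REST endpoint that reads from the backend. This sketch assumes curl on the host and an MLflow release recent enough to expose `POST /api/2.0/mlflow/experiments/search`; `MLFLOW_URL` and `mlflow_backend_ok` are hypothetical names.

```bash
# Hypothetical helper: MLflow tracking server URL, defaulting to the published port.
MLFLOW_URL="${MLFLOW_URL:-http://localhost:5000}"

# Searching experiments is served from the backend store (PostgreSQL in this
# stack), so a successful response means the server is up and can query it.
# -f makes curl return non-zero on HTTP error statuses.
mlflow_backend_ok() {
  curl -sf --max-time 5 -X POST \
    -H 'Content-Type: application/json' \
    -d '{"max_results": 1}' \
    "$MLFLOW_URL/api/2.0/mlflow/experiments/search" >/dev/null
}
```

If `mlflow_backend_ok` fails while `docker compose ps` shows both containers running, `docker compose logs mlflow` usually shows whether the server could reach PostgreSQL.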

Get the Full Stack

All Dockerfiles, Nginx configs, Prometheus rules, Grafana dashboard provisioning, and the CAC Auth service source are in the docker/ directory of the repository.
