Development Environment

Local Docker Stack

A Docker Compose environment that mirrors federal platform constraints locally. Jupyter, MLflow, PostgreSQL, and CAC/PIV authentication services — the tools you need to run handbook code examples in conditions that approximate real DoD platforms.

Docker Compose IL4/IL5 Simulation

  • Services: 8
  • Dockerfiles: 8
  • Requires: Docker 20.10+
  • RAM: 8 GB minimum

Services

The stack ships eight services. Each runs in its own container with a dedicated Dockerfile. Together they approximate the compute, storage, tracking, monitoring, and auth surfaces you encounter on Advana, Databricks, and Navy Jupiter.

  • Jupyter (port 8888): JupyterLab notebooks — the primary interface for running chapter code examples and exercises
  • MLflow (port 5000): Experiment tracking and model registry, backed by PostgreSQL — mirrors the MLflow server on Databricks
  • PostgreSQL (port 5432): Relational database for data exercises and the MLflow backend store
  • Redis (port 6379): In-memory cache and message broker for pipeline and streaming exercises
  • Nginx (ports 80/443): Reverse proxy with SSL termination — routes traffic to Jupyter and MLflow, simulates gateway behavior
  • Prometheus (port 9090): Metrics collection for all running services — used in Chapter 9 (MLOps) monitoring examples
  • Grafana (port 3000): Monitoring dashboards fed by Prometheus — pre-provisioned with handbook dashboard configs
  • CAC Auth (port 8443): Smart card authentication simulator — lets you test CAC/PIV auth flows without hardware readers
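As a rough sketch of how two of these services fit together in docker-compose.yml — the service names, build paths, and environment variables here are illustrative assumptions, not copied from the repo; the files in docker/ are authoritative:

```yaml
# Illustrative excerpt only — see the repo's compose file for the real wiring.
services:
  jupyter:
    build: ./docker/jupyter        # assumed Dockerfile location
    ports:
      - "8888:8888"

  mlflow:
    build: ./docker/mlflow         # assumed Dockerfile location
    ports:
      - "5000:5000"
    environment:
      # The backend store reaches PostgreSQL by its service name on the
      # compose network; credentials here are placeholders.
      MLFLOW_BACKEND_STORE_URI: postgresql://mlflow:mlflow@postgres:5432/mlflow
    depends_on:
      - postgres
```

The pattern to note is that containers address each other by service name (postgres:5432), while the published ports in the table above are what you use from the host.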

Quick Start

Four steps get the full stack running: clone, configure, start, verify. If you only need Jupyter and MLflow for a specific chapter, see Selective Startup below.

```bash
# Clone the repository
git clone https://github.com/aporb/data-science-learning-handbook.git
cd data-science-learning-handbook

# Copy the environment template and configure it
cp .env.example .env

# Start all services
docker compose up -d

# Verify that every container is up
docker compose ps
```

JupyterLab will be available at http://localhost:8888 and MLflow at http://localhost:5000. The first startup pulls images and builds containers — allow 3–5 minutes.

Configuration

Copy .env.example to .env before starting the stack. The file is pre-populated with safe defaults for local development; you only need to change values if you want to connect to live federal platform APIs.

The four variable groups in .env:

  • Database credentials — PostgreSQL user, password, and database name. MLflow uses these to connect to its backend store.
  • MLflow settings — Artifact storage path and tracking URI. Defaults to local filesystem under mlartifacts/.
  • Platform API keys — Optional Databricks tokens and Qlik API keys. Required only for platform-specific exercises that hit live APIs.
  • CAC/PIV certificate paths — Paths to your test certificates for the auth service simulator. Leave blank to use the bundled test certs.
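For orientation, a local-only .env might look something like the sketch below. The variable names are illustrative guesses at the four groups — the actual names and defaults are defined in .env.example:

```bash
# Database credentials — placeholder names; check .env.example for the real keys
POSTGRES_USER=handbook
POSTGRES_PASSWORD=change-me-locally
POSTGRES_DB=handbook

# MLflow settings — local filesystem artifact store under mlartifacts/
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_ARTIFACT_ROOT=./mlartifacts

# Platform API keys — leave empty unless running live-API exercises
DATABRICKS_TOKEN=
QLIK_API_KEY=

# CAC/PIV certificate paths — leave blank to use the bundled test certs
CAC_CERT_PATH=
CAC_KEY_PATH=
```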

See .env.example in the repo root for a fully commented reference of every available variable.

Architecture

Nginx sits at the edge and routes inbound requests. Jupyter and MLflow are the primary workload services. Prometheus scrapes metrics from all services; Grafana reads from Prometheus. CAC Auth connects through Nginx for auth-gated endpoints.

```mermaid
graph LR
  A[Jupyter :8888] --> B[MLflow :5000]
  A --> C[PostgreSQL :5432]
  A --> D[Redis :6379]
  B --> C
  E[Nginx :80/443] --> A
  E --> B
  F[Prometheus :9090] --> A
  F --> B
  F --> C
  G[Grafana :3000] --> F
  H[CAC Auth :8443] --> E
```

Service topology. Nginx is the single ingress point. Prometheus scrapes all services; Grafana visualizes the metrics. CAC Auth chains through Nginx for auth-gated request flows.
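The scrape side of that topology could be expressed in prometheus.yml roughly as follows — job names, targets, and the exporter port are assumptions for illustration; the actual rules ship in the docker/ directory:

```yaml
# Illustrative sketch — container names double as DNS hostnames
# on the compose network, so Prometheus scrapes them directly.
scrape_configs:
  - job_name: jupyter
    static_configs:
      - targets: ["jupyter:8888"]
  - job_name: mlflow
    static_configs:
      - targets: ["mlflow:5000"]
  - job_name: postgres
    static_configs:
      - targets: ["postgres:9187"]   # assumes a postgres_exporter sidecar
```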

Selective Startup

The full stack uses 6–8 GB of RAM. If you are working through a specific chapter and don't need monitoring or auth services, start only what that chapter requires.

```bash
# Just the essentials for chapter exercises
docker compose up -d jupyter mlflow postgres

# Add monitoring for Chapter 9 (MLOps)
docker compose up -d prometheus grafana

# Add auth services for Chapter 12 (Ethics & Governance)
docker compose up -d cac-auth nginx
```

MLflow depends on PostgreSQL. If you start MLflow, Docker Compose will automatically start postgres too — you don't need to name it explicitly.
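That behavior comes from Compose dependency resolution. A minimal sketch of the mechanism — the healthcheck command and user are assumptions, not the repo's actual config:

```yaml
# Illustrative: depends_on pulls postgres in whenever mlflow starts,
# and the service_healthy condition waits for the healthcheck to pass.
services:
  mlflow:
    depends_on:
      postgres:
        condition: service_healthy
  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U mlflow"]   # user name is a placeholder
      interval: 5s
```

With a plain `depends_on: [postgres]` list form, Compose only orders startup; the `condition: service_healthy` form additionally blocks until PostgreSQL is ready to accept connections.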

Get the Full Stack

All Dockerfiles, Nginx configs, Prometheus rules, Grafana dashboard provisioning, and the CAC Auth service source are in the docker/ directory of the repository.

View Docker Configuration on GitHub →