Services
The stack ships eight services. Each runs in its own container with a dedicated Dockerfile. Together they approximate the compute, storage, tracking, monitoring, and auth surfaces you encounter on Advana, Databricks, and Navy Jupiter.
Quick Start
Three steps get the full stack running: clone the repository, copy the environment template, and start Compose. If you only need Jupyter and MLflow for a specific chapter, see Selective Startup below.
# Clone the repository
git clone https://github.com/aporb/data-science-learning-handbook.git
cd data-science-learning-handbook
# Copy environment template and configure
cp .env.example .env
# Start all services
docker compose up -d
# Verify
docker compose ps
Once the containers are up, JupyterLab is available at http://localhost:8888 and MLflow at http://localhost:5000. The first startup pulls images and builds containers, so allow 3–5 minutes.
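Because the first startup can take several minutes, it may help to poll the two URLs rather than refresh a browser. The following is a minimal sketch, not part of the repository: `wait_for_url` is a hypothetical helper, and the defaults (30 attempts, 10 seconds apart, roughly five minutes in total) are assumptions chosen to cover the startup window mentioned above. It assumes `curl` is installed on the host.

```shell
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper; the defaults below are assumptions)
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP 4xx/5xx as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not responding: $url" >&2
  return 1
}
```

A possible invocation after `docker compose up -d` is `wait_for_url http://localhost:8888 && wait_for_url http://localhost:5000`. A redirect to a login page counts as "up" here, since `curl -f` only fails on HTTP status 400 and above.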
Configuration
Copy .env.example to .env before starting the stack. The file is pre-populated with safe defaults for local development; you only need to change values if you want to connect to live federal platform APIs.
The four variable groups in .env:
- Database credentials: PostgreSQL user, password, and database name. MLflow uses these to connect to its backend store.
- MLflow settings: Artifact storage path and tracking URI. Defaults to local filesystem under mlartifacts/.
- Platform API keys: Optional Databricks tokens and Qlik API keys. Required only for platform-specific exercises that hit live APIs.
- CAC/PIV certificate paths: Paths to your test certificates for the auth service simulator. Leave blank to use the bundled test certs.
See env-example.txt in the repo root for a fully commented reference of every available variable.
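If you do edit .env, a quick check for keys that ended up missing or blank can catch mistakes before the stack starts. This is a sketch under stated assumptions: `missing_env_keys` is a hypothetical helper, and the variable names in the usage line are illustrative guesses, not taken from the repository; use the names that actually appear in your .env.example.

```shell
# Print each KEY that is absent from FILE or set to an empty value.
# Usage: missing_env_keys FILE KEY...
# (hypothetical helper, not shipped with the stack)
missing_env_keys() {
  file=$1
  shift
  for key in "$@"; do
    # Match "KEY=" followed by at least one character at the start of a line;
    # commented-out lines such as "# KEY=value" do not match.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key"
    fi
  done
}

# Example (variable names are assumptions):
# missing_env_keys .env POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB
```

Only pass keys that must be set. The certificate paths, for instance, are meant to stay blank when you rely on the bundled test certs, so they should not be included in the check.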
Architecture
Nginx sits at the edge and routes inbound requests. Jupyter and MLflow are the primary workload services. Prometheus scrapes metrics from all services; Grafana reads from Prometheus. CAC Auth connects through Nginx for auth-gated endpoints.
Service topology. Nginx is the single ingress point. Prometheus scrapes all services; Grafana visualizes the metrics. CAC Auth chains through Nginx for auth-gated request flows.
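One way to confirm that the topology described above matches what Compose will actually start is to compare the service names you expect against the ones defined in the Compose file. The sketch below assumes the service names used in the Selective Startup commands (nginx, jupyter, mlflow, postgres, prometheus, grafana, cac-auth); `missing_services` is a hypothetical helper, not part of the repository.

```shell
# Read service names on stdin (one per line) and print every expected
# name, given as an argument, that is not in that list.
# (hypothetical helper)
missing_services() {
  actual=$(cat)
  for svc in "$@"; do
    # -F: literal match, -x: whole line, -q: quiet
    if ! printf '%s\n' "$actual" | grep -Fqx -- "$svc"; then
      echo "$svc"
    fi
  done
}

# Example, run from the repository root:
# docker compose config --services | \
#   missing_services nginx jupyter mlflow postgres prometheus grafana cac-auth
```

No output means every expected service is defined. `docker compose config --services` lists the services in the resolved Compose file without starting anything.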
Selective Startup
The full stack uses 6–8 GB of RAM. If you are working through a specific chapter and don't need monitoring or auth services, start only what that chapter requires.
# Just the essentials for chapter exercises
docker compose up -d jupyter mlflow postgres
# Add monitoring for Chapter 9 (MLOps)
docker compose up -d prometheus grafana
# Add auth services for Chapter 12 (Ethics & Governance)
docker compose up -d cac-auth nginx
MLflow depends on PostgreSQL, so starting mlflow makes Docker Compose start postgres as well. Naming postgres explicitly, as in the first command above, is harmless but optional.
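The chapter-to-services mapping above can be wrapped in a small helper so you don't have to remember which extras each chapter needs. This is a sketch: `services_for_chapter` is a hypothetical function, and treating every chapter other than 9 and 12 as "essentials only" is an assumption based on the commands shown here.

```shell
# Print the Compose services to start for a given chapter number.
# (hypothetical helper; mapping taken from the commands above)
services_for_chapter() {
  base="jupyter mlflow postgres"
  case "$1" in
    9)  echo "$base prometheus grafana" ;;  # MLOps: add monitoring
    12) echo "$base cac-auth nginx" ;;      # Ethics & Governance: add auth
    *)  echo "$base" ;;                     # assumption: essentials only
  esac
}

# Usage (left unquoted on purpose so the list splits into separate arguments):
# docker compose up -d $(services_for_chapter 9)
```

Running `docker compose up -d` with services that are already up leaves them running, so it is safe to call this again when you move to a chapter that needs more of the stack.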
Get the Full Stack
All Dockerfiles, Nginx configs, Prometheus rules, Grafana dashboard provisioning, and the CAC Auth service source are in the docker/ directory of the repository.