Open Reference — No Registration Required

Federal Data Science Handbook

A practical, platform-specific guide to data science in the federal government. Built for practitioners who need to ship real work on real DoD platforms — not academic exercises in ideal conditions.

Curriculum

13 Chapters

From your first week on contract to deploying production ML models. Each chapter is grounded in the specific platforms, constraints, and compliance requirements of federal data science work.

Chapter 01 · Introduction to Data Science in Government
The five platforms, Impact Levels, FedRAMP, Zero Trust — and why access is your first technical problem, not security's.
Advana · Jupiter · Databricks · Qlik · Palantir · 5 exercises

Chapter 02 · Python and R Foundations for Federal Platforms
Writing code that works inside managed environments. Package approval workflows, air-gapped PyPI, Databricks Runtime versioning, and CAC-based authentication patterns.
Databricks · Palantir Foundry · Jupiter · 5 exercises

Chapter 03 · Data Acquisition in the Federal Ecosystem
Working with Collibra catalogs, bronze/silver/gold data tiers, CUI handling, and the data steward relationships that determine what you can actually use.
Collibra · Advana · Jupiter · 6 exercises

Chapter 04 · Data Wrangling at Scale
PySpark transforms for DoD-scale logistics and financial data. Delta Lake, Unity Catalog, cleaning legacy system outputs, and writing audit-compliant pipelines.
Databricks · PySpark · Delta Lake · 4 exercises

Chapter 05 · Exploratory Data Analysis
EDA patterns for federal datasets. Profiling, anomaly detection, documenting findings for data stewards, and building the evidence base that justifies model development.
Python · Databricks · Qlik · 6 exercises

Chapter 06 · Supervised Machine Learning
Building predictive models on Databricks with MLflow tracking. Classification, regression, ensemble methods, and SHAP explainability for command briefings.
Databricks · MLflow · scikit-learn · 5 exercises

Chapter 07 · Unsupervised Machine Learning
Clustering, anomaly detection, and dimensionality reduction. Practical applications in maintenance forecasting, logistics anomaly detection, and personnel analytics.
Python · Databricks · 5 exercises

Chapter 08 · Deep Learning and Neural Networks
Deep learning in government-approved compute environments. Working within GPU access constraints, compliance-friendly model storage, and approved framework versions.
Databricks · PyTorch · Mosaic AI · 4 exercises

Chapter 09 · MLOps in the Federal Environment
MLflow, the Unity Catalog model registry, CI/CD on GitLab in GovCloud, ATO requirements for production models, and monitoring models once deployed.
MLflow · GitLab · Databricks · 6 exercises

Chapter 10 · Data Visualization and Dashboards
Building Qlik dashboards that flag officers actually use. The QIX Engine, Server-Side Extensions for ML integration, and the design patterns that survive the O-6 brief.
Qlik · SSE · Python · 6 exercises

Chapter 11 · Deploying Models to Production
Getting a model to production in a DoD environment. ATO requirements, security scans, approval chains, and what changes once a model is live and used for decisions.
Databricks · Palantir · Qlik SSE · 6 exercises

Chapter 12 · Ethics and AI Governance
AI ethics in the DoD context. Responsible AI principles, bias in military datasets, the Directive 3000.09 framework for autonomous systems, and documenting model limitations.
Policy · RAI · Compliance · 5 exercises

Chapter 13 · Advanced Topics and Emerging Capabilities
Large language models in the federal context, Palantir AIP agents, multi-source data fusion, and what the Maven Smart System tells us about the operational edge of military AI.
Palantir AIP · LLMs · Agent Studio · 6 exercises
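To give a flavor of the hands-on style, here is a minimal sketch of the kind of profiling pass Chapter 05 builds toward: a z-score scan that flags outliers in a numeric column, the sort of evidence a data steward conversation starts from. The field values, the sentinel entry, and the 2-sigma threshold are illustrative assumptions, not handbook conventions.

```python
import statistics

def flag_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard
    deviations from the mean.

    A minimal profiling pass; the 2-sigma default is an illustrative
    choice, not a handbook standard.
    """
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [
        (i, v) for i, v in enumerate(values)
        if stdev and abs(v - mean) / stdev > threshold
    ]

# Hypothetical flight-hour readings with one corrupt legacy-system value
readings = [112.0, 108.5, 110.2, 109.9, 111.3, 9999.0, 110.7]
print(flag_anomalies(readings))  # → [(5, 9999.0)]
```

The chapters themselves work at Databricks scale with PySpark; a pure-Python sketch like this is only the shape of the idea.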
Reference

5 Platform Guides

Deep-dive reference for each of the five major DoD data science platforms. Architecture, access patterns, key APIs, and production-ready code examples.

Beyond the Chapters

The Full Toolkit

The handbook repository includes everything you need to learn, practice, and build in a federal data science environment.

Agent-Ready

Clone this repo. Your AI agent becomes a federal data science expert.

The handbook ships with native context files for Claude Code, Cursor, OpenCode, and Cline. Clone it, open it in your agent, and 96,000 words of platform constraints, compliance rules, and code patterns are live context for every conversation.

/compliance-check

Reviews code against NIST 800-53, DoD AI Ethics, and FedRAMP requirements. Returns severity-classified findings with remediation pointing to specific handbook sections.
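The command's actual rule set is not reproduced here, but a severity-classified finding can be sketched in miniature: a regex scan for hardcoded credentials (a NIST 800-53 IA-5 concern) and external endpoints. The rule patterns, severities, and remediation strings below are illustrative assumptions.

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str      # e.g. "HIGH", "MEDIUM"
    line: int
    message: str
    remediation: str   # a real check points at a specific handbook section

# Illustrative rules only; the shipped command's checks are broader.
RULES = [
    (re.compile(r"(?i)(password|secret|token)\s*=\s*['\"]\w+['\"]"),
     "HIGH", "Possible hardcoded credential (NIST 800-53 IA-5)"),
    (re.compile(r"https?://"),
     "MEDIUM", "External endpoint referenced; verify IL4+ egress policy"),
]

def scan(source: str):
    """Return severity-classified findings for each matching line."""
    findings = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for pattern, severity, message in RULES:
            if pattern.search(text):
                findings.append(Finding(severity, lineno, message,
                                        "See the relevant handbook section"))
    return findings

sample = 'password = "hunter2"\nresp = get("https://api.example.com/v1")\n'
for f in scan(sample):
    print(f.severity, f.line, f.message)
```

A static scan like this catches the obvious cases; the handbook's point is that the remediation pointer, not the regex, is where the value lives.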

Available in: Claude Code · OpenCode · Cline · Cursor
/generate-federal-code

Generates platform-aware Python with mandatory compliance headers. Covers 22 task types across Databricks, Foundry, Advana, and Navy Jupiter — no hardcoded credentials, no external API calls at IL4+.
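The generator's real header template and enforcement mechanism are not reproduced here; the sketch below is a hypothetical example of what a compliance header and an IL4+ external-call guard could look like. Every field name and the `call_external_api` helper are illustrative assumptions.

```python
# --- Hypothetical compliance header; fields are illustrative, not the
# --- shipped template. ---
# Classification: CUI
# Impact Level: IL4
# Platform: Databricks (GovCloud)
# External network calls: PROHIBITED at IL4+

IMPACT_LEVEL = 4

def call_external_api(url: str, impact_level: int = IMPACT_LEVEL):
    """Refuse external API calls at IL4 and above.

    A sketch of the 'no external API calls at IL4+' rule; the real
    generated code may enforce this differently.
    """
    if impact_level >= 4:
        raise PermissionError(
            f"External call to {url!r} blocked at IL{impact_level}"
        )
    # Below IL4 a real implementation would perform the request here.
    return f"would call {url}"
```

Failing closed at the call site keeps the prohibition visible in the code itself rather than relying on network policy alone.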

/teach

Interactive tutor mode. Opens with the chapter's narrative hook, presents learning objectives as a live checklist, and walks through concepts with code. Practitioner voice, not academic.

About

Who this is for

This handbook is written for data scientists and analysts starting a federal contract — or struggling through one. It assumes you know how to write Python and build models. It does not assume you know what a DD Form 2875 is, why PySpark is non-negotiable at DoD data scale, or how to explain your model's outputs to a flag officer.

The federal data science environment is not better or worse than commercial work. It is different in specific ways that are almost never documented. Access is denied by default. Classification levels determine architecture. Platforms are managed, not owned. The data quality problems are real and they require coordination, not just code.

This handbook documents the specific ways it is different, and what to do about each of them.