Open Reference — No Registration Required
Federal Data Science Handbook
A practical, platform-specific guide to data science in the federal government. Built for practitioners who need to ship real work on real DoD platforms — not academic exercises in ideal conditions.
13 Chapters
From your first week on contract to deploying production ML models. Each chapter is grounded in the specific platforms, constraints, and compliance requirements of federal data science work.
5 Platform Guides
Deep-dive reference for each of the five major DoD data science platforms. Architecture, access patterns, key APIs, and production-ready code examples.
Advana: The Pentagon's enterprise analytics platform. DoD-wide data, 100,000+ users, financial and logistics data, and the CDAO's AI development foundation.
Databricks: FedRAMP High lakehouse for heavy ML and data engineering. PySpark, MLflow, Unity Catalog, Delta Lake. The production ML environment on Advana and Jupiter.
Qlik: The BI and analytics layer for dashboards and data exploration. QIX associative engine, Server-Side Extensions for ML integration, JWCC on AWS.
Navy Jupiter: The Department of the Navy's (DON) enterprise data environment. Advana subtenant with tri-network accreditation (NIPR, SIPR, JWICS), bronze/silver/gold data tiers, and PII/PHI authorization.
Palantir Foundry: Ontology-backed operational AI platform. The Maven Smart System, AIP Agent Studio, FedRAMP High across the full suite, Army Enterprise Agreement coverage.
The Full Toolkit
The handbook repository includes everything you need to learn, practice, and build in a federal data science environment.
Clone this repo. Your AI agent becomes a federal data science expert.
The handbook ships with native context files for Claude Code, Cursor, OpenCode, and Cline. Clone it, open it in your agent, and 96,000 words of platform constraints, compliance rules, and code patterns are live context for every conversation.
/compliance-check
Reviews code against NIST 800-53, DoD AI Ethics, and FedRAMP requirements. Returns severity-classified findings with remediation pointing to specific handbook sections.
/generate-federal-code
Generates platform-aware Python with mandatory compliance headers. Covers 22 task types across Databricks, Foundry, Advana, and Navy Jupiter. Generated code contains no hardcoded credentials and makes no external API calls at IL4 and above.
/teach
Interactive tutor mode. Opens with the chapter's narrative hook, presents learning objectives as a live checklist, and walks through concepts with code. Practitioner voice, not academic.
Who this is for
This handbook is written for data scientists and analysts starting a federal contract — or struggling through one. It assumes you know how to write Python and build models. It does not assume you know what a DD Form 2875 is, why PySpark is non-negotiable at DoD data scale, or how to explain your model's outputs to a flag officer.
The federal data science environment is not better or worse than commercial work. It is different in specific ways that are almost never documented. Access is denied by default. Classification levels determine architecture. Platforms are managed, not owned. The data quality problems are real, and they require coordination, not just code.
This handbook documents the specific ways it is different, and what to do about each of them.