Open Reference — No Registration Required

Federal Data Science Handbook

A practical, platform-specific guide to data science in the federal government. Built for practitioners who need to ship real work on real DoD platforms — not academic exercises in ideal conditions.

Curriculum

13 Chapters

From your first week on contract to deploying production ML models. Each chapter is grounded in the specific platforms, constraints, and compliance requirements of federal data science work.

Chapter 01 · Introduction to Data Science in Government
The five platforms, Impact Levels, FedRAMP, Zero Trust — and why access is your first technical problem, not security's.
Advana · Jupiter · Databricks · Qlik · Palantir · 5 exercises

Chapter 02 · Python and R Foundations for Federal Platforms
Writing code that works inside managed environments. Package approval workflows, air-gapped PyPI, Databricks Runtime versioning, and CAC-based authentication patterns.
Databricks · Palantir Foundry · Jupiter · 5 exercises

Chapter 03 · Data Acquisition in the Federal Ecosystem
Working with Collibra catalogs, bronze/silver/gold data tiers, CUI handling, and the data steward relationships that determine what you can actually use.
Collibra · Advana · Jupiter · 6 exercises

Chapter 04 · Data Wrangling at Scale
PySpark transforms for DoD-scale logistics and financial data. Delta Lake, Unity Catalog, cleaning legacy system outputs, and writing audit-compliant pipelines.
Databricks · PySpark · Delta Lake · 4 exercises

Chapter 05 · Exploratory Data Analysis
EDA patterns for federal datasets. Profiling, anomaly detection, documenting findings for data stewards, and building the evidence base that justifies model development.
Python · Databricks · Qlik · 6 exercises

Chapter 06 · Supervised Machine Learning
Building predictive models on Databricks with MLflow tracking. Classification, regression, ensemble methods, and SHAP explainability for command briefings.
Databricks · MLflow · scikit-learn · 5 exercises

Chapter 07 · Unsupervised Machine Learning
Clustering, anomaly detection, and dimensionality reduction. Practical applications in maintenance forecasting, logistics anomaly detection, and personnel analytics.
Python · Databricks · 5 exercises

Chapter 08 · Deep Learning and Neural Networks
Deep learning in government-approved compute environments. Working within GPU access constraints, compliance-friendly model storage, and approved framework versions.
Databricks · PyTorch · Mosaic AI · 4 exercises

Chapter 09 · MLOps in the Federal Environment
MLflow, the Unity Catalog model registry, CI/CD on GitLab in GovCloud, ATO requirements for production models, and monitoring models once deployed.
MLflow · GitLab · Databricks · 6 exercises

Chapter 10 · Data Visualization and Dashboards
Building Qlik dashboards that flag officers actually use. The QIX Engine, Server-Side Extensions for ML integration, and the design patterns that survive the O-6 brief.
Qlik · SSE · Python · 6 exercises

Chapter 11 · Deploying Models to Production
Getting a model to production in a DoD environment. ATO requirements, security scans, approval chains, and what changes once a model is live and used for decisions.
Databricks · Palantir · Qlik SSE · 6 exercises

Chapter 12 · Ethics and AI Governance
AI ethics in the DoD context. Responsible AI principles, bias in military datasets, the Directive 3000.09 framework for autonomous systems, and documenting model limitations.
Policy · RAI · Compliance · 5 exercises

Chapter 13 · Advanced Topics and Emerging Capabilities
Large language models in the federal context, Palantir AIP agents, multi-source data fusion, and what the Maven Smart System tells us about the operational edge of military AI.
Palantir AIP · LLMs · Agent Studio · 6 exercises
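To give a flavor of the hands-on style, here is a minimal sketch of the kind of profiling pass Chapter 05 builds toward: a z-score scan that flags outliers in a numeric column, the sort of evidence a data steward conversation starts from. The field values, the sentinel entry, and the 2-sigma threshold are illustrative assumptions, not handbook conventions.

```python
import statistics

def flag_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard
    deviations from the mean.

    A minimal profiling pass; the 2-sigma default is an illustrative
    choice, not a handbook standard.
    """
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [
        (i, v) for i, v in enumerate(values)
        if stdev and abs(v - mean) / stdev > threshold
    ]

# Hypothetical flight-hour readings with one corrupt legacy-system value
readings = [112.0, 108.5, 110.2, 109.9, 111.3, 9999.0, 110.7]
print(flag_anomalies(readings))  # → [(5, 9999.0)]
```

The chapters themselves work at Databricks scale with PySpark; a pure-Python sketch like this is only the shape of the idea.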
Reference

5 Platform Guides

Deep-dive reference for each of the five major DoD data science platforms. Architecture, access patterns, key APIs, and production-ready code examples.

Beyond the Chapters

The Full Toolkit

The handbook repository includes everything you need to learn, practice, and build in a federal data science environment.

Agent-Ready

Clone this repo. Your AI agent becomes a federal data science expert.

The handbook ships with native context files for Claude Code, Cursor, OpenCode, and Cline. Clone it, open it in your agent, and 96,000 words of platform constraints, compliance rules, and code patterns are live context for every conversation.

/compliance-check

Reviews code against NIST 800-53, DoD AI Ethics, and FedRAMP requirements. Returns severity-classified findings with remediation pointing to specific handbook sections.
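The command's actual rule set is not reproduced here, but a severity-classified finding can be sketched in miniature: a regex scan for hardcoded credentials (a NIST 800-53 IA-5 concern) and external endpoints. The rule patterns, severities, and remediation strings below are illustrative assumptions.

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str      # e.g. "HIGH", "MEDIUM"
    line: int
    message: str
    remediation: str   # a real check points at a specific handbook section

# Illustrative rules only; the shipped command's checks are broader.
RULES = [
    (re.compile(r"(?i)(password|secret|token)\s*=\s*['\"]\w+['\"]"),
     "HIGH", "Possible hardcoded credential (NIST 800-53 IA-5)"),
    (re.compile(r"https?://"),
     "MEDIUM", "External endpoint referenced; verify IL4+ egress policy"),
]

def scan(source: str):
    """Return severity-classified findings for each matching line."""
    findings = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for pattern, severity, message in RULES:
            if pattern.search(text):
                findings.append(Finding(severity, lineno, message,
                                        "See the relevant handbook section"))
    return findings

sample = 'password = "hunter2"\nresp = get("https://api.example.com/v1")\n'
for f in scan(sample):
    print(f.severity, f.line, f.message)
```

A static scan like this catches the obvious cases; the handbook's point is that the remediation pointer, not the regex, is where the value lives.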

Available in: Claude Code · OpenCode · Cline · Cursor
/generate-federal-code

Generates platform-aware Python with mandatory compliance headers. Covers 22 task types across Databricks, Foundry, Advana, and Navy Jupiter — no hardcoded credentials, no external API calls at IL4+.
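The generator's real header template and enforcement mechanism are not reproduced here; the sketch below is a hypothetical example of what a compliance header and an IL4+ external-call guard could look like. Every field name and the `call_external_api` helper are illustrative assumptions.

```python
# --- Hypothetical compliance header; fields are illustrative, not the
# --- shipped template. ---
# Classification: CUI
# Impact Level: IL4
# Platform: Databricks (GovCloud)
# External network calls: PROHIBITED at IL4+

IMPACT_LEVEL = 4

def call_external_api(url: str, impact_level: int = IMPACT_LEVEL):
    """Refuse external API calls at IL4 and above.

    A sketch of the 'no external API calls at IL4+' rule; the real
    generated code may enforce this differently.
    """
    if impact_level >= 4:
        raise PermissionError(
            f"External call to {url!r} blocked at IL{impact_level}"
        )
    # Below IL4 a real implementation would perform the request here.
    return f"would call {url}"
```

Failing closed at the call site keeps the prohibition visible in the code itself rather than relying on network policy alone.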

/teach

Interactive tutor mode. Opens with the chapter's narrative hook, presents learning objectives as a live checklist, and walks through concepts with code. Practitioner voice, not academic.

About

Who this is for

This handbook is written for data scientists and analysts starting a federal contract — or struggling through one. It assumes you know how to write Python and build models. It does not assume you know what a DD Form 2875 is, why PySpark is non-negotiable at DoD data scale, or how to explain your model's outputs to a flag officer.

The federal data science environment is not better or worse than commercial work. It is different in specific ways that are almost never documented. Access is denied by default. Classification levels determine architecture. Platforms are managed, not owned. The data quality problems are real and they require coordination, not just code.

This handbook documents the specific ways it is different, and what to do about each of them.