Reference

Agent Integration

The Federal Data Science Handbook is designed to work natively with AI coding agents. Clone the repo and your agent has 96,000 words of federal data science context available on demand — platform constraints, compliance rules, code patterns, and practitioner knowledge.


How It Works

The handbook repo ships with a file called CLAUDE.md at the root. When you open the repo in Claude Code, this file is automatically loaded as context — your agent immediately has the full content map, three usage modes, and every constraint it needs.

There are three ways to use the handbook with an agent:

  1. REFERENCE — You're coding on a federal platform and need a quick answer. The agent navigates to the right chapter and code example.
  2. CODE GENERATION — You need compliant boilerplate. The /generate-federal-code command handles platform detection, constraint enforcement, and docstring formatting.
  3. TEACHING — You want to understand a concept. The /teach command walks through the handbook's narrative with code examples and pause points.

The context is hierarchical. Sub-directories have their own CLAUDE.md files:

  • chapters/CLAUDE.md — chapter index and code file mapping for all 13 chapters
  • platform-guides/CLAUDE.md — platform selection matrix, IL-level coverage, and guide structure
  • security-compliance/CLAUDE.md — module classification, architecture flow, and key files for compliance review

AGENTS.md provides the same orientation for any AI agent that reads Markdown — not just Claude Code. The repo includes both files so it works regardless of which tool you use.

Slash Commands

Three pre-built slash commands ship with the handbook. Each is available in Claude Code (as /command invocations), OpenCode (identical command structure), and Cline (as interactive workflows with wizard menus). Cursor invokes the same workflows conversationally through .cursorrules.

/compliance-check

Reviews code against federal security and compliance requirements. Reads security-compliance/security-policy.md plus type-specific sources, then generates a structured findings report.

What it checks:

  • No hardcoded credentials — must use env vars or platform secret management
  • No external API calls with government data at IL4+
  • Self-hosted models only at IL4+ for inference and embeddings
  • AES-256 at rest, TLS 1.3 in transit
  • CAC/PIV authentication — no username/password auth
  • RBAC with least privilege
  • Audit logging for all data access and model inference
  • Model cards required for production ML models
  • Bias audit required before deploying classifiers
  • NIST AI RMF documentation for AI systems

Example output
## Compliance Review — classification pipeline

### Findings

| # | Finding                          | Severity | Standard    | Remediation                    |
|---|----------------------------------|----------|-------------|--------------------------------|
| 1 | External API call at IL4         | Critical | FedRAMP     | Use self-hosted model endpoint |
| 2 | No audit logging on predictions  | High     | NIST 800-53 | Add audit_log() after infer()  |
| 3 | Missing model card               | Medium   | DoD AI      | Generate via ch12 template     |

### Priority Fixes
1. Replace api.openai.com call with self-hosted endpoint (see ch13 RAG pipeline)
2. Add audit logging to inference function (see security-compliance/audit/)

Severity levels: Critical = data leaks across classification boundaries or credentials exposed. High = missing encryption, hardcoded credentials, external API calls with government data. Medium = incomplete RBAC, missing model cards. Low = style issues, missing docstring headers.
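
The credential check in the list above typically means replacing string literals with a platform secret lookup. A minimal sketch, assuming a Databricks scope/key layout with an environment-variable fallback for local runs (the helper name and fallback convention are ours, not the handbook's):

```python
import os

def get_model_secret(scope: str, key: str) -> str:
    """Fetch a credential without hardcoding it.

    On Databricks, dbutils.secrets.get(scope, key) is the sanctioned
    path; outside the platform this falls back to an environment
    variable. Hypothetical helper -- the compliance checker flags
    hardcoded literals, it does not ship this function.
    """
    try:
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821 (Databricks-only global)
    except NameError:
        # Not running on Databricks: read from the environment instead.
        value = os.environ.get(f"{scope.upper()}_{key.upper()}")
        if value is None:
            raise KeyError(f"secret {scope}/{key} not configured")
        return value
```

Anything that makes the checker see only a lookup call, never a literal token, satisfies the first finding category.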

/generate-federal-code

Generates platform-aware Python with mandatory compliance headers. Covers 22 task types across Databricks, Foundry, Advana, Navy Jupiter, and Qlik.

Every generated file starts with a mandatory docstring header:

Python — mandatory docstring header
"""
[Descriptive Title]
[What this code does and the use case it serves]
Platform: [Databricks | Foundry | Advana | Local | Any]
Usage: [exactly how to run it]
"""

Platform constraints enforced:

  • Databricks: SparkSession pre-initialized, no pip install in cells, dbutils.secrets for credentials, Unity Catalog for governance
  • Foundry: palantir_models for model publishing, @transform_df decorators for pipelines, no direct file I/O
  • Advana / Navy Jupiter: JupyterHub on a shared cluster, conda environments only, no sudo
  • All platforms at IL4+: No external API calls, self-hosted models and embeddings only, data stays within classification boundary

At IL4 and above, the command will never generate code that calls external APIs (OpenAI, Anthropic hosted, HuggingFace Inference). All model inference and embedding generation must use self-hosted or platform-provided endpoints.
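
The IL4+ rule can be pictured as a simple endpoint gate. This is an illustrative sketch only, with a host list we chose ourselves; the real command enforces the constraint at generation time, not at runtime:

```python
from urllib.parse import urlparse

# Hosted model APIs blocked at IL4 and above. Illustrative suffix
# list, not an official allowlist/denylist.
BLOCKED_SUFFIXES = (".openai.com", ".anthropic.com", ".huggingface.co")

def endpoint_allowed(url: str, impact_level: int) -> bool:
    """Reject external model APIs once data sits at IL4 or higher."""
    if impact_level < 4:
        return True
    host = urlparse(url).hostname or ""
    return not host.endswith(BLOCKED_SUFFIXES)
```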

/teach

Interactive tutor mode. Opens with the chapter's narrative hook — a real scene with a real person in a real place — then presents learning objectives as a live checklist and walks through concepts one at a time with code.

The teaching flow:

  1. Maps your topic to the relevant chapter
  2. Opens with the chapter's narrative hook (first ~200 words)
  3. Presents learning objectives as a checklist
  4. Walks through each concept with federal context
  5. Pauses between concepts: "Want to see the code? Or would you like me to explain more?"
  6. Tracks which objectives have been covered
  7. Closes with a recap and pointer to exercises

The tutor follows the handbook's practitioner voice — specific details, named programs, direct "you" address. Not "one might consider deploying a model" but "you will need to get an ATO before your model touches production data."

Tool Support

The three workflows are implemented for four AI coding environments.

Claude Code

Three slash commands available via /command syntax. Context auto-loaded from CLAUDE.md on session start.

.claude/commands/

Cursor

Same workflows, invoked conversationally. Rules and constraints auto-loaded on project open.

.cursorrules

OpenCode

Three slash commands with identical structure. Config defines context files, workflows, and prohibited-file rules.

.opencode/config.yaml

Cline

Three interactive workflows with wizard-style menus. Most interactive implementation — uses branching prompts at every step.

.clinerules/workflows/

For other AI agents that read Markdown, AGENTS.md at the repo root provides the same workflow instructions, suggested prompts, and directory structure. The key constraint for any agent: never modify QA-signed-off content (chapter READMEs, platform guides, exercises).
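
An agent honoring the QA constraint needs little more than a path check before writing. A hypothetical sketch, with a directory list inferred from the constraint above (the exact match rules are ours, not the repo's):

```python
from pathlib import PurePosixPath

# Top-level directories whose contents are QA-signed-off.
PROTECTED_DIRS = ("chapters", "platform-guides", "exercises")

def is_protected(path: str) -> bool:
    """True if an agent should refuse to modify this file."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in PROTECTED_DIRS
```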

Context Architecture

The handbook uses hierarchical context files so agents get progressively more specific guidance as they navigate into subdirectories.

Directory structure
handbook/
├── CLAUDE.md                    # Root context — content map, usage modes, commands
├── AGENTS.md                    # Agent-agnostic guide — workflows, prompts
├── .claude/commands/            # Claude Code slash commands
│   ├── compliance-check.md
│   ├── generate-federal-code.md
│   └── teach.md
├── .cursorrules                 # Cursor IDE rules (auto-loaded)
├── .opencode/                   # OpenCode config + commands
├── .clinerules/                 # Cline rules + interactive workflows
├── chapters/
│   └── CLAUDE.md                # Chapter index, code file mapping
├── platform-guides/
│   └── CLAUDE.md                # Platform selection matrix
└── security-compliance/
    └── CLAUDE.md                # Module classification, architecture flow

CLAUDE.md is auto-loaded by Claude Code; AGENTS.md serves the same purpose for any Markdown-reading agent. The sub-directory CLAUDE.md files (chapters/, platform-guides/, security-compliance/) provide scoped context — an agent working inside chapters/ gets a chapter-specific index without re-reading the full root context.
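
The lookup order can be sketched as a walk from the working directory up to the repo root. Claude Code does this loading itself; this hypothetical helper just makes the hierarchy concrete (most general context first, most specific last), assuming the start directory sits inside the repo:

```python
from pathlib import Path

def collect_context(start: Path, repo_root: Path) -> list[Path]:
    """Collect CLAUDE.md files from repo root down to a working directory."""
    chain = []
    current = start.resolve()
    root = repo_root.resolve()
    while True:
        candidate = current / "CLAUDE.md"
        if candidate.is_file():
            chain.append(candidate)
        if current == root or current == current.parent:
            break  # reached repo root (or filesystem root as a guard)
        current = current.parent
    return list(reversed(chain))  # root context first, subdirectory last
```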

Built With Agents

The handbook was written using AI coding agents following two internal specifications.

CHAPTER_WRITING_SPEC.md — an autonomous writing spec that defines chapter structure, quality criteria, and a "NEVER STOP" directive. The agent writes through to completion without pausing for human checkpoints — the spec itself is the quality control, not mid-process intervention.

STYLE_GUIDE.md — a practitioner voice guide that explicitly bans AI writing patterns. No "delve," no "landscape," no "robust." Every chapter opens with a specific scene — a real person in a real place with a real problem. The guide was designed to produce writing that doesn't read like AI wrote it.

The handbook was the first test of its own agent integration.

Get Started

  1. Clone the repo: git clone https://github.com/aporb/data-science-learning-handbook
  2. Open in your AI tool (Claude Code, Cursor, OpenCode, or Cline)
  3. The agent reads CLAUDE.md automatically. Start with a question.

Suggested first prompts:

  • "What are the five platforms and when do I use each one?"
  • "How do I set up Python on Databricks at IL4?"
  • "Review my code for federal compliance." (triggers compliance checker)
  • "Generate a classification pipeline for Databricks at IL4."
  • "Walk me through chapter 1." (triggers interactive tutor)