Chapter 11

Deployment and Scaling

Getting a model through ATO is a different skill set than building the model. This chapter covers the ATO process for ML systems, batch vs. real-time deployment patterns, MLflow model packaging, the FastAPI pattern for IL4 endpoints, Dockerizing for GovCloud, Foundry batch scoring transforms, and the deployment readiness checklist that actually gets models to production.


ATO Requirements for ML Systems

Every ML system deployed in a federal operational environment requires an Authority to Operate (ATO) — a formal authorization from the system's Authorizing Official (AO) that the security risks are acceptable. For ML-specific systems, the ATO package includes standard security controls (NIST SP 800-53) plus AI-specific documentation: the model card, the RAI assessment, the training data provenance record, and the monitoring plan.

ATO Consideration | What It Means for the Data Scientist | Lead Time
System boundary definition | Is your model a new system or part of an existing authorized system? | 3–6 months if new system
Security controls documentation | All technical controls (encryption, access, logging) must be documented | 4–8 weeks
Penetration testing | Required for systems at Moderate impact and above | 4–6 weeks
AI/ML-specific documentation | Model card, RAI assessment, training data provenance | 2–4 weeks
Continuous monitoring plan | Defines who monitors, what metrics, and retraining triggers | 1–2 weeks to write

Start the ATO process the day you start development. Not when the model is done. An ATO application that arrives after six months of model development with a "we need this deployed next week" timeline will sit in the queue for months. ATO reviewers cannot be rushed. Plan for 3–6 months minimum for a new system at Moderate or High impact.

Deployment Patterns

Batch Scoring

Batch scoring runs on a schedule — nightly, weekly, or triggered by new data arrivals. It produces a scored table that analysts query rather than calling a live model endpoint. For most federal ML use cases (readiness prediction, procurement risk, anomaly detection), batch scoring is the right pattern: it produces a defensible audit trail of every prediction and doesn't require low-latency infrastructure.

python
import mlflow.pyfunc
from pyspark.sql import functions as F
from datetime import date

# Production batch scoring — runs as a Databricks Workflow job (scheduled)
# Always load by stage alias, never by version number
model_name = "maintenance_overrun_classifier"
model      = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")

# Score all open work orders from the last 90 days
pending = (
    spark.table("jupiter_catalog.silver.maintenance_work_orders")
    .filter(F.col("completion_date").isNull())
    .filter(F.col("start_date") >= F.date_sub(F.current_date(), 90))
    .join(
        spark.table("jupiter_catalog.reference.ship_registry")
             .select("hull_number", "hull_class", "commission_year"),
        "hull_number", "left"
    )
    .withColumn("ship_age_years", F.year(F.current_date()) - F.col("commission_year"))
    .withColumn("start_month", F.month("start_date"))
    .toPandas()
)

print(f"Scoring {len(pending):,} pending work orders as of {date.today()}")

feature_cols = ["labor_hours_estimated", "estimated_completion_days", "ship_age_years",
                "prior_work_order_count", "start_month", "data_quality_score",
                "hull_class", "maintenance_category"]

scores = model.predict(pending[feature_cols])
pending["overrun_probability"] = scores
pending["overrun_flag"]        = (scores >= 0.65).astype(int)
pending["scored_date"]         = date.today().isoformat()
pending["model_version"]       = "Production"

output_spark = spark.createDataFrame(
    pending[["work_order_id", "overrun_probability", "overrun_flag",
             "scored_date", "model_version"]]
)
(
    output_spark
    .write.format("delta").mode("overwrite")
    .option("replaceWhere", f"scored_date = '{date.today().isoformat()}'")
    .saveAsTable("jupiter_catalog.gold.maintenance_overrun_scores")
)
print(f"Written {len(pending):,} scores | Flagged: {pending['overrun_flag'].sum():,}")
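One weakness in the audit columns above: `model_version` stamps the literal string "Production" rather than the concrete version number that actually served the scores, which makes later investigation harder. A sketch of resolving the version at job start through the MLflow registry client; the `resolve_production_version` helper name is ours, and the client is passed in (use `mlflow.tracking.MlflowClient()` in the job) so the helper stays testable without a tracking server:

```python
def resolve_production_version(client, model_name: str) -> str:
    """Return the registry version number currently holding the Production stage.

    `client` is any object exposing get_latest_versions() in the shape of
    mlflow.tracking.MlflowClient; duck typing keeps this testable without
    a live tracking server.
    """
    versions = client.get_latest_versions(model_name, stages=["Production"])
    if not versions:
        raise ValueError(f"no Production version registered for {model_name}")
    return str(versions[0].version)


# In the scoring job, before writing scores:
#   version = resolve_production_version(MlflowClient(), model_name)
#   pending["model_version"] = version
```

Recording the resolved version means an auditor can tie any row in the gold table back to an exact registered model, even after several promotions.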

Real-Time API with FastAPI for IL4

When latency matters — an analyst needs a score within 200ms to make a decision — deploy a FastAPI endpoint in a containerized service. At IL4, this runs in AWS GovCloud behind the agency's SSO. Every request is logged to an audit table.

python
from fastapi import FastAPI, HTTPException, Depends
from fastapi.security import HTTPBearer
from pydantic import BaseModel
import mlflow.pyfunc
import pandas as pd
import uuid
from datetime import datetime

app    = FastAPI(title="Maintenance Overrun Risk API", version="1.0.0")
bearer = HTTPBearer()

# Load model at startup — not per request
model = mlflow.pyfunc.load_model("models:/maintenance_overrun_classifier/Production")


@app.get("/health")
async def health() -> dict:
    """Liveness endpoint probed by the container HEALTHCHECK."""
    return {"status": "ok"}


class ScoringRequest(BaseModel):
    work_order_id:              str
    labor_hours_estimated:      float
    estimated_completion_days:  float
    ship_age_years:             float
    prior_work_order_count:     int
    start_month:                int
    data_quality_score:         float
    hull_class:                 str
    maintenance_category:       str


class ScoringResponse(BaseModel):
    work_order_id:       str
    overrun_probability: float
    overrun_flag:        int
    model_version:       str
    request_id:          str
    scored_at:           str


@app.post("/score", response_model=ScoringResponse)
async def score_work_order(request: ScoringRequest,
                           credentials=Depends(bearer)) -> ScoringResponse:
    request_id = str(uuid.uuid4())

    try:
        features = pd.DataFrame([request.dict()])
        proba    = float(model.predict(features)[0])
        flag     = int(proba >= 0.65)

        result = ScoringResponse(
            work_order_id=request.work_order_id,
            overrun_probability=round(proba, 4),
            overrun_flag=flag,
            model_version="Production",
            request_id=request_id,
            scored_at=datetime.utcnow().isoformat(),
        )

        # Log to audit table (Delta table append, not log file)
        _log_inference(request, result)
        return result

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


def _log_inference(request: ScoringRequest, response: ScoringResponse) -> None:
    """Write inference record to audit Delta table. Never skip this."""
    audit_record = {**request.dict(), **response.dict()}
    # Append to audit table via Databricks REST API or direct write
    pass

Dockerfile for GovCloud

dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install only production dependencies — no dev tools in production image
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ .

# Non-root user — required for FedRAMP High hardening
RUN adduser --disabled-password --gecos "" appuser
USER appuser

# Health check — required for container orchestration (ECS, EKS)
# python:3.10-slim ships without curl, so probe with the Python stdlib instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Foundry Batch Scoring Transform

python
from transforms.api import transform, Input, Output
from palantir_models.transforms import ModelInput

@transform(
    scores=Output("/analytics/gold/maintenance_overrun_scores"),
    pending_work_orders=Input("/analytics/silver/maintenance_work_orders"),
    model=ModelInput("/models/maintenance_overrun_v2"),
)
def score_work_orders(ctx, pending_work_orders, model, scores):
    """
    Foundry scheduled transform: score all pending work orders nightly.
    Output is promoted to Gold tier and surfaced in Workshop dashboards.
    Foundry automatically tracks lineage: these scores derive from
    pending_work_orders via model version X.
    """
    df = pending_work_orders.dataframe().toPandas()

    feature_cols = ["labor_hours_estimated", "estimated_completion_days",
                    "ship_age_years", "prior_work_order_count", "start_month",
                    "data_quality_score", "hull_class", "maintenance_category"]

    results = model.api().predict(df[feature_cols])
    df["overrun_probability"] = results["overrun_probability"]
    df["overrun_flag"]        = results["overrun_flag"]

    output_df = df[["work_order_id", "overrun_probability", "overrun_flag"]]
    # There is no global `spark` inside a Foundry transform; take the
    # session from the injected TransformContext (the `ctx` parameter)
    scores.write_dataframe(ctx.spark_session.createDataFrame(output_df))

Deployment Readiness Checklist

  • ATO process started at development kickoff (not model completion)
  • Model card completed with all 9 required sections (see Chapter 12)
  • RAI assessment completed or in progress with approver identified
  • Training data provenance documented in Unity Catalog or Foundry
  • All credentials stored in secrets manager — zero hardcoded values in code
  • Audit logging implemented — every prediction captured to a durable store
  • Confidence threshold defined and documented
  • Human-in-the-loop path defined for low-confidence predictions
  • Monitoring owner named (a person, not a team)
  • Retraining trigger defined as a specific metric threshold
  • Rollback procedure documented and tested
  • Post-deployment performance review scheduled at 30, 90, and 180 days
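Two of the checklist items above, the confidence threshold and the human-in-the-loop path, reduce to a small routing rule that deserves to live in code rather than in a policy document. A minimal sketch; the 0.65 threshold matches the scoring code earlier in the chapter, but the review-band width is an illustrative assumption:

```python
def route_prediction(probability: float,
                     flag_threshold: float = 0.65,
                     review_band: float = 0.10) -> str:
    """Route a scored work order based on model confidence.

    Predictions within `review_band` of the decision threshold are the
    low-confidence cases sent to a human reviewer; everything else is
    auto-disposed. The band width is an illustrative assumption, not a
    value from the chapter's system.
    """
    if abs(probability - flag_threshold) < review_band:
        return "human_review"   # low confidence: an analyst decides
    if probability >= flag_threshold:
        return "auto_flag"      # high confidence overrun risk
    return "auto_clear"         # high confidence no overrun


assert route_prediction(0.92) == "auto_flag"
assert route_prediction(0.60) == "human_review"
assert route_prediction(0.10) == "auto_clear"
```

Putting the rule in one function means the batch job, the API, and the documentation all cite the same thresholds, and a threshold change is one reviewed commit.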

Where This Goes Wrong

Failure Mode 1: Starting ATO at Model Completion

"The model is done — we just need to get it approved." The ATO queue is 4 months long. The program manager is expecting deployment next month. The data scientist who built the model rolls off the contract before deployment. Fix: ATO process starts on Day 1, concurrent with model development. The documentation isn't a post-hoc task.

Failure Mode 2: No Rollback Plan

The new model version deploys and produces a systematic bias on a specific vessel class that nobody noticed in staging. Fix: always maintain the previous Production model version in the MLflow Registry (do not archive it immediately). Define the rollback procedure — a one-command version stage transition — and test it before the deployment. The ability to revert in 10 minutes is worth more than a perfect release process.
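The one-command rollback described above can be scripted against the MLflow registry. A hedged sketch assuming the stage-based registry used earlier in the chapter; the client is passed in (use `mlflow.tracking.MlflowClient()` in production) so the procedure itself can be tested without a tracking server, and `rollback_production` is our own helper name:

```python
def rollback_production(client, model_name: str,
                        bad_version: str, good_version: str) -> None:
    """Demote the bad version, then restore the previous Production version.

    `client` follows the mlflow.tracking.MlflowClient interface
    (transition_model_version_stage). Order matters: demote first so two
    versions never hold the Production stage at the same time.
    """
    client.transition_model_version_stage(
        name=model_name, version=bad_version, stage="Archived")
    client.transition_model_version_stage(
        name=model_name, version=good_version, stage="Production")
```

Because batch jobs and the API both load `models:/{name}/Production` by stage alias, the next scheduled run or endpoint restart picks up the restored version with no code change.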

Failure Mode 3: Deploying Without Audit Logging

A model scores 50,000 work orders per week. Three months later an analyst questions a specific score from six weeks ago. Without an audit table capturing every prediction, there is no answer. Fix: every inference writes to an audit table — the work order ID, the feature values, the prediction, the model version, the timestamp, and the requesting user. This is non-negotiable for any federal deployment.
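The audit record described above can be assembled from objects already present in the FastAPI handler. A sketch of what a complete row might contain; the field set mirrors the list in this failure mode, and `requesting_user` (extracted from the agency SSO token) is an assumption about the deployment, not something the chapter's code shows:

```python
from datetime import datetime, timezone


def build_audit_record(features: dict, probability: float,
                       model_version: str, requesting_user: str,
                       request_id: str) -> dict:
    """Assemble a per-inference audit row: inputs, output, and provenance.

    Everything needed to answer "why did the model say that?" months
    later: the exact feature values, the score, the model version, the
    timestamp, and who asked.
    """
    return {
        **features,                          # exact feature values scored
        "overrun_probability": probability,
        "model_version": model_version,
        "requesting_user": requesting_user,  # from the SSO token (assumed)
        "request_id": request_id,
        "scored_at": datetime.now(timezone.utc).isoformat(),
    }
```

Appending these rows to a Delta audit table gives the durable, queryable store the failure mode calls for; a log file on a container that gets recycled does not.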

Exercises

This chapter includes 6 hands-on exercises with full solutions — coding challenges, analysis tasks, and scenario-based problems.

View Exercises on GitHub →