Deployment and Scaling
Getting a model through ATO is a different skill set than building the model. This chapter covers the ATO process for ML systems, batch vs. real-time deployment patterns, MLflow model packaging, the FastAPI pattern for IL4 endpoints, Dockerizing for GovCloud, Foundry batch scoring transforms, and the deployment readiness checklist that actually gets models to production.
ATO Requirements for ML Systems
Every ML system deployed in a federal operational environment requires an Authority to Operate (ATO) — a formal authorization from the system's Authorizing Official (AO) that the security risks are acceptable. For ML-specific systems, the ATO package includes standard security controls (NIST SP 800-53) plus AI-specific documentation: the model card, the RAI assessment, the training data provenance record, and the monitoring plan.
| ATO Consideration | What It Means for the Data Scientist | Lead Time |
|---|---|---|
| System boundary definition | Is your model a new system or part of an existing authorized system? | 3–6 months if new system |
| Security controls documentation | All technical controls (encryption, access, logging) must be documented | 4–8 weeks |
| Penetration testing | Required for systems at Moderate impact and above | 4–6 weeks |
| AI/ML-specific documentation | Model card, RAI assessment, training data provenance | 2–4 weeks |
| Continuous monitoring plan | Defines who monitors, what metrics, and retraining triggers | 1–2 weeks to write |
Start the ATO process the day you start development. Not when the model is done. An ATO application that arrives after six months of model development with a "we need this deployed next week" timeline will sit in the queue for months. ATO reviewers cannot be rushed. Plan for 3–6 months minimum for a new system at Moderate or High impact.
Deployment Patterns
Batch Scoring
Batch scoring runs on a schedule — nightly, weekly, or triggered by new data arrivals. It produces a scored table that analysts query rather than calling a live model endpoint. For most federal ML use cases (readiness prediction, procurement risk, anomaly detection), batch scoring is the right pattern: it produces a defensible audit trail of every prediction and doesn't require low-latency infrastructure.
```python
import mlflow.pyfunc
from pyspark.sql import functions as F
from datetime import date

# Production batch scoring — runs as a Databricks Workflow job (scheduled)
# Always load by stage alias, never by version number
model_name = "maintenance_overrun_classifier"
model = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")

# Score all open work orders from the last 90 days
pending = (
    spark.table("jupiter_catalog.silver.maintenance_work_orders")
    .filter(F.col("completion_date").isNull())
    .filter(F.col("start_date") >= F.date_sub(F.current_date(), 90))
    .join(
        spark.table("jupiter_catalog.reference.ship_registry")
        .select("hull_number", "hull_class", "commission_year"),
        "hull_number", "left"
    )
    .withColumn("ship_age_years", F.year(F.current_date()) - F.col("commission_year"))
    .withColumn("start_month", F.month("start_date"))
    .toPandas()
)
print(f"Scoring {len(pending):,} pending work orders as of {date.today()}")

feature_cols = ["labor_hours_estimated", "estimated_completion_days", "ship_age_years",
                "prior_work_order_count", "start_month", "data_quality_score",
                "hull_class", "maintenance_category"]
scores = model.predict(pending[feature_cols])

pending["overrun_probability"] = scores
pending["overrun_flag"] = (scores >= 0.65).astype(int)
pending["scored_date"] = date.today().isoformat()
pending["model_version"] = "Production"

output_spark = spark.createDataFrame(
    pending[["work_order_id", "overrun_probability", "overrun_flag",
             "scored_date", "model_version"]]
)
(
    output_spark
    .write.format("delta").mode("overwrite")
    .option("replaceWhere", f"scored_date = '{date.today().isoformat()}'")
    .saveAsTable("jupiter_catalog.gold.maintenance_overrun_scores")
)
print(f"Written {len(pending):,} scores | Flagged: {pending['overrun_flag'].sum():,}")
```
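One weakness in the job above: it records the literal string `Production` as the model version, which makes the audit trail ambiguous as soon as the stage alias moves to a new version. A minimal sketch of resolving the concrete version number behind the stage, written against an injected MLflow client so the lookup is unit-testable without a tracking server (the helper name `resolve_production_version` is mine, not an MLflow API):

```python
def resolve_production_version(client, model_name: str) -> str:
    """Return the concrete version number currently behind the Production stage.

    `client` is an mlflow.MlflowClient (or any object exposing
    get_latest_versions), injected so this can be tested offline.
    """
    versions = client.get_latest_versions(model_name, stages=["Production"])
    if not versions:
        raise RuntimeError(f"No Production version registered for {model_name}")
    # get_latest_versions returns at most one entry per requested stage
    return str(versions[0].version)


# In the batch job, record the resolved number instead of the alias:
#   pending["model_version"] = resolve_production_version(
#       mlflow.MlflowClient(), model_name)
```

Scoring still loads by alias so promotions require no code change; only the audit column gets the resolved number.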
Real-Time API with FastAPI for IL4
When latency matters — an analyst needs a score within 200ms to make a decision — deploy a FastAPI endpoint in a containerized service. At IL4, this runs in AWS GovCloud behind the agency's SSO. Every request is logged to an audit table.
```python
from fastapi import FastAPI, HTTPException, Depends
from fastapi.security import HTTPBearer
from pydantic import BaseModel
import mlflow.pyfunc
import pandas as pd
import uuid
from datetime import datetime

app = FastAPI(title="Maintenance Overrun Risk API", version="1.0.0")
bearer = HTTPBearer()

# Load model at startup — not per request
model = mlflow.pyfunc.load_model("models:/maintenance_overrun_classifier/Production")


class ScoringRequest(BaseModel):
    work_order_id: str
    labor_hours_estimated: float
    estimated_completion_days: float
    ship_age_years: float
    prior_work_order_count: int
    start_month: int
    data_quality_score: float
    hull_class: str
    maintenance_category: str


class ScoringResponse(BaseModel):
    work_order_id: str
    overrun_probability: float
    overrun_flag: int
    model_version: str
    request_id: str
    scored_at: str


@app.get("/health")
async def health() -> dict:
    # Liveness probe target for the container HEALTHCHECK (see Dockerfile)
    return {"status": "ok"}


@app.post("/score", response_model=ScoringResponse)
async def score_work_order(request: ScoringRequest,
                           credentials=Depends(bearer)) -> ScoringResponse:
    request_id = str(uuid.uuid4())
    try:
        features = pd.DataFrame([request.dict()])
        proba = float(model.predict(features)[0])
        flag = int(proba >= 0.65)
        result = ScoringResponse(
            work_order_id=request.work_order_id,
            overrun_probability=round(proba, 4),
            overrun_flag=flag,
            model_version="Production",
            request_id=request_id,
            scored_at=datetime.utcnow().isoformat(),
        )
        # Log to audit table (Delta table append, not log file)
        _log_inference(request, result)
        return result
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


def _log_inference(request: ScoringRequest, response: ScoringResponse) -> None:
    """Write inference record to audit Delta table. Never skip this."""
    audit_record = {**request.dict(), **response.dict()}
    # Append to audit table via Databricks REST API or direct write
    pass
```
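The `_log_inference` stub above leaves the durable write unimplemented. One way to make it concrete without blocking the request path is a small batching buffer; this is a sketch under the assumption that the caller supplies a `flush_fn` that appends a list of records to the audit Delta table (the `AuditBuffer` class and its names are illustrative, not from this chapter's codebase):

```python
import threading
from typing import Callable, Dict, List


class AuditBuffer:
    """Batch inference records in memory and flush them to durable storage.

    flush_fn receives a list of dict records; in production it would append
    to the audit Delta table (assumption: the caller supplies that writer).
    """

    def __init__(self, flush_fn: Callable[[List[Dict]], None], batch_size: int = 100):
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._buffer: List[Dict] = []
        self._lock = threading.Lock()  # the API may serve requests concurrently

    def log(self, record: Dict) -> None:
        with self._lock:
            self._buffer.append(record)
            if len(self._buffer) < self._batch_size:
                return
            pending, self._buffer = self._buffer, []
        self._flush_fn(pending)  # flush outside the lock

    def close(self) -> None:
        """Flush any remaining records. Call on application shutdown."""
        with self._lock:
            pending, self._buffer = self._buffer, []
        if pending:
            self._flush_fn(pending)
```

With this in place, `_log_inference` reduces to `audit.log({**request.dict(), **response.dict()})`, with `audit.close()` wired to the app's shutdown hook. The trade-off: a hard crash can lose up to one batch, so a synchronous per-request append is the safer (slower) choice where the audit requirement is strict.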
Dockerfile for GovCloud
```dockerfile
FROM python:3.10-slim
WORKDIR /app

# curl is needed for the HEALTHCHECK below — the slim base image does not include it
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install only production dependencies — no dev tools in production image
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ .

# Non-root user — required for FedRAMP High hardening
RUN adduser --disabled-password --gecos "" appuser
USER appuser

# Health check — required for container orchestration (ECS, EKS)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```
Foundry Batch Scoring Transform
```python
from transforms.api import transform, Input, Output
from palantir_models.transforms import ModelInput


@transform(
    scores=Output("/analytics/gold/maintenance_overrun_scores"),
    pending_work_orders=Input("/analytics/silver/maintenance_work_orders"),
    model=ModelInput("/models/maintenance_overrun_v2"),
)
def score_work_orders(ctx, pending_work_orders, model, scores):
    """
    Foundry scheduled transform: score all pending work orders nightly.
    Output is promoted to Gold tier and surfaced in Workshop dashboards.
    Foundry automatically tracks lineage: these scores derive from
    pending_work_orders via model version X.
    """
    df = pending_work_orders.dataframe().toPandas()
    feature_cols = ["labor_hours_estimated", "estimated_completion_days",
                    "ship_age_years", "prior_work_order_count", "start_month",
                    "data_quality_score", "hull_class", "maintenance_category"]
    results = model.api().predict(df[feature_cols])
    df["overrun_probability"] = results["overrun_probability"]
    df["overrun_flag"] = results["overrun_flag"]
    output_df = df[["work_order_id", "overrun_probability", "overrun_flag"]]
    # Use the transform context's Spark session — `spark` is not a global
    # inside a Foundry transform
    scores.write_dataframe(ctx.spark_session.createDataFrame(output_df))
```
Deployment Readiness Checklist
- ATO process started at development kickoff (not model completion)
- Model card completed with all 9 required sections (see Chapter 12)
- RAI assessment completed or in progress with approver identified
- Training data provenance documented in Unity Catalog or Foundry
- All credentials stored in secrets manager — zero hardcoded values in code
- Audit logging implemented — every prediction captured to a durable store
- Confidence threshold defined and documented
- Human-in-the-loop path defined for low-confidence predictions
- Monitoring owner named (a person, not a team)
- Retraining trigger defined as a specific metric threshold
- Rollback procedure documented and tested
- Post-deployment performance review scheduled at 30, 90, and 180 days
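The confidence-threshold item in the checklist deserves more than a number pulled from the air: the 0.65 used throughout this chapter should come from a documented sweep against validation data. A minimal sketch, assuming a precision floor is the agreed acceptance criterion (the 0.80 floor, 0.05 step, and helper name are illustrative, not the chapter's actual values):

```python
def pick_threshold(scores, labels, min_precision=0.80, step=0.05):
    """Pick the lowest threshold whose flagged set meets a precision floor.

    scores: predicted probabilities; labels: 1 if the work order actually
    overran. Returns (threshold, precision), or (None, None) if no
    threshold clears the floor — a signal to revisit the model, not
    to lower the floor.
    """
    candidates = [round(step * i, 2) for i in range(1, int(1 / step))]
    for t in candidates:
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        if not flagged:
            continue  # nothing flagged at this threshold
        precision = sum(flagged) / len(flagged)
        if precision >= min_precision:
            return t, precision
    return None, None
```

Whatever value the sweep produces, document the validation set, the floor, and the date — that record is what the ATO reviewer and the 90-day performance review will ask for.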
Where This Goes Wrong
Failure Mode 1: Starting ATO at Model Completion
"The model is done — we just need to get it approved." The ATO queue is 4 months long. The program manager is expecting deployment next month. The data scientist who built the model rolls off the contract before deployment. Fix: ATO process starts on Day 1, concurrent with model development. The documentation isn't a post-hoc task.
Failure Mode 2: No Rollback Plan
The new model version deploys and produces a systematic bias on a specific vessel class that nobody noticed in staging. Fix: always maintain the previous Production model version in the MLflow Registry (do not archive it immediately). Define the rollback procedure — a one-command version stage transition — and test it before the deployment. The ability to revert in 10 minutes is worth more than a perfect release process.
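The one-command rollback described above can be captured as a small helper. This sketch takes the MLflow client as a parameter so the logic is testable offline; it assumes the stage-based registry workflow used elsewhere in this chapter, and `rollback_production` is my name for it, not an MLflow API:

```python
def rollback_production(client, model_name: str, to_version: str) -> None:
    """One-command rollback: move Production back to a known-good version.

    Relies on the previous Production version never being archived at
    deploy time. archive_existing_versions=True demotes the bad version
    in the same call, so no second command is needed.
    """
    client.transition_model_version_stage(
        name=model_name,
        version=to_version,
        stage="Production",
        archive_existing_versions=True,
    )
```

In production this runs with a real `mlflow.MlflowClient()`. Rehearse it against a staging registry before the first deployment, so the 10-minute revert is a tested fact rather than a hope.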
Failure Mode 3: Deploying Without Audit Logging
A model scores 50,000 work orders per week. Three months later an analyst questions a specific score from six weeks ago. Without an audit table capturing every prediction, there is no answer. Fix: every inference writes to an audit table — the work order ID, the feature values, the prediction, the model version, the timestamp, and the requesting user. This is non-negotiable for any federal deployment.
Exercises
This chapter includes 6 hands-on exercises with full solutions — coding challenges, analysis tasks, and scenario-based problems.
View Exercises on GitHub →