Docker & Kubernetes — Containerized AI Services
Docker packages your model, API, and dependencies into a portable container. Kubernetes orchestrates those containers at scale: auto-scaling based on request volume, rolling updates with zero downtime, health checks, and scheduling onto multi-GPU nodes. This is how most major AI APIs run in production.
Dockerfile and Kubernetes Deployment
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Dockerfile -- multi-stage build for a minimal production image
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
dockerfile_content = '''
# Stage 1: build dependencies
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: production image (smaller -- no build tools)
FROM python:3.11-slim AS production
WORKDIR /app
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

# curl is needed for the HEALTHCHECK below (not included in the slim base image)
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

# Copy application code and pre-downloaded model files
COPY ./app ./app
COPY ./models ./models

# Non-root user for security
RUN adduser --disabled-password --gecos '' apiuser
USER apiuser

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \\
    CMD curl -f http://localhost:8000/health || exit 1

# Expose port
EXPOSE 8000

# Startup command: uvicorn with 4 worker processes (gunicorn with uvicorn workers is another common choice)
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
'''
# build: docker build -t ai-api:v1.0.0 .
# run:   docker run -p 8000:8000 --gpus all ai-api:v1.0.0
# push:  docker push your-registry/ai-api:v1.0.0
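For reference, here is a minimal sketch of the application the CMD above points at: an app/main.py exposing the /health route used by the Docker HEALTHCHECK and the Kubernetes probes. The module path, route names, and endpoint bodies are illustrative assumptions, not part of the original build.

# app/main.py -- illustrative FastAPI app matching "app.main:app" and the /health checks (sketch)
from fastapi import FastAPI

app = FastAPI(title="ai-api")

@app.get("/health")
def health():
    # Hit by the Docker HEALTHCHECK and the Kubernetes readiness/liveness probes
    return {"status": "ok"}

@app.post("/predict")
def predict(payload: dict):
    # Placeholder endpoint -- swap in real model loading and inference
    return {"prediction": None, "received": payload}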
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# KUBERNETES DEPLOYMENT MANIFEST
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
kubernetes_deployment = '''
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-api-deployment
  labels:
    app: ai-api
    version: v1.0.0
spec:
  replicas: 3                      # 3 pods for availability
  selector:
    matchLabels:
      app: ai-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # create 1 extra pod during update
      maxUnavailable: 0            # never take a pod offline (zero downtime)
  template:
    metadata:
      labels:
        app: ai-api
    spec:
      containers:
      - name: ai-api
        image: your-registry/ai-api:v1.0.0
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
          limits:
            memory: "8Gi"
            cpu: "2000m"
            nvidia.com/gpu: 1      # request 1 GPU per pod
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:          # secrets from Kubernetes Secrets (not plain env vars!)
              name: api-secrets
              key: openai-key
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60  # wait for model loading
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 90
          periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: ai-api-service
spec:
  selector:
    app: ai-api
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer               # exposes the pods via a cloud load balancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-api-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale up when average CPU > 70%
'''
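If you keep the Dockerfile and manifest as Python strings like this, one way (a sketch, assuming the variable names above) to materialize them so the docker build and kubectl apply commands in this section have real files to work with:

from pathlib import Path

# Write the embedded strings out as Dockerfile and deployment.yaml
Path("Dockerfile").write_text(dockerfile_content.strip() + "\n")
Path("deployment.yaml").write_text(kubernetes_deployment.strip() + "\n")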
print("Kubernetes deployment manifest ready!")
print("Apply with: kubectl apply -f deployment.yaml")
print("Scale manually: kubectl scale deployment ai-api --replicas=5")
print("Check pods: kubectl get pods -l app=ai-api")
print("Logs: kubectl logs -l app=ai-api --tail=100")Tip
Tip
Practice Docker and Kubernetes for containerized AI services in small, isolated examples before integrating them into larger projects. Breaking concepts down into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of a containerized AI service with Docker and Kubernetes from scratch without looking at notes. (2) Modify it to handle an edge case (an empty input, a null value, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake when containerizing AI services with Docker and Kubernetes is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.
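As a small, hedged illustration (the request model and field names are made up, Pydantic v2 syntax), validation in the API layer is one way to reject those boundary conditions before they ever reach the model:

from pydantic import BaseModel, ValidationError, field_validator

class PredictRequest(BaseModel):
    text: str

    @field_validator("text")
    @classmethod
    def text_not_empty(cls, value: str) -> str:
        # Reject empty or whitespace-only input instead of passing it to the model
        if not value.strip():
            raise ValueError("text must not be empty")
        return value

# Boundary conditions: empty input, null value, unexpected data type
for bad_payload in [{"text": ""}, {"text": None}, {"text": 123}]:
    try:
        PredictRequest(**bad_payload)
    except ValidationError as exc:
        print(f"rejected {bad_payload}: {exc.errors()[0]['msg']}")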