Docker & Kubernetes — Containerized AI Services
Docker packages your model, API, and dependencies into a portable container. Kubernetes orchestrates those containers at scale: auto-scaling based on request volume, rolling updates with zero downtime, health checks, and scheduling onto multi-GPU nodes. This is how most major AI APIs run in production.
Dockerfile and Kubernetes Deployment
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Dockerfile -- multi-stage build for a minimal production image
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
dockerfile_content = '''
# Stage 1: build dependencies
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: production image (smaller -- no build tools)
FROM python:3.11-slim AS production
WORKDIR /app
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

# curl is needed for the HEALTHCHECK below (not included in the slim base image)
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

# Copy application code and pre-downloaded model files
COPY ./app ./app
COPY ./models ./models

# Non-root user for security
RUN adduser --disabled-password --gecos '' apiuser
USER apiuser

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \\
    CMD curl -f http://localhost:8000/health || exit 1

# Expose port
EXPOSE 8000

# Startup command: uvicorn with 4 worker processes (gunicorn with uvicorn workers is another common choice)
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
'''
# build: docker build -t ai-api:v1.0.0 .
# run:   docker run -p 8000:8000 --gpus all ai-api:v1.0.0
# push:  docker push your-registry/ai-api:v1.0.0
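For reference, here is a minimal sketch of the application the CMD above points at: an app/main.py exposing the /health route used by the Docker HEALTHCHECK and the Kubernetes probes. The module path, route names, and endpoint bodies are illustrative assumptions, not part of the original build.

# app/main.py -- illustrative FastAPI app matching "app.main:app" and the /health checks (sketch)
from fastapi import FastAPI

app = FastAPI(title="ai-api")

@app.get("/health")
def health():
    # Hit by the Docker HEALTHCHECK and the Kubernetes readiness/liveness probes
    return {"status": "ok"}

@app.post("/predict")
def predict(payload: dict):
    # Placeholder endpoint -- swap in real model loading and inference
    return {"prediction": None, "received": payload}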
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# KUBERNETES DEPLOYMENT MANIFEST
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
kubernetes_deployment = '''
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-api-deployment
  labels:
    app: ai-api
    version: v1.0.0
spec:
  replicas: 3                      # 3 pods for availability
  selector:
    matchLabels:
      app: ai-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # create 1 extra pod during update
      maxUnavailable: 0            # never take a pod offline (zero downtime)
  template:
    metadata:
      labels:
        app: ai-api
    spec:
      containers:
      - name: ai-api
        image: your-registry/ai-api:v1.0.0
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
          limits:
            memory: "8Gi"
            cpu: "2000m"
            nvidia.com/gpu: 1      # request 1 GPU per pod
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:          # secrets from Kubernetes Secrets (not plain env vars!)
              name: api-secrets
              key: openai-key
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60  # wait for model loading
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 90
          periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: ai-api-service
spec:
  selector:
    app: ai-api
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer               # exposes the pods via a cloud load balancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-api-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale up when average CPU > 70%
'''
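If you keep the Dockerfile and manifest as Python strings like this, one way (a sketch, assuming the variable names above) to materialize them so the docker build and kubectl apply commands in this section have real files to work with:

from pathlib import Path

# Write the embedded strings out as Dockerfile and deployment.yaml
Path("Dockerfile").write_text(dockerfile_content.strip() + "\n")
Path("deployment.yaml").write_text(kubernetes_deployment.strip() + "\n")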
print("Kubernetes deployment manifest ready!")
print("Apply with: kubectl apply -f deployment.yaml")
print("Scale manually: kubectl scale deployment ai-api --replicas=5")
print("Check pods: kubectl get pods -l app=ai-api")
print("Logs: kubectl logs -l app=ai-api --tail=100")Tip
Tip
Practice Docker and Kubernetes for containerized AI services in small, isolated examples before integrating them into larger projects. Breaking concepts down into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of a containerized AI service with Docker and Kubernetes from scratch without looking at notes. (2) Modify it to handle an edge case (an empty input, a null value, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake when containerizing AI services with Docker and Kubernetes is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.
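As a small, hedged illustration (the request model and field names are made up, Pydantic v2 syntax), validation in the API layer is one way to reject those boundary conditions before they ever reach the model:

from pydantic import BaseModel, ValidationError, field_validator

class PredictRequest(BaseModel):
    text: str

    @field_validator("text")
    @classmethod
    def text_not_empty(cls, value: str) -> str:
        # Reject empty or whitespace-only input instead of passing it to the model
        if not value.strip():
            raise ValueError("text must not be empty")
        return value

# Boundary conditions: empty input, null value, unexpected data type
for bad_payload in [{"text": ""}, {"text": None}, {"text": 123}]:
    try:
        PredictRequest(**bad_payload)
    except ValidationError as exc:
        print(f"rejected {bad_payload}: {exc.errors()[0]['msg']}")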